What is the "strawberry problem" that large language models like GPT-4 and Claude face? - GIGAZINE

AI based on large language models (LLMs) can demonstrate high ability, but its fragility is often pointed out: research has shown that it is easily fooled by lies and that its ability to reason about mathematical word problems is lower than that of elementary school students. Machine learning engineer Chinmay Jog explains the vulnerability known as the “strawberry problem,” which shows the limits of AI's capabilities.

The 'Strawberry' Problem: How to Overcome the Limitations of Artificial Intelligence | VentureBeat
https://venturebeat.com/ai/the-Strawberrry-problem-how-to-overcome-ais-limitations/

Generative AI such as ChatGPT and Stable Diffusion lets anyone easily do advanced work, such as writing sophisticated text and code or producing photorealistic illustrations and images. However, image generation AI, for example, has a weakness called the “one banana problem.” Daniel Hook, CEO of Digital Science, reported that when he tried to generate an image with the prompt “one banana,” the result was “two bananas in a bunch.” According to Hook, the AI does not actually understand what the prompt specifies, and the cause is believed to be a bias in the model, such as “pictures of bananas contain two bananas.”

What is the “one banana problem” that highlights the problems facing generative AI? - GIGAZINE

As a similar problem, Jog raises the “strawberry problem,” which affects LLMs that work with text. In a screenshot shared by Jog, he asks ChatGPT, “How many r's are there in strawberry?” As anyone can see by looking at the word “strawberry,” it contains three r's in total: the 3rd, 8th, and 9th letters. However, ChatGPT answers, “The word 'strawberry' contains two letters 'r'.”
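The letter positions stated above can be checked mechanically. The following is a minimal Python sketch; the helper name `letter_positions` is an illustrative assumption and does not appear in Jog's article.

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions at which `letter` occurs in `word`."""
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


positions = letter_positions("strawberry", "r")
print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```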

When the same question about the number of r's in “strawberry” was put to Claude, the LLM-based conversational AI developed by Anthropic, the answer was the same: “There are two r's in strawberry.”

In other examples, LLM-based AI systems sometimes make errors when counting the m's in “mammal” or the p's in “hippopotamus.” According to Jog, the strawberry problem stems from the way LLMs work.

Almost all high-performance LLMs are based on the transformer, a deep learning architecture published by researchers at Google and others. A transformer does not take the input text directly, but uses a process called tokenization to convert the text into a numerical representation. Some tokens are whole words and some are parts of words. For example, if there is a token for the word “strawberry,” the word is read as is, but depending on the vocabulary, the input may instead be read as a combination of the tokens “straw” and “berry.” By dividing the input into tokens, the model can more accurately predict which token will come next in the sentence.
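To make the idea of tokenization concrete, here is a toy Python sketch. It is not the tokenizer used by ChatGPT or Claude: the vocabulary `TOY_VOCAB`, its integer IDs, and the greedy longest-match rule in `toy_tokenize` are all assumptions chosen for illustration, whereas production tokenizers typically learn their vocabularies from data.

```python
# Hypothetical toy vocabulary mapping token strings to integer IDs.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Split `text` into vocabulary tokens, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


tokens = toy_tokenize("strawberry", TOY_VOCAB)
ids = [TOY_VOCAB[t] for t in tokens]
print(tokens)  # ['straw', 'berry']
print(ids)     # [0, 1]
```

Under these assumptions the model would receive only the IDs `[0, 1]`, and nothing in that sequence states directly how many r's the original word contains.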

Therefore, although LLM-based AI, which works with tokens, is good at predicting text from context, it struggles to break a word down into individual letters. “This problem may not occur in a model architecture that can look at individual characters directly without encoding them as tokens,” Jog said. “However, with the current transformer architecture, this is not possible.”

Given this limitation of LLMs, Jog also explains how to avoid the strawberry problem: have the model answer by writing code in a programming language rather than replying in plain text. In his example, Jog asked ChatGPT, “Use Python to count the number of r's in strawberry, and show the code and an explanation.” ChatGPT used Python's count function and was able to answer “Output = 3,” the correct number of r's.
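The exact code ChatGPT produced is not reproduced in this article, so the following is only a sketch of what a solution based on Python's built-in `str.count` method might look like. The variable names, and the extra checks for “mammal” and “hippopotamus,” are illustrative assumptions.

```python
word = "strawberry"

# str.count returns the number of non-overlapping occurrences of a substring.
count = word.count("r")
print(f"Output = {count}")  # Output = 3

# The same approach applied to the other words mentioned in the article.
m_count = "mammal".count("m")
p_count = "hippopotamus".count("p")
print(m_count, p_count)  # 3 3
```

Because the counting is done by the Python interpreter on the actual characters, the result does not depend on how the model tokenizes the word.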

“A simple experiment in counting letters has revealed a fundamental limitation of LLMs,” Jog said. “This experiment demonstrated that LLMs are token pattern-matching predictive algorithms, and that they are not an ‘intelligence’ capable of understanding and reasoning.” He added that knowing in advance what kinds of prompts work well can mitigate the problem to some extent, and that it is important to recognize the limitations of AI in order to use it effectively and have realistic expectations.
