Apple researchers have reported that large language models (LLMs), such as ChatGPT, lack true logical reasoning abilities and are easily confused by irrelevant information, according to TechCrunch. Their study, published in a paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models,” calls the logical capabilities of current AI systems into question. While LLMs can handle simple mathematical problems, adding unnecessary details to those problems often leads to incorrect results.
For instance, a model can solve: “Oliver picked 44 kiwis on Friday, 58 on Saturday, and twice as many as Friday’s amount on Sunday. How many kiwis does Oliver have in total?” However, introducing irrelevant information, such as “On Sunday, 5 of those kiwis were slightly smaller than average,” can confuse the model. In that case, it may wrongly subtract the five kiwis and answer 185 instead of 190, even though their size has no bearing on the count.
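For readers who want to check the arithmetic, here is a minimal sketch (not code from the paper) that works through the kiwi problem and shows the kind of error the researchers describe:

```python
# Kiwi problem from the example above: the "slightly smaller" clause
# describes size, not quantity, so it should not change the total.

friday = 44
saturday = 58
sunday = 2 * friday          # "twice as many as Friday's amount"
smaller_on_sunday = 5        # irrelevant detail; a distractor

correct_total = friday + saturday + sunday             # 44 + 58 + 88 = 190
distracted_total = correct_total - smaller_on_sunday   # 185: the mistaken subtraction

print(correct_total, distracted_total)  # 190 185
```

The point of the experiment is that the extra sentence changes nothing about the calculation, yet models often treat it as if it did.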
Patterns Over Logic
Mehrdad Farajtabar, one of the paper’s co-authors, emphasizes that such mistakes show LLMs rely on patterns from training data rather than actual understanding. “This decline [in performance] suggests that current LLMs lack true logical reasoning. Instead, they attempt to replicate reasoning steps from their training data,” the paper explains.
While some researchers argue that careful prompt engineering could mitigate these issues, Farajtabar warns that solving complex problems might require vast amounts of contextual data, notes NIXSOLUTIONS. This is because LLMs are easily distracted by irrelevant details that a child would naturally ignore.
Future of AI Reasoning
Does this imply that LLMs are incapable of reasoning? Perhaps. However, the lack of a clear understanding of AI reasoning leaves the question open. It’s possible that these models “reason” in ways we haven’t yet recognized or cannot control. The uncertainty surrounding LLMs presents intriguing opportunities for further research, and we’ll keep you updated as new insights emerge.