
Apple Study Suggests LLM AI Models Cannot Reason

Apple’s AI researchers have found that large language models (LLMs), including those from Meta and OpenAI, still struggle with basic reasoning. To measure these capabilities they introduced a benchmark called GSM-Symbolic, which showed that slight changes in a question’s wording can produce significantly different answers, exposing how fragile the models’ mathematical reasoning is.
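GSM-Symbolic works by building templated variants of grade-school math word problems in which surface details such as names and numbers are swapped out while the underlying arithmetic stays the same. The Python sketch below illustrates that idea; the template, the name list, and the number ranges are invented for illustration and are not taken from the paper.

```python
import random

# GSM-Symbolic-style idea: keep the story fixed, redraw the surface details
# (names, numbers), and check whether a model's answer stays correct.
# This template is a hypothetical example, not one from Apple's benchmark.
TEMPLATE = (
    "{name} picks {fri} apples on Friday and {sat} apples on Saturday. "
    "How many apples does {name} have in total?"
)

NAMES = ["Oliver", "Sophie", "Liam", "Maya"]

def make_variant(seed: int) -> tuple[str, int]:
    """Return one reworded instance of the problem and its ground-truth answer."""
    rng = random.Random(seed)
    fri, sat = rng.randint(20, 60), rng.randint(20, 60)
    question = TEMPLATE.format(name=rng.choice(NAMES), fri=fri, sat=sat)
    return question, fri + sat

if __name__ == "__main__":
    for i in range(3):
        question, answer = make_variant(i)
        print(question, "->", answer)
```

If a model truly reasons about the quantities, its accuracy should not move when only the names and numbers change; the study found that it does.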

The study showed that adding irrelevant contextual information to math problems could cut answer accuracy by as much as 65%. In one question about counting kiwis, for example, the models subtracted the kiwis described as smaller than average, even though their size had no bearing on the total. This inconsistency supports findings from earlier studies suggesting that LLMs rely on pattern matching rather than genuine reasoning. The study concludes that LLMs currently lack the reliability needed for tasks that depend on sound reasoning, since small alterations to the input can drastically change their outputs.
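The kiwi failure comes from a variant of the benchmark that appends a clause mentioning a quantity without changing the answer. The toy reconstruction below uses illustrative figures, not numbers quoted from the paper, to make the error mode concrete.

```python
# An irrelevant-clause ("no-op") version of a simple counting problem.
# The size remark mentions a number (five) that should not affect the count.
question = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks 24 kiwis, but five of them are a bit smaller "
    "than average. How many kiwis does Oliver have?"
)

correct_answer = 44 + 58 + 24             # size is irrelevant to the count -> 126
pattern_match_error = correct_answer - 5  # the reported failure: subtracting the "smaller" kiwis -> 121

print(question)
print("correct:", correct_answer, "| typical wrong answer:", pattern_match_error)
```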

SOURCE
