Apple Study Concludes ChatGPT Has a Long Way to Go
As AI has grown in functionality and popularity, language models like ChatGPT and DeepSeek have been embraced by consumers across the globe. While these models might seem increasingly capable of nearly human-like logic and reasoning, a recent study from Apple suggests that might not be the case.
Last week, Apple published a paper on its recent study, which pumped the brakes on some of the optimism surrounding advances in AI. The study found that large reasoning models (LRMs) faced a “complete accuracy collapse” when presented with highly complex problems. LRMs are a more recent type of model, built on large language models (LLMs), that work through a problem step by step before answering.
Apple tested standard LLMs, including OpenAI's GPT-4, Anthropic's Claude 3.7 Sonnet, and DeepSeek V3. For LRMs, it tested OpenAI's o1 and o3-mini, Google's Gemini, Claude 3.7 Sonnet Thinking, and DeepSeek R1.
The study tested each model's ability to solve puzzles whose difficulty the researchers could adjust. On easy puzzles, standard LLMs outperformed the reasoning models, while at medium difficulty the LRMs pulled ahead. Once the tasks reached the hard level, however, every model failed.
The researchers found that as LRMs neared performance collapse, they began “reducing their reasoning effort.” In the paper, titled "The Illusion of Thinking," Apple researchers called this behavior “particularly concerning.”
"Particularly concerning is the counterintuitive reduction in reasoning effort as problems approach critical complexity, suggesting an inherent compute scaling limit in LRMs," the paper stated. "Our detailed analysis of reasoning traces further exposed complexity-dependent reasoning patterns, from inefficient 'overthinking' on simpler problems to complete failure on complex ones. These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning."
In other words, the paper suggests that the limits may lie in current approaches to AI reasoning as a whole, not just in individual models.
These findings could have real implications for the future of AI, suggesting that the technology may not be as far along as some people think.
Men's Journal's Blog
