Researchers Find Best AI Can’t Solve Most Coding Problems

February 25, 2025 thetechtribune

OpenAI researchers admit that even the most advanced AI models still lag behind human coders, despite CEO Sam Altman’s claims that AI could surpass low-level software engineers by year’s end.

Using a new benchmark called SWE-Lancer, OpenAI tested its own GPT-4o and o1 reasoning models alongside Anthropic’s Claude 3.5 Sonnet on more than 1,400 freelance software engineering tasks drawn from Upwork. The models worked faster than humans, but they struggled with bug detection, context understanding, and root cause analysis, and often produced superficial or incorrect solutions.

Claude 3.5 Sonnet outperformed OpenAI’s models but still answered most tasks incorrectly, which underscores how much more reliable AI must become before it can replace human engineers. Despite rapid progress, the results suggest current models are not yet dependable enough for complex software development work.
