We’re Still Waiting for the Next Big Leap in AI

When OpenAI announced GPT-4, its latest large language model, last March, it sent shockwaves through the tech world. It was clearly more capable than anything seen before at chatting, coding, and solving all sorts of thorny problems—including school homework.

Anthropic, a rival to OpenAI, announced today that it has made its own AI advance that will upgrade chatbots and other use cases. But although the new model is the world’s best by some measures, it’s more of a step forward than a big leap.

Anthropic’s new model, called Claude 3.5 Sonnet, is an upgrade to its existing Claude 3 family of AI models. It is more adept at solving math, coding, and logic problems as measured by commonly used benchmarks. Anthropic says it is also a lot faster, better understands nuances in language, and even has a better sense of humor.

That’s no doubt useful to people trying to build apps and services on top of Anthropic’s AI models. But the company’s news is also a reminder that the world is still waiting for another AI leap forward in AI akin to that delivered by GPT-4.

Expectation has been building for OpenAI to release a sequel called GPT-5 for more than a year now, and the company’s CEO, Sam Altman, has encouraged speculation that it will deliver another revolution in AI capabilities. GPT-4 cost more than $100 million to train, and GPT-5 is widely expected to be much larger and more expensive.

Although OpenAI, Google, and other AI developers have released new models that out-do GPT-4, the world is still waiting for that next big leap. Progress in AI has lately become more incremental and more reliant on innovations in model design and training rather than brute-force scaling of model size and computation, as GPT-4 did.

Michael Gerstenhaber, head of product at Anthropic, says the company’s new Claude 3.5 Sonnet model is larger than its predecessor but draws much of its new competence from innovations in training. For example, the model was given feedback designed to improve its logical reasoning skills.

Anthropic says that Claude 3.5 Sonnet outscores the best models from OpenAI, Google, and Facebook in popular AI benchmarks including GPQA, a graduate-level test of expertise in biology, physics, and chemistry; MMLU, a test covering computer science, history, and other topics; and HumanEval, a measure of coding proficiency. The improvements are a matter of a few percentage points though.

This latest progress in AI might not be revolutionary but it is fast-paced: Anthropic only announced its previous generation of models three months ago. “If you look at the rate of change in intelligence you’ll appreciate how fast we’re moving,” Gerstenhaber says.

More than a year after GPT-4 spurred a frenzy of new investment in AI, it may be turning out to be more difficult to produce big new leaps in machine intelligence. With GPT-4 and similar models trained on huge swathes of online text, imagery, and video, it is getting more difficult to find new sources of data to feed to machine-learning algorithms. Making models substantially larger, so they have more capacity to learn, is expected to cost billions of dollars. When OpenAI announced its own recent upgrade last month, with a model that has voice and visual capabilities called GPT-4o, the focus was on a more natural and humanlike interface rather than on substantially more clever problem-solving abilities.