OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills


OpenAI today announced an improved version of its most capable artificial intelligence model to date—one that takes even more time to deliberate over questions—just a day after Google announced its first model of this type.

OpenAI’s new model, called o3, replaces o1, which the company introduced in September. Like o1, the new model spends time ruminating over a problem in order to deliver better answers to questions that require step-by-step logical reasoning.

The o3 model scores much higher than its predecessor on several measures, OpenAI says, including ones that test complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI model’s ability to reason over problems it is encountering for the first time.

Google is pursuing a similar line of research. Noam Shazeer, a Google researcher, yesterday revealed in a post on X that the company has developed its own reasoning model, called Gemini 2.0 Flash Thinking. Google’s CEO, Sundar Pichai, called it “our most thoughtful model yet” in his own post.

The two dueling models show competition between OpenAI and Google to be fiercer than ever. It is crucial for OpenAI to demonstrate that it can keep making advances as it seeks to attract more investment and build a profitable business. Google is meanwhile desperate to show that it remains at the forefront of AI research.

The new models also show how AI companies are increasingly looking beyond simply scaling up AI models in order to wring greater intelligence out of them.

Large language models can answer many questions remarkably well, but they often stumble when asked to solve puzzles that require basic math or logic. OpenAI’s o1 incorporates training on step-by-step problem-solving that makes an AI model better able to tackle these types of problems.

Models that reason over problems will also be important as companies seek to deploy so-called AI agents that can reliably figure out how to solve complex problems on a user’s behalf. The o3 model is 20 percent better than o1 on SWE-Bench, a test that measures a model’s agentic abilities.

While a true breakthrough moment has eluded tech giants at the end of the year, the pace of AI announcements has been dizzying of late.

Earlier this month Google announced a new version of its flagship model, called Gemini 2.0, and demonstrated it as a web browsing helper and as an assistant that sees the world through a smartphone or a pair of smart glasses.

OpenAI has made numerous announcements in the run-up to Christmas, including a new version of its video-generating model, a free version of its ChatGPT-powered search engine, and a way to access ChatGPT over the phone by calling 1-800-ChatGPT.




By stp2y
