Forget GPT-5! OpenAI launches new AI model family o1 claiming PhD-level performance

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

Since the launch of OpenAI’s powerful proprietary large language model (LLM) GPT-4 in March 2023 — 18 months ago — users and developers have wondered about when the company that kicked off the generative AI craze in Silicon Valley, and around the world, would launch the next version, presumed to be called GPT-5.

As it turns out, the GPT series is being leapfrogged for now by a whole new family of models.

Today, following months of reports and rumors that intensified in recent days, OpenAI announced its “o1” AI model family beginning with two models: o1-preview and o1-mini, which the company says are designed to “reason through complex tasks and solve harder problems” than the GPT series models.

Both models are available today for ChatGPT Plus users but are initially limited to 30 messages per week for o1-preview and 50 for o1-mini.

However, OpenAI also cautions that “As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term.”

Indeed, our initial tests trying to use it to create an image for this article found that it could not. On OpenAI’s API platform website, the company clarifies that in its beta state, the model family supports “text only, images are not supported.”

What o1 does better than GPT

OpenAI claims its new o1 series is particularly well-suited for users tackling complex problems in fields like science, healthcare, and technology.

OpenAI envisions the models being used for a wide range of applications, from helping physicists generate mathematical formulas for quantum optics to assisting healthcare researchers in annotating cell sequencing data.

Developers will also find the o1-mini model effective for building and executing multi-step workflows, debugging code, and solving programming challenges efficiently.

o1-preview performs at PhD levels

The o1-preview model is designed to handle challenging tasks by dedicating more time to thinking and refining its responses, similar to how a person would approach a complex problem.

In tests, this approach has allowed the model to perform at a level close to that of PhD students in areas like physics, chemistry, and biology.

Additionally, the o1-preview model excels in coding, ranking in the 89th percentile in Codeforces competitions, showcasing its ability to handle multi-step workflows, debug complex code, and generate accurate solutions.

In benchmark tasks such as the International Mathematics Olympiad (IMO) qualifying exam, o1-preview demonstrated its prowess by solving 83% of the problems, a sharp improvement over the 13% success rate of its predecessor, GPT-4o.

It is already available for use in ChatGPT by Plus and Team users, with Enterprise and Edu users gaining access next week. The models are also available via the OpenAI API for developers who qualify for API usage tier 5, though initial rate limits will apply.

o1-mini is less powerful but 80% cheaper

In conjunction with o1-preview, OpenAI has also launched the o1-mini model, a more streamlined version designed to offer faster and cheaper reasoning capabilities.

While optimized primarily for coding and STEM tasks, the o1-mini still delivers strong performance, particularly in math and programming.

On the IMO math benchmark, o1-mini scored 70%, nearly matching the 74% of o1-preview while offering a significantly lower inference cost. It also performed competitively in coding evaluations, achieving an Elo score of 1650 on Codeforces, positioning it among the top 86% of programmers.

With an 80% lower price tag compared to o1-preview, the o1-mini is aimed at developers and researchers who require reasoning capabilities but don’t need the broader knowledge that the more advanced o1-preview model offers.

This cost-effective solution will also be available to ChatGPT Plus, Team, Enterprise, and Edu users, with plans to extend access to ChatGPT Free users in the future.

Safety and security enhancements

In line with OpenAI’s commitment to safety, both models incorporate a new safety training approach that enhances their ability to follow safety and alignment guidelines.

OpenAI highlights that o1-preview scored an impressive 84 on one of its toughest jailbreaking tests, a significant improvement over GPT-4o’s score of 22. The ability to reason about safety rules in context allows these models to better handle unsafe prompts and avoid generating inappropriate content.

As part of broader safety efforts, OpenAI has entered into agreements with the U.S. and U.K. AI Safety Institutes.

These partnerships include granting early access to a research version of the o1 models to help in the evaluation and testing of future AI systems.

OpenAI’s safety work also includes comprehensive internal governance and collaboration with the federal government, reinforced by regular testing, red-teaming, and board-level oversight from the company’s Safety & Security Committee.

What’s next for OpenAI’s o1 Series

Although the o1-preview and o1-mini models are powerful tools for reasoning and problem-solving, OpenAI acknowledges that this is just the beginning.

The company plans to regularly update and improve these models, including adding features like browsing, file and image uploading, and function calling, which are currently not available in the API version.

Looking ahead, OpenAI will continue to develop both its GPT and o1 series, further expanding the capabilities of AI in various fields. Users can expect ongoing advancements as the company works to increase the usefulness and accessibility of these models across different applications.

VB Daily

Stay in the know! Get the latest news in your inbox daily

By subscribing, you agree to VentureBeat’s Terms of Service.

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

Source link lol