OpenAI Launches o3 Model, Breaks ARC-AGI Benchmark After 5 Years

In the final announcement of its “12 Days of OpenAI” series, OpenAI today unveiled its o3 and o3-mini reasoning models. o3 also became the first AI model to break the ARC-AGI benchmark, which had stood undefeated for five years.

Running in a high-compute configuration with an extended processing window, o3 scored 87.5% on the ARC-AGI Semi-Private Evaluation Set. That exceeds the 85% ARC Prize threshold, which is roughly the level humans typically achieve. By comparison, OpenAI’s earlier o1 model scored only 32%.

ARC-AGI measures how well AI models solve new, unseen problems rather than how well they match patterns they have already encountered. If passing the test is any indication, o3 is a significant step toward Artificial General Intelligence (AGI), meaning a system with the potential to match or even surpass human intelligence.

Beyond the ARC-AGI milestone, o3 also posted high scores on other challenging benchmarks: 71.7% on SWE-bench Verified, a rating of 2,727 on Codeforces, 96.7% on AIME 2024, and 87.7% on GPQA Diamond. On the FrontierMath benchmark, whose problems challenge even expert mathematicians, o3 set a record of 25.2%, up from a previous best of 2.0%.

The o3-mini model is a distilled version of o3, optimized for faster code generation, higher performance, and lower cost. It offers three compute settings (low, medium, and high); at the medium setting it outperforms o1 while running at lower latency. OpenAI skipped the name o2 to avoid a legal dispute with the UK-based mobile operator O2. Both o3 and o3-mini are currently undergoing safety testing, with o3-mini expected to become publicly available around January 2025. The full o3 model will follow after further tests and regulatory approval.

FAQs

What is the ARC-AGI benchmark?

ARC-AGI tests AI systems for general intelligence, focusing on solving novel problems rather than recognizing familiar patterns.

When will OpenAI release the o3 and o3-mini models?

The o3-mini is expected around January 2025, while the full o3 model will follow after further safety testing.
