OpenAI Launches o3 Model, Breaks ARC-AGI Benchmark After 5 Years

In the final announcement of its “12 Days of OpenAI” series, OpenAI today unveiled a major update: the o3 and o3-mini reasoning models. Fittingly, o3 also became the first AI model to break the well-known ARC-AGI benchmark, which had stood undefeated for five years.


OpenAI Launches o3 and o3-mini Models, First to Break ARC-AGI Benchmark in 5 Years, Paving Path to AGI

Running in a high-compute configuration with an extended processing window, o3 scored 87.5% on the ARC-AGI Semi-Private Evaluation Set. That exceeds the 85% ARC Prize threshold, which is roughly the level humans typically achieve. By comparison, OpenAI’s earlier o1 model scored only 32%.


The ARC-AGI test measures how well AI models solve new, unseen problems rather than how good they are at pattern recognition. If passing this test is any indicator, OpenAI’s o3 model is a large step toward Artificial General Intelligence (AGI), which can be described as a system with the potential to equal or even surpass human intelligence.

Besides the ARC-AGI milestone, o3 also posted high scores on other challenging benchmarks: 71.7% on SWE-bench Verified, a rating of 2,727 on Codeforces, 96.7% on AIME 2024, and 87.7% on GPQA Diamond. On the FrontierMath benchmark, whose problems challenge even expert mathematicians, o3 set a record of 25.2%, up from the previous best of 2.0%.


The o3-mini model is a distilled version of o3, optimized for faster code generation, higher performance, and lower cost. It offers three compute settings (low, medium, and high); at the medium setting it beats the o1 model while running with lower latency. OpenAI skipped the name o2 to avoid a legal dispute with the UK-based mobile operator O2. Both o3 and o3-mini are currently undergoing safety testing at OpenAI, with o3-mini expected to become publicly available around January 2025. The full o3 model will follow after further tests and regulatory approval.

FAQs

What is the ARC-AGI benchmark?

ARC-AGI tests AI for generalized intelligence, focusing on the ability to solve new, unseen problems rather than on pattern recognition.

When will OpenAI release the o3 and o3-mini models?

The o3-mini is expected to become available around January 2025, while the full o3 model will follow after further safety testing.
