In the last of its “12 Days of OpenAI” announcements, OpenAI today unveiled a major update: the o3 reasoning model and its smaller sibling, o3-mini. Notably, o3 is also the first AI model to surpass the human-level threshold on the ARC-AGI benchmark, which had gone unbeaten for five years.
OpenAI Launches o3 and o3-mini Models, First to Break ARC-AGI Benchmark in 5 Years, Paving the Path to AGI
Running in a high-compute configuration with extended processing time, o3 scored 87.5% on the ARC-AGI Semi-Private Evaluation Set. That surpasses the ARC Prize threshold of 85%, which roughly corresponds to human-level performance. By comparison, OpenAI’s earlier o1 model scored only 32%.
ARC-AGI measures how well AI models solve new, unseen problems rather than how well they reproduce patterns learned from their training data. If passing this test is any indicator, OpenAI’s o3 model is a large step toward Artificial General Intelligence (AGI), generally understood as a system that can match or even surpass human intelligence.
Beyond the ARC-AGI milestone, o3 also posted high scores on other challenging benchmarks: 71.7% on SWE-bench Verified, a Codeforces rating of 2,727, 96.7% on AIME 2024, and 87.7% on GPQA Diamond. On FrontierMath, a benchmark whose problems challenge even expert mathematicians, o3 solved a record 25.2% of problems, up from the previous best-in-class score of 2.0%.
The o3-mini model is a distilled version of o3, optimized for speed, lower cost, and strong coding performance. It offers three compute settings (low, medium, and high), and at the medium setting it outperforms o1 while delivering lower latency. OpenAI skipped the name o2 to avoid a potential trademark conflict with the UK-based mobile operator O2. Both o3 and o3-mini are currently undergoing safety testing, with o3-mini expected to become publicly available around January 2025. The full o3 model will follow after further safety testing.
FAQs
What is the ARC-AGI benchmark?
ARC-AGI tests AI systems for generalized intelligence, focusing on their ability to solve novel problems rather than to recognize patterns seen in training.
When will OpenAI release the o3 and o3-mini models?
The o3-mini is expected to be available around January 2025, while the full o3 model will follow after further safety testing.