Google on Wednesday announced Gemini 2.0, its most advanced AI model to date, built for the nascent “agentic era”. With features like native image and audio output, multimodal functionality, and enhanced tool integration, Gemini 2.0 is designed to power the kind of AI agents that bring Google closer to its vision of a universal assistant.
Google Introduces Gemini 2.0: A Leap Toward Advanced AI Agents with Multimodal and Tool Integration Features
The first model in the family, Gemini 2.0 Flash, builds on Gemini 1.5 Flash with a significant performance boost, outperforming even the 1.5 Pro model at twice the speed. The new model is fully multimodal: it accepts diverse inputs such as images, video, and audio, and can output generated images, multilingual text, and text-to-speech audio. It also adds native tool integrations, to Google Search and third-party tools, making it a far more versatile product.
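For developers, a multimodal request to the new model should look much like existing Gemini calls. The sketch below uses Google's google-generativeai Python SDK; the model name "gemini-2.0-flash-exp", the placeholder API key, and the image file are assumptions for illustration, not details confirmed by this article.

```python
# A minimal sketch of a multimodal Gemini 2.0 Flash request, assuming the
# google-generativeai Python SDK and the "gemini-2.0-flash-exp" model name.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Multimodal input: an image and a text prompt in a single request.
image = Image.open("chart.png")  # hypothetical local file
response = model.generate_content([image, "Summarize what this chart shows."])

print(response.text)
```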
Gemini 2.0 Flash is powered by state-of-the-art features, including multimodal reasoning, long-context comprehension, and advanced planning, all of which facilitate the development of more capable AI agents. The search giant is exploring these capabilities through a series of research projects. One of them, “Project Astra,” is centered on Android devices and supports multilingual dialogue, integrated tool use (including Google Search, Lens, and Maps), and in-session memory of up to 10 minutes.
Project Mariner focuses on completing tasks in the browser, understanding and reasoning across on-screen elements such as text, images, and code; it achieved an 83.5% success rate on the WebVoyager benchmark. Jules, meanwhile, is an AI agent for developers that integrates into GitHub workflows to help find issues and suggest fixes.
Google is also drawing on DeepMind's long history with games to develop AI agents that make real-time suggestions in titles such as Clash of Clans and Hay Day, reasoning over in-game actions. The firm is likewise exploring the use of Gemini 2.0 for robotics and spatial reasoning in physical environments. Throughout, Google stresses responsible AI development, with a focus on safety and security: the company works with its Responsibility and Safety Committee (RSC), employs AI-assisted red teaming to iterate on safety at scale, and works to ensure multimodal safety across all data types.
The launch of Gemini 2.0 marks a pivotal step toward artificial general intelligence (AGI), with the model's capabilities accelerating the pace of AI progress. Gemini 2.0 Flash is available on desktop and mobile web via the Gemini app, and developers can access it through Google AI Studio and Vertex AI. A new Multimodal Live API for real-time audio and video streaming with combined tool use will also be available. Testing on advanced tasks, including math and coding, is expected to expand in early 2025.
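To make the live API concrete, here is a rough sketch of what a real-time session might look like. It assumes Google's newer google-genai Python SDK and its async live interface; the model name, configuration keys, and method signatures are assumptions rather than details confirmed by this article.

```python
# A speculative sketch of a real-time session with the Multimodal Live API,
# assuming the google-genai SDK's async "live" interface.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    # Ask for text responses; "AUDIO" is another possible modality.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Stream the model's reply chunk by chunk as it arrives.
        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")

asyncio.run(main())
```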
FAQs
What is Gemini 2.0?
Gemini 2.0 is Google’s latest AI model, designed to support more advanced, multimodal AI agents, with features like image and audio output, improved reasoning, and enhanced tool integration.
When will Gemini 2.0 be available?
Gemini 2.0 Flash is available on desktop and mobile web, with a mobile app coming soon. Developers can access it through Google AI Studio and Vertex AI, with broader availability expected in January.