Google announced many new AI models, upcoming projects, and a range of AI features for its products at I/O 2024. The one that genuinely stands out is Gemini 1.5 Flash. The model is remarkably fast and efficient, is multimodal, and has a context window of up to 1 million tokens, or 2 million via the waitlist.
Google has not disclosed the parameter count of Gemini 1.5 Flash, but the model does well in all three modalities: text, vision, and audio. The Gemini 1.5 technical report states that Gemini 1.5 Flash surpasses both 1.0 Ultra and 1.0 Pro in most tasks. It lags behind mainly in speech recognition and translation.
Unlike Gemini 1.5 Pro, which is a sparse MoE model, Gemini 1.5 Flash is a smaller dense model that has been online distilled from the larger 1.5 Pro model to improve quality. It is also faster than other small models, such as Claude 3 Haiku, and runs on Google’s custom TPUs. Pricing is remarkably low: $0.35 for input and $0.53 for output per 1 million tokens for prompts of up to 128K tokens, and $0.70 and $1.05 per 1 million tokens for longer prompts.
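To make the pricing concrete, here is a rough cost estimator. It is only a sketch: it assumes the quoted rates are in USD per 1 million tokens, and that the lower tier applies when the prompt is at most 128K tokens. The names `TIER_LIMIT`, `RATES`, and `estimate_cost` are made up for this illustration, and actual billing may differ.

```python
# Assumed units: USD per 1 million tokens, using the figures quoted above.
TIER_LIMIT = 128_000  # assumed prompt-size boundary between the two price tiers

RATES = {
    "short": {"input": 0.35, "output": 0.53},  # prompts up to 128K tokens
    "long": {"input": 0.70, "output": 1.05},   # prompts longer than 128K tokens
}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD for a single request."""
    tier = "short" if input_tokens <= TIER_LIMIT else "long"
    rate = RATES[tier]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # A 100K-token prompt with a 2K-token answer (lower tier).
    print(f"{estimate_cost(100_000, 2_000):.4f}")  # 0.0361
    # A 500K-token prompt with a 2K-token answer (higher tier).
    print(f"{estimate_cost(500_000, 2_000):.4f}")  # 0.3521
```

Under these assumptions, even a prompt that fills half of the 1 million token window costs well under a dollar.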
Flash is cheaper than Llama 3 70B, Mistral Medium, GPT-3.5 Turbo, and other larger models. For developers who want multimodal reasoning with a large context window at a low price, the Flash model is therefore very compelling. Here is how you can try Gemini 1.5 Flash for free:
How to Use Google Gemini 1.5 Flash For Free
- Visit aistudio.google.com and sign in with your Google account. There is no waitlist to use the Flash model.
- Select the “Gemini 1.5 Flash” model from the drop-down menu.
- Start chatting with the Flash model. You can also upload images, videos, audio clips, files, and folders.
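The steps above use the AI Studio chat interface. Developers who prefer to call the model from code could try something like the sketch below, which goes beyond what this article covers. It assumes you have created an API key in AI Studio, that the Gemini REST endpoint and the model ID `gemini-1.5-flash` are available to your account, and that the key is stored in an environment variable named `GEMINI_API_KEY` (a name chosen for this example). The helpers `build_request` and `extract_text` are hypothetical, and free usage may be subject to limits.

```python
import json
import os
import urllib.request

# Assumed REST endpoint for the Gemini API; check the current documentation.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(prompt: str, api_key: str, model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build (but do not send) a text-only generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


if __name__ == "__main__":
    api_key = os.environ.get("GEMINI_API_KEY")
    if api_key:
        request = build_request("Explain what a context window is in one sentence.", api_key)
        with urllib.request.urlopen(request) as response:
            print(extract_text(json.load(response)))
    else:
        print("Set GEMINI_API_KEY to send the request.")
```

Only the standard library is used here, so nothing needs to be installed. Without an API key the script prints a reminder and sends nothing.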
First Impression of Google Gemini 1.5 Flash
Although it is by no means the most advanced model available, Google Gemini 1.5 Flash is defined by its speed, high efficiency, and low price. Compared with Gemini 1.5 Pro and larger models from, for instance, OpenAI or Anthropic, it is quite limited. When asked five common reasoning questions, it answered only one correctly. Nevertheless, it may be well suited to tasks that require multimodality and a large context window. Moreover, the Gemini models are known for being particularly good at creative tasks, which is useful for both developers and end users.