Amid a potential breakup threat from US authorities, Google releases next generation of its AI tool Gemini
Alphabet’s Google has launched its most advanced AI model to date, Gemini 2.0, as the tech giant contends with a possible breakup from US authorities.
Google announced the latest AI model in a blog post on Wednesday from CEO Sundar Pichai, Google Deepmind CEO Demis Hassabis, and the CTO of Deepmind Koray Kavukcuoglu.
Gemini 2.0 comes with what Google is touting as ‘Agentic AI ‘(think of them as ‘AI Agents’), which are essentially individual bots that go off and carry out particular tasks for the user.
Gemini 2.0
According to all three men, Gemini 2.0 can generate images, text, code and audio, while being faster and cheaper to run.
“Information is at the core of human progress. It’s why we’ve focused for more than 26 years on our mission to organise the world’s information and make it accessible and useful,” noted CEO Sundar Pichai.
“That was our vision when we introduced Gemini 1.0 last December,” wrote Pichai. “The first model built to be natively multimodal, Gemini 1.0 and 1.5 drove big advances with multimodality and long context to understand information across text, video, images, audio and code, and process a lot more of it.”
“Now millions of developers are building with Gemini,” he wrote. “Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”
“Today we’re excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet,” wrote Pichai. “With new advances in multimodality – like native image and audio output – and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant.”
“We’re getting 2.0 into the hands of developers and trusted testers today,” he added. “And we’re working quickly to get it into our products, leading with Gemini and Search. Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users. We’re also launching a new feature called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It’s available in Gemini Advanced today.”
“If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful,” Pichai concluded. “I can’t wait to see what this next era brings.”
Agentic era
Meanwhile Demis Hassabis and Koray Kavukcuoglu noted the “incredible progress” made by Google Deepmind over the past year, which culminated in the release of “the first model in the Gemini 2.0 family of models: an experimental version of Gemini 2.0 Flash. It’s our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale.”
Gemini 2.0 Flash is being touted as offering enhanced performance at similarly fast response times for developers.
“Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed,” both men wrote. “2.0 Flash also comes with new capabilities. In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions.”
“Gemini 2.0 Flash is available now as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI with multimodal input and text output available to all developers, and text-to-speech and native image generation available to early-access partners,” they wrote. “General availability will follow in January, along with more model sizes.”
From Wednesday Gemini users globally can access a chat optimised version of 2.0 Flash experimental by selecting it in the model drop-down on desktop and mobile web and it will be available in the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.
Project Astra
They wrote that the “practical application of AI agents is a research area full of exciting possibilities. We’re exploring this new frontier with a series of prototypes that can help people accomplish tasks and get things done. These include an update to Project Astra, our research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers.”
Improvements in the latest version of Project Astra built with Gemini 2.0 include:
- Better dialogue: Project Astra now has the ability to converse in multiple languages and in mixed languages, with a better understanding of accents and uncommon words.
- New tool use: With Gemini 2.0, Project Astra can use Google Search, Lens and Maps, making it more useful as an assistant.
- Better memory: We’ve improved Project Astra’s ability to remember things while keeping the user in control. It now has up to 10 minutes of in-session memory and can remember more conversations the user had with it in the past, so it is better personalised to the user.
- Improved latency: With new streaming capabilities and native audio understanding, the agent can understand language at about the latency of human conversation.
Project Mariner, Jules
Project Mariner meanwhile has agents that can help the user accomplish complex tasks, by pushing human-agent interaction, starting with the web browser.
“As a research prototype, it’s able to understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you,” they wrote.
There is also Jules, which are essentially AI agents for developers that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision.
Version 1.0 of the Gemini large language model (LLM) had been released in December 2023, but that was not the first AI offering from Google, which had launched the Bard AI chatbot back in March 2023.
In February 2024 Google officially rebranded the Bard chatbot as Gemini.