Google DeepMind Unveils Upgraded Gemini-Powered Robot in Mountain View Office

By Madz Dizon

Updated: Jul 12 2024, 03:08 AM EDT


This week, Google's DeepMind Robotics team is highlighting a promising intersection between generative AI and robotics: navigation.

DeepMind, the AI research lab owned by Google, has unveiled an upgraded language model that has transformed a wheeled robot into a helpful tour guide and office assistant.

Google Showcases Smarter Robot Using Gemini AI

This remarkable development is taking place in a bustling open-plan office in Mountain View, California.

The robot uses Google's advanced Gemini large language model to effectively interpret commands and navigate its surroundings.

Upon receiving the request to find a suitable writing location, the robot promptly sets off, guiding the individual to a spotless whiteboard within the premises.

The versatility of Gemini is evident in its capability to process both video and text, as well as its impressive ability to analyze extensive amounts of data from pre-recorded office tours.

This functionality enables the "Google helper" robot to understand its surroundings and navigate accurately in response to commands that require commonsense reasoning.

The robot uses Gemini technology along with an advanced algorithm to generate precise actions based on commands and visual input.

Demis Hassabis, the CEO of Google DeepMind, introduced Gemini in December. He told WIRED that the model's multimodal capabilities could potentially lead to new robot abilities, and that the company's researchers were actively testing its robotic capabilities.

Google DeepMind System Becomes Smarter

The team states that DeepMind's system makes human-robot interaction significantly more natural and makes robots easier to use. The demonstration shows how powerful language models can extend beyond digital realms to perform practical tasks.

According to WIRED, chatbots like Gemini operate primarily within web browsers or apps, but they are becoming more adept at processing visual and auditory information, as shown by recent advancements from Google and OpenAI. In May, Hassabis unveiled an enhanced version of Gemini that can analyze an office layout using a smartphone camera.
