Google DeepMind recently unveiled Genie, a generative interactive environment model that can turn text or image prompts into playable, animated game-like worlds without being trained on labeled game mechanics or controls.
Google DeepMind, the artificial intelligence company that Google acquired in 2014, published a paper on the 23rd introducing a generative interactive environment model called “Genie,” which can generate controllable, interactive virtual environments from text, images, or sketches alone.
According to the paper, Genie is trained on a large corpus of publicly available online videos that carry no action labels, rather than on game-specific or scene-specific data. That could give it broad applications in game development and the creative entertainment industries.
The DeepMind team presents Genie as a new kind of generative AI: a generative interactive environment that can produce an interactive, playable environment from a single image prompt.
Multi-model architecture
First, the paper describes Genie as a foundation world model made up of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model, totaling 11 billion parameters.
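To make the division of labor between the three components concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: the function names, the one-dimensional "frames," and the three hard-coded actions are all assumptions made for illustration, and the real components are learned neural networks rather than hand-written rules.

```python
from typing import List

Frame = List[int]  # toy "frame": a 1D strip of pixel intensities in 0..255


def tokenize(frame: Frame, levels: int = 4) -> List[int]:
    """Stand-in for the video tokenizer: quantize each pixel into a discrete token."""
    return [min(p * levels // 256, levels - 1) for p in frame]


def infer_latent_action(prev_tokens: List[int], next_tokens: List[int]) -> int:
    """Stand-in for the latent action model.

    Labels a transition by how the brightest token moved between two frames:
    0 = left, 1 = stay, 2 = right. These ids are hypothetical.
    """
    shift = next_tokens.index(max(next_tokens)) - prev_tokens.index(max(prev_tokens))
    return 1 + max(-1, min(1, shift))


def predict_next(tokens: List[int], action: int) -> List[int]:
    """Stand-in for the dynamics model: shift the tokens according to the action."""
    step = action - 1  # -1, 0, or +1
    n = len(tokens)
    return [tokens[(i - step) % n] for i in range(n)]  # wraps around at the edges


# Two unlabelled frames: a bright "character" moves one cell to the right.
video = [[0, 255, 0, 0], [0, 0, 255, 0]]
tokens = [tokenize(f) for f in video]
action = infer_latent_action(tokens[0], tokens[1])  # 2, i.e. "right"
print(predict_next(tokens[1], action))              # [0, 0, 0, 3]
```

The point of the sketch is only the data flow: frames become discrete tokens, an action label is inferred from a pair of frames rather than supplied by a human, and the next frame is predicted from the current tokens plus that action.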
[Image: excerpt from the Genie paper]
Genie is trained without supervision on internet videos of 2D platformer games and on robotics videos, with no action labels or explicit instructions. Given an external image prompt, including a real-world photo or a sketch, it infers diverse latent actions that stay consistent across the generated environments, so that people can control and interact with them.
What sets Genie apart is that it learns which parts of an observation are controllable and generates interactive scenarios around them.
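One way to picture learning actions without labels is the toy Python sketch below. It is a deliberately simplified assumption, not the method in the paper: it builds a small action "codebook" by counting how a marked character moves between consecutive frames, whereas Genie's latent action model is a learned network. The names `learn_action_codebook`, `label_video`, and the `max_actions` limit are hypothetical.

```python
from collections import Counter
from typing import List

Frame = List[int]  # toy frame: a 1D strip where 1 marks the controllable character


def player_pos(frame: Frame) -> int:
    """Locate the controllable part of the frame (the single cell set to 1)."""
    return frame.index(1)


def learn_action_codebook(videos: List[List[Frame]], max_actions: int = 8) -> List[int]:
    """Collect frame-to-frame displacements from unlabelled videos.

    The most frequent displacements become the discrete action vocabulary.
    Ties are broken by displacement value so the result is deterministic.
    """
    counts: Counter = Counter()
    for video in videos:
        for prev, nxt in zip(video, video[1:]):
            counts[player_pos(nxt) - player_pos(prev)] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [displacement for displacement, _ in ranked][:max_actions]


def label_video(video: List[Frame], codebook: List[int]) -> List[int]:
    """Assign each transition the id of its displacement in the codebook."""
    return [
        codebook.index(player_pos(nxt) - player_pos(prev))
        for prev, nxt in zip(video, video[1:])
    ]


videos = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
    [[0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
]
codebook = learn_action_codebook(videos)
print(codebook)                         # [1, -1, 0]
print(label_video(videos[1], codebook)) # [2, 1, 0]
```

No human ever says which action was taken. The action ids come out of the videos themselves, and the same id means the same thing in every video, which is the property the article refers to as consistent latent actions.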
Additionally, Genie can create an entirely new interactive environment from a single image. For text prompts, the text-to-image model Imagen 2 first generates a starting frame, and Genie then brings that image to life by applying its learned dynamics.
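The second stage, turning one starting frame into a playable sequence, can be pictured as an autoregressive loop: each new frame is produced from the frames so far plus the action the player picks. The Python sketch below is a toy under stated assumptions. The starting frame is hard-coded where a text-to-image model would supply it, `toy_dynamics` is a hand-written stand-in for a learned dynamics model, and the action ids in `MOVES` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[str]  # toy frame: rows of text, "P" marks the controllable character

# Hypothetical latent action ids mapped to (row change, column change).
MOVES: Dict[int, Tuple[int, int]] = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}


def toy_dynamics(history: Sequence[Frame], action: int) -> Frame:
    """Stand-in for a learned dynamics model: move "P" one cell, clamped to the frame."""
    frame = history[-1]
    rows, cols = len(frame), len(frame[0])
    r = next(i for i, row in enumerate(frame) if "P" in row)
    c = frame[r].index("P")
    dr, dc = MOVES[action]
    new_r = min(max(r + dr, 0), rows - 1)
    new_c = min(max(c + dc, 0), cols - 1)
    grid = [list(row.replace("P", ".")) for row in frame]  # assumes "." is background
    grid[new_r][new_c] = "P"
    return ["".join(row) for row in grid]


def play(
    start: Frame,
    actions: Sequence[int],
    step: Callable[[Sequence[Frame], int], Frame] = toy_dynamics,
) -> List[Frame]:
    """Roll the environment forward one frame per chosen action."""
    frames = [start]
    for action in actions:
        frames.append(step(frames, action))  # conditions on all frames so far
    return frames


start = ["...", "P..", "..."]         # would come from an image prompt
frames = play(start, [1, 1, 2])       # right, right, up
print("\n".join(frames[-1]))
```

Because `play` takes the step function as a parameter, the same loop would work unchanged with any model that maps a frame history and an action to the next frame.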
[Image: Genie generating interactive animated environments from synthesized images]
Genie also accepts image prompts it has never seen before, including real-world photos and simple sketches, letting people interact with objects that are static in the original image.
[Image: Genie generating interactive animated environments from real photos and hand-drawn sketches]
The accompanying blog post states:
Genie’s capabilities allow anyone, even children, to create and enter controllable simulated environments or interactive generated worlds.
At the end, it also sets out an ambitious goal for Genie:
Genie’s applications are not limited to entertainment or creative development. It can also serve as an excellent testing platform for training intelligent agents, thus promoting the development of the AI field.
An intelligent agent is an autonomous entity that observes its surrounding environment and takes actions to achieve goals. It is a core concept, and a major goal, in current AI research.
In recent months, Google has released or announced several generative AI models and tools, including the AI model and assistant “Gemini,” the text-to-video model “Lumiere,” and the text-to-image tool “ImageFX,” all of which have attracted public attention.
Meanwhile, OpenAI’s text-to-video model Sora also sparked an AI frenzy a few weeks ago.
However, a recent controversy over racial bias in Gemini’s image generation sent shares of Alphabet, Google’s parent company, down more than 4% in a single day on the 26th.
Demis Hassabis, CEO of Google DeepMind, said at the Mobile World Congress (MWC Barcelona 2024) yesterday:
We have taken down that feature of Gemini and will fix the issue and restore it in the coming weeks.