Unleash Your Imagination with “Genie”: Google’s New AI Model Creates Games from Text and Images

Google DeepMind recently launched Genie, a generative interactive environment model that can turn text or image prompts into playable, animated games without ever being explicitly taught game mechanics or controls.

Google DeepMind launches generative interactive environment tool “Genie”
Google DeepMind, the AI company Google acquired in 2014, published a paper on the 23rd introducing a generative interactive environment model called “Genie,” which can generate controllable, interactive virtual environments from nothing more than text, images, or sketches.

According to the paper, Genie was trained on a large corpus of publicly available, unlabeled internet videos rather than on data from specific games or scenes, which gives it broad potential applications in game development and the creative entertainment industry.

What is Genie?
The paper presents Genie as a new paradigm for generative AI: a generative interactive environment that can produce interactive, playable environments from a single image prompt.

Multi-model architecture
According to the paper, Genie is a foundation world model made up of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model, with 11 billion parameters in total.

(Image: Excerpt from the Genie paper)
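
To make the tokenizer idea more concrete, here is a minimal sketch of one ingredient of a video tokenizer: cutting each frame into patches and replacing every patch with the index of its nearest entry in a codebook, so that a video becomes a grid of discrete tokens. This is a simplification under stated assumptions, not Genie's implementation: the codebook is fixed by hand rather than learned, there are no transformer layers (so nothing here is actually temporal), and the names `patchify`, `nearest_code`, and `tokenize_video` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # H x W grayscale pixel values
Video = List[Frame]        # T frames

def patchify(frame: Frame, p: int) -> List[List[float]]:
    """Split an H x W frame into non-overlapping p x p patches, each flattened row by row."""
    h, w = len(frame), len(frame[0])
    if h % p or w % p:
        raise ValueError("frame height and width must be multiples of the patch size")
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patches.append([frame[top + i][left + j] for i in range(p) for j in range(p)])
    return patches

def nearest_code(vec: List[float], codebook: List[List[float]]) -> int:
    """Index of the codebook entry with the smallest squared Euclidean distance to vec."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))

def tokenize_video(video: Video, p: int, codebook: List[List[float]]) -> List[List[int]]:
    """Map each frame to one discrete token id per patch (result is T x num_patches)."""
    return [[nearest_code(patch, codebook) for patch in patchify(frame, p)] for frame in video]
```

For a 4×4, two-frame video in which a bright 2×2 block moves from the top-left to the top-right corner, `tokenize_video(video, 2, [[0, 0, 0, 0], [1, 1, 1, 1]])` yields `[[1, 0, 0, 0], [0, 1, 0, 0]]`. A downstream dynamics model then only has to predict these small integer grids instead of raw pixels.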

Thanks to this design, Genie can be trained in an unsupervised manner on internet videos of 2D platformer games and robotics, without any action labels. Given an external image prompt, including a real-world photo or a sketch, it can infer consistent latent actions in the generated environment, so that people can control and interact with it.
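
The latent action idea can be illustrated with a toy: encode the change between two consecutive frames as a continuous vector, then snap it to the nearest of a small set of discrete codes, so that each code comes to stand for one kind of controllable change. In this sketch the "encoder" simply reads off a sprite's displacement and the eight-entry codebook is hand-set to compass directions; both are illustrative assumptions (Genie learns its encoder and codebook from video), and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]

# Hypothetical fixed codebook: 8 unit vectors at 45-degree steps (index 0 = right, 2 = up, 4 = left).
# In Genie the codebook is learned from video; here it is hand-set for illustration.
CODEBOOK: List[Vec] = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

def encode_transition(prev_pos: Vec, next_pos: Vec) -> Vec:
    """Toy 'encoder': summarize the change between two frames as the sprite's displacement."""
    return (next_pos[0] - prev_pos[0], next_pos[1] - prev_pos[1])

def quantize(z: Vec, codebook: List[Vec] = CODEBOOK) -> int:
    """Snap a continuous latent to the index of its nearest codebook vector."""
    return min(range(len(codebook)),
               key=lambda k: (z[0] - codebook[k][0]) ** 2 + (z[1] - codebook[k][1]) ** 2)

def infer_latent_action(prev_pos: Vec, next_pos: Vec) -> int:
    """Discrete latent action that best explains the transition between two frames."""
    return quantize(encode_transition(prev_pos, next_pos))
```

For example, `infer_latent_action((0, 0), (1, 0))` returns `0` (the "right" code), while a diagonal step from `(0, 0)` to `(0.9, 0.8)` maps to code `1`.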

Learning to reproduce actions and identify controllable parts
What sets Genie apart is that it learns, without supervision, which parts of a scene are controllable and uses that knowledge to generate interactive scenarios.

Creating games from synthetic or real images
Genie can also create an entirely new interactive environment from a single image. For a text prompt, Google's text-to-image model Imagen 2 first generates the starting frame, and Genie then brings that image to life.

(Image: Genie generating interactive, animated environments from synthetic images)
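
The image-to-game process can be sketched as an autoregressive rollout: start from a prompt frame, and at every step let a dynamics model predict the next frame from all previous frames plus the action the player picked. Everything concrete here is a hypothetical stand-in: a "frame" is reduced to the player's grid position, `toy_dynamics` is a hand-written rule rather than a learned model, and for a text prompt the starting frame would first come from a text-to-image model such as Imagen 2, which this sketch does not call.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a generated frame: just the player sprite's (x, y) grid position.
Frame = Tuple[int, int]

# Hypothetical mapping from latent-action id to a one-cell move on the grid.
MOVES = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def toy_dynamics(history: List[Frame], action: int, width: int = 10, height: int = 10) -> Frame:
    """Predict the next frame from the frame history and one action (toy rule, not a learned model)."""
    x, y = history[-1]
    dx, dy = MOVES.get(action, (0, 0))  # unknown action ids behave as no-ops
    # Clamp so the sprite stays inside the width x height world.
    return (min(max(x + dx, 0), width - 1), min(max(y + dy, 0), height - 1))

def rollout(prompt_frame: Frame, actions: List[int],
            dynamics: Callable[[List[Frame], int], Frame] = toy_dynamics) -> List[Frame]:
    """Generate frames one step at a time, conditioning each step on every frame so far."""
    frames = [prompt_frame]
    for a in actions:  # in a playable game, each action would come from the player's input
        frames.append(dynamics(frames, a))
    return frames
```

Here `rollout((0, 0), [0, 0, 1, 2])` produces `[(0, 0), (1, 0), (2, 0), (2, 1), (1, 1)]`. Passing the whole `frames` history, not just the last frame, mirrors the autoregressive setup, even though this toy rule only looks at the latest frame.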

Genie also accepts image prompts it has never seen before, including real-world photos and simple hand-drawn sketches, letting people interact with objects that are motionless in the original picture.

(Image: Genie generating interactive, animated environments from real photos and hand-drawn sketches)

Google DeepMind's accompanying blog post states:
Genie’s capabilities allow anyone, even children, to create and enter controllable simulated environments or interactive generated worlds.

The post closes by laying out Genie's broader ambitions:
Genie’s applications are not limited to entertainment or creative development. It can also serve as an excellent testing platform for training intelligent agents, thus promoting the development of the AI field.

An intelligent agent is an autonomous entity that observes its surrounding environment and takes actions to achieve goals. Building such agents is a core concept and a major goal of current AI research.
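
The observe-then-act cycle in that definition can be written as a short loop. The sketch below uses a hypothetical one-dimensional `LineWorld` and a greedy policy; in the testbed role described above, a generated environment would take the place of `LineWorld`, but nothing here uses Genie itself.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class LineWorld:
    """Hypothetical toy environment: an agent on a number line must reach `goal`."""
    position: int
    goal: int

    def observe(self) -> Tuple[int, int]:
        """What the agent can see: its own position and the goal."""
        return self.position, self.goal

    def step(self, action: int) -> bool:
        """Apply an action (-1 = left, +1 = right); return True once the goal is reached."""
        self.position += action
        return self.position == self.goal

def greedy_agent(observation: Tuple[int, int]) -> int:
    """A simple policy: move one step toward the goal."""
    position, goal = observation
    return 1 if goal > position else -1

def run_episode(env: LineWorld, max_steps: int = 100) -> int:
    """Observe-act loop; returns the number of steps taken to reach the goal (or max_steps)."""
    if env.position == env.goal:
        return 0
    for t in range(1, max_steps + 1):
        if env.step(greedy_agent(env.observe())):
            return t
    return max_steps
```

For instance, `run_episode(LineWorld(position=0, goal=5))` returns `5`.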

Google and OpenAI in fierce competition
In recent months, Google has announced several generative AI models and tools, including the Gemini AI model, the text-to-video model “Lumiere,” and the text-to-image tool “ImageFX,” all of which have drawn public attention.

Meanwhile, OpenAI's text-to-video model Sora, the company's first video-generation model, sparked an AI frenzy of its own a few weeks ago.


However, a recent controversy over racial bias in Gemini's image generation sent shares of parent company Alphabet down more than 4% in a single day (on the 26th).

Demis Hassabis, CEO of Google DeepMind, said at Mobile World Congress (MWC Barcelona 2024) yesterday:
We have taken down that feature of Gemini and will fix the issue and restore it in the coming weeks.
