A new artificial intelligence (AI) model called Genie 3 can create interactive worlds based on simple text instructions. This general-purpose world model, developed by researchers at Google DeepMind, generates dynamic environments that people can explore in real time at 24 frames per second.
These worlds remain consistent for several minutes at 720p resolution, a standard for clear video quality. Users can type a description, such as a volcanic landscape or a forest, and Genie 3 builds a navigable virtual space. The technology builds on earlier models, Genie 1 and Genie 2, which also generated environments but could not support real-time interaction.
World models are AI systems that simulate aspects of the real world, predicting how an environment changes and how actions affect it. Genie 3 takes this further by letting users move through its creations, such as a robot crossing rocky terrain or a person walking through a hurricane. It can mimic natural phenomena like water, lighting, and plant life, and even conjure fantastical scenes with animated characters. The model draws on video-generation advances from Veo 2 and Veo 3, which capture basic physics, to make its worlds more realistic.
Exploring new possibilities
Genie 3’s ability to maintain consistency over time is a significant advance. A user can return to a location after a minute or so, and the model recalls what was there before. It achieves this by generating each frame based on the trajectory of past frames and user actions, a computation that must run fast enough to keep pace with live input. The model can also trigger specific events on command, such as changing the weather or adding objects, in response to text prompts. This makes it useful for testing AI agents, such as those controlling robots, by giving them goals to pursue within these generated worlds.
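The revisit-consistency idea can be sketched in miniature. The toy below is entirely illustrative: the `render_frame` function and its grid-walking actions are invented for this sketch and have nothing to do with Genie 3's actual interface or architecture. It only shows the underlying principle that when each frame is derived from the full history of past actions, a change the user made earlier is still there when they come back.

```python
def render_frame(action_history):
    """Toy 'world model': derive the current frame purely from past actions.

    Because the frame is a function of the whole history, state created
    earlier (here, a 'painted' tile) persists when the user revisits it.
    """
    pos = [0, 0]
    painted = set()  # tiles the user has marked; stands in for persistent scene state
    for action in action_history:
        if action == "up":
            pos[1] += 1
        elif action == "down":
            pos[1] -= 1
        elif action == "left":
            pos[0] -= 1
        elif action == "right":
            pos[0] += 1
        elif action == "paint":
            painted.add(tuple(pos))
    return {"position": tuple(pos), "painted": frozenset(painted)}

# Simulate: paint the starting tile, wander away, then walk back to it.
history = ["paint", "right", "right", "up", "left", "left", "down"]
frame = render_frame(history)
# The user is back at (0, 0) and the tile painted earlier is still painted.
```

A real system cannot replay the whole history from scratch for every frame at 24 frames per second, which is why generating each frame quickly enough, while still honoring everything that came before, is the hard part the article describes.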
The technology still has limits, including short interaction windows and difficulty with complex actions and faithful real-world detail. The researchers are sharing Genie 3 with a small group of testers to gather feedback.