
Dreamer 4 learns to solve complex control tasks through reinforcement learning inside its world model. The researchers decode an imagined training sequence for visualization, showing that the world model learns to simulate a wide range of game mechanics from low-level mouse and keyboard actions, such as breaking blocks, using tools, and interacting with crafting tables. Credit: arXiv (2025). DOI: 10.48550/arxiv.2509.24527
Over the past decade, deep learning has changed the way artificial intelligence (AI) agents perceive and behave in digital environments, enabling them to master board games, control simulated robots, and reliably tackle a variety of other tasks. However, most of these systems still rely on a huge amount of direct experience – millions of trial-and-error interactions – to achieve even modest capabilities.
This brute-force approach limits the usefulness of such systems in the physical world, where trial-and-error experiments are slow, expensive, and unsafe.
To overcome these limitations, researchers have turned to world models: learned simulations of an environment inside which agents can safely practice and learn.
These world models aim to capture not only the visuals of an environment, but also the underlying dynamics of how objects move, collide, and respond to actions. However, while visually simple games like Atari titles and Go have served as effective testbeds, world models have so far struggled to represent the rich, open-ended physics of complex settings such as Minecraft or robotic environments.
Google DeepMind researchers recently developed Dreamer 4, a new artificial agent that learns complex behaviors entirely inside a scalable world model, using only a limited set of pre-recorded gameplay videos.
The new model, introduced in a paper published on the arXiv preprint server, is the first AI agent to obtain diamonds in Minecraft without any practice in the real game. This achievement highlights the potential of using Dreamer 4 to train successful agents purely in imagination, with important implications for the future of robotics.
“We humans choose our actions based on our understanding of how the world works, predicting potential outcomes in advance,” Danijar Hafner, lead author of the paper, told Tech Xplore.
“This ability requires an internal model of the world, and it allows us to solve new problems very quickly. In contrast, traditional AI agents typically learn by brute force, through a huge amount of trial and error. But that’s not possible in applications like physical robots, which break easily.”
Some of the AI agents developed at DeepMind over the past few years have already achieved great success in games like Go and Atari by training inside world models. However, the world models those agents relied on could not capture the rich physical interactions of more complex environments, such as the video game Minecraft.
Meanwhile, “video models such as Veo and Sora are rapidly improving toward producing realistic video in a wide variety of situations,” Hafner said.
“But they are not interactive and generate frames too slowly to serve as ‘neural simulators’ for training agents. The goal of Dreamer 4 was to train successful agents entirely inside a world model that can realistically simulate a complex world.”
Hafner and his colleagues chose Minecraft as a testbed for their agent because it is a complex video game with procedurally generated worlds and long-horizon tasks that can require over 20,000 consecutive mouse and keyboard actions to complete.
One of these tasks is mining diamonds, which requires agents to complete a long chain of prerequisites, such as chopping trees, crafting tools, and mining and smelting ore.
Notably, the researchers wanted to train the agent purely in “imagined” scenarios rather than letting it practice in the real game, much as robots must learn in simulation because direct practice in the physical world can easily damage them. This requires the agent to build a sufficiently accurate internal model of the Minecraft world, including how objects interact.
The agent developed by Hafner and his colleagues is based on a large transformer model trained to predict the future observations, actions, and rewards that follow from a given situation. Dreamer 4 was trained on a fixed offline dataset of recorded Minecraft gameplay collected from human players.
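To give a rough sense of this setup, the sketch below shows a toy version of a transformer world model trained on offline gameplay. This is a minimal illustration, not DeepMind's code: it assumes observations have already been compressed into discrete tokens, uses a small generic transformer, and folds the policy into an action head. The real Dreamer 4 architecture, tokenizer, and objectives differ substantially.

```python
# Minimal sketch (not DeepMind's code) of a transformer world model trained
# on offline gameplay to predict future observations, actions, and rewards.
# All names, shapes, and vocabulary sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModel(nn.Module):
    def __init__(self, obs_vocab=4096, n_actions=32, d_model=256):
        super().__init__()
        self.obs_embed = nn.Embedding(obs_vocab, d_model)
        self.act_embed = nn.Embedding(n_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.obs_head = nn.Linear(d_model, obs_vocab)  # next observation token
        self.act_head = nn.Linear(d_model, n_actions)  # next action (policy)
        self.rew_head = nn.Linear(d_model, 1)          # predicted reward

    def forward(self, obs_tokens, actions):
        x = self.obs_embed(obs_tokens) + self.act_embed(actions)
        t = x.size(1)  # causal mask: each step only attends to the past
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.trunk(x, mask=mask)
        return self.obs_head(h), self.act_head(h), self.rew_head(h)

model = WorldModel()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# Stand-in offline batch: 8 sequences of 64 tokenized frames with actions.
obs = torch.randint(0, 4096, (8, 64))
act = torch.randint(0, 32, (8, 64))
rew = torch.rand(8, 64)

# Teacher forcing: from each prefix, predict the next observation token,
# the action the human took next, and the reward at the next step.
obs_logits, act_logits, rew_pred = model(obs[:, :-1], act[:, :-1])
loss = (F.cross_entropy(obs_logits.transpose(1, 2), obs[:, 1:])
        + F.cross_entropy(act_logits.transpose(1, 2), act[:, 1:])
        + F.mse_loss(rew_pred.squeeze(-1), rew[:, 1:]))
loss.backward()
opt.step()
```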
“After this training, Dreamer 4 learns to choose better actions across a wide range of imagined scenarios through reinforcement learning,” Hafner said.
“To train agents within a scalable world model, we needed to push the frontiers of generative AI. We designed an efficient transformer architecture and a new training objective called shortcut forcing. These advances enable accurate predictions while increasing generation speed by more than 25x compared to typical video models.”
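The reinforcement learning phase can be pictured roughly as follows. The sketch below is a hypothetical simplification that continues the toy WorldModel above: it rolls out imagined trajectories by sampling from the model's own predictions, then nudges the action head toward higher predicted rewards using plain REINFORCE. Dreamer 4's actual RL procedure and its shortcut-forcing objective are considerably more involved.

```python
# Hypothetical sketch of RL "in imagination": roll out trajectories inside
# the learned model and improve the policy from its predicted rewards.
# Plain REINFORCE is used here; the paper's method is more sophisticated.
import torch

@torch.no_grad()
def imagine(model, obs, act, horizon=15):
    """Extend a real prefix by sampling observations and actions from the model."""
    for _ in range(horizon):
        obs_logits, act_logits, _ = model(obs, act)
        next_obs = torch.distributions.Categorical(logits=obs_logits[:, -1]).sample()
        next_act = torch.distributions.Categorical(logits=act_logits[:, -1]).sample()
        obs = torch.cat([obs, next_obs[:, None]], dim=1)
        act = torch.cat([act, next_act[:, None]], dim=1)
    return obs, act

def policy_loss(model, obs, act):
    """Score imagined actions by the model's own predicted returns-to-go."""
    _, act_logits, rew_pred = model(obs[:, :-1], act[:, :-1])
    logp = torch.distributions.Categorical(logits=act_logits).log_prob(act[:, 1:])
    rew = rew_pred.squeeze(-1).detach()
    ret = torch.flip(torch.cumsum(torch.flip(rew, [1]), 1), [1])  # returns-to-go
    return -(logp * ret).mean()  # in practice, update only the action head

# Usage: imagine from offline prefixes, then take a policy-gradient step.
obs_im, act_im = imagine(model, obs[:, :8], act[:, :8])
policy_loss(model, obs_im, act_im).backward()
```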
Dreamer 4 is the first AI agent to obtain diamonds in Minecraft when trained solely on offline data, without ever practicing in the real game. This highlights the agent’s ability to learn to solve complex, long-horizon tasks on its own.
“Purely offline learning is highly relevant for training robots, which can easily break when practicing in the physical world,” Hafner said. “Our research introduces a promising new approach to building smart robots that perform household and factory tasks.”
Initial tests showed that Dreamer 4 accurately predicts the interactions of various objects and game mechanics, developing a reliable internal world model that vastly outperforms the world models previous agents relied on.
“This model supports real-time interaction on a single GPU, making it easy for human players to explore the dream world and test its functionality,” Hafner said. “We found that this model accurately predicted the dynamics of mining and placing blocks, crafting simple items, and even using doors, chests, and boats.”
A further advantage of Dreamer 4 is that it achieved these results despite being trained on very little action data, that is, gameplay footage paired with the keyboard and mouse inputs that produced it.
“Instead of requiring thousands of hours of gameplay recordings annotated with actions, the world model can learn most of its knowledge from video alone,” Hafner said.
“Using just a few hundred hours of action data, the world model learns the effects of mouse movements and key presses in a general way that transfers to new situations. This is exciting because recording robot data takes time, while the internet contains tons of videos of humans interacting with the world, which Dreamer 4 may learn from in the future.”
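One plausible way to exploit large amounts of unlabeled video alongside a small action-labeled subset is to mask out the action loss wherever labels are missing, as in the hypothetical sketch below. It again continues the toy model above, and the actual Dreamer 4 recipe may differ.

```python
# Sketch: learn dynamics from every clip, but learn the action mapping only
# from the small labeled subset. `has_actions` is a float 0/1 mask per
# sequence (1.0 where the clip comes with recorded keyboard/mouse inputs).
import torch
import torch.nn.functional as F

def mixed_loss(obs_logits, act_logits, obs_targets, act_targets, has_actions):
    obs_loss = F.cross_entropy(obs_logits.transpose(1, 2), obs_targets)
    act_ce = F.cross_entropy(act_logits.transpose(1, 2), act_targets,
                             reduction="none").mean(dim=1)  # per sequence
    act_loss = (act_ce * has_actions).sum() / has_actions.sum().clamp(min=1)
    return obs_loss + act_loss
```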
This recent work by Hafner and his colleagues at DeepMind could contribute to advances in robotics systems, simplifying the training of algorithms that reliably complete manual tasks in the real world.
Meanwhile, the researchers plan to further improve Dreamer 4’s world model by integrating long-term memory, which would help the simulated world the agent trains in remain consistent over time.
“Incorporating language understanding also brings us closer to agents that collaborate with humans and perform tasks on our behalf,” Hafner added.
“Finally, training the world model on general internet videos could give the agent common-sense knowledge of the physical world, allowing us to train robots in a wide variety of imagined scenarios.”
More information: Danijar Hafner et al., Training Agents Inside of Scalable World Models, arXiv (2025). DOI: 10.48550/arxiv.2509.24527
Journal information: arXiv
