Neural Motion Simulator
Pushing the Limit of World Models in Reinforcement Learning

CVPR 2025

Chenjie Hao1*, Weyl Lu1*, Yifan Xu2, Yubei Chen1,2†
1UC Davis 2Open Path AI Foundation
*Equal contribution †Corresponding author

Code and checkpoints will be released soon.


30-step prediction results of MoSim on a humanoid robot. The top row shows the ground truth generated by a humanoid walk policy; the bottom row shows MoSim's predictions. Each frame corresponds to four physical steps.

Abstract

An embodied system must not only model the patterns of the external world but also understand its own motion dynamics. A motion dynamics model is essential for efficient skill acquisition and effective planning. In this work, we introduce the Neural Motion Simulator (MoSim), a world model that predicts the future physical state of an embodied system based on current observations and actions. MoSim achieves state-of-the-art performance in physical state prediction and also delivers competitive performance across a range of downstream tasks. It enables embodied systems to perform long-horizon predictions, facilitating efficient skill acquisition in imagined environments and even enabling zero-shot reinforcement learning. Furthermore, MoSim can transform any model-free reinforcement learning (RL) algorithm into a model-based approach, effectively decoupling physical environment modeling from RL algorithm development. This separation allows RL algorithms and world models to advance independently, significantly improving sample efficiency and enhancing generalization. Our findings highlight that world models for motion dynamics are a promising direction for developing more versatile and capable embodied systems.

High Precision Motion Prediction

Prediction MSE Loss
Versus DreamerV3

Environment DreamerV3 MoSim
Humanoid 2.1291 1.2737
Cheetah 0.1925 0.1206
Reacher 0.0972 0.0005
Go 2 0.3685 0.0410
Hopper 0.1114 0.0375
Acrobot 0.1015 0.0001
Panda 0.0434 0.0010

MoSim demonstrates remarkable accuracy in predicting future states. In a 16-step prediction test across robotic simulation environments, MoSim significantly outperforms the previous state-of-the-art world model, DreamerV3.
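As a minimal sketch of how such a comparison can be scored (not the paper's actual evaluation code; the `model` callable interface is a hypothetical stand-in for any world model), an n-step test rolls the model forward open-loop from an initial state, without resetting to the ground truth in between, and averages the squared error against the true trajectory:

```python
import numpy as np

def open_loop_mse(model, s0, actions, true_states):
    """Roll a world model forward open-loop (no ground-truth resets)
    and return the mean squared error against the true trajectory.

    model:       callable (state, action) -> next_state (hypothetical interface)
    s0:          initial state, shape (d,)
    actions:     action sequence, shape (T, d_a)
    true_states: ground-truth states, shape (T, d)
    """
    s, errs = s0, []
    for a, s_true in zip(actions, true_states):
        s = model(s, a)                      # predictions feed back into the model
        errs.append(np.mean((s - s_true) ** 2))
    return float(np.mean(errs))

# toy check: a linear "environment" scored against itself gives zero error
A, B = 0.9, 0.1
env = lambda s, a: A * s + B * a
s0, acts = np.ones(3), np.zeros((16, 3))
traj, s = [], s0
for a in acts:
    s = env(s, a)
    traj.append(s)
print(open_loop_mse(env, s0, acts, np.array(traj)))  # → 0.0
```

Because errors compound over the rollout, open-loop evaluation is a much harder test than one-step prediction, which is why long-horizon gaps between models grow so large.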

Latent Space Prediction MSE Loss
Versus TD-MPC2

Environment TD-MPC2 MoSim
Humanoid 0.00011 0.00009
Cheetah 0.0009 0.0007
Reacher 4.8101e-5 2.9256e-7

Even in latent space, MoSim maintains superior predictive capability. Using the encoder provided by TD-MPC2, MoSim's 3-step prediction in latent space surpasses that of TD-MPC2.

Transforming Any Model-Free Method to Model-Based

Zero-Shot Reinforcement Learning


Leveraging the strong predictive capability of our world model, we achieve zero-shot reinforcement learning: the agent learns solely from imagined data generated by the pretrained world model, without any interaction with the real environment. More importantly, MoSim can transform any model-free RL algorithm into a model-based one, allowing it to fully reap the data and training efficiency of model-based RL. In the figure, the purple dashed line denotes the DreamerV3 score, while the red line denotes the score of the MoSim-based RL agent.
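The key mechanism is that a learned dynamics model can be wrapped behind the same reset/step interface a model-free algorithm already expects, so the algorithm trains in imagination unchanged. A minimal sketch, assuming hypothetical `dynamics` and `reward_fn` callables standing in for the pretrained MoSim model:

```python
import numpy as np

class ImaginedEnv:
    """Gym-style wrapper that lets a model-free agent train entirely
    inside a learned world model (hypothetical interfaces)."""

    def __init__(self, dynamics, reward_fn, s0, horizon=500):
        self.dynamics = dynamics      # (state, action) -> next_state
        self.reward_fn = reward_fn    # (state, action) -> scalar reward
        self.s0, self.horizon = s0, horizon

    def reset(self):
        self.s, self.t = self.s0.copy(), 0
        return self.s

    def step(self, a):
        self.s = self.dynamics(self.s, a)   # imagined transition, no real env
        self.t += 1
        r = self.reward_fn(self.s, a)
        done = self.t >= self.horizon
        return self.s, r, done, {}

# any model-free algorithm can now call reset()/step() as usual;
# here a hand-coded "policy" drives a toy linear system toward the origin
env = ImaginedEnv(dynamics=lambda s, a: s + 0.01 * a,
                  reward_fn=lambda s, a: -float(np.sum(s ** 2)),
                  s0=np.ones(2), horizon=10)
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, _ = env.step(-obs)
    total += r
```

Since the model-free learner never touches the real environment during training, any policy it acquires this way is zero-shot with respect to real-world interaction.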


Inductive Bias for Handling Robotic Dynamics

MoSim Architecture.

We introduce a simple yet effective inductive bias for robotic dynamics. By encoding generalized positions and velocities separately and combining their outputs in a manner analogous to rigid-body dynamics, we model robotic motion more efficiently. We further incorporate a bias-free corrector and a multi-stage training scheme, yielding superior performance, better generalization, and more efficient training.
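To illustrate the flavor of this bias (a sketch only, not the paper's architecture; `accel_fn` is a hypothetical stand-in for the learned network), the network can be restricted to predicting acceleration, while position and velocity are recovered by integration, mirroring the kinematic relation q̇ = v of rigid-body dynamics:

```python
import numpy as np

def second_order_step(q, v, a, accel_fn, dt=0.01):
    """One prediction step with a rigid-body-style inductive bias:
    the learned component only predicts acceleration; position and
    velocity follow by (semi-implicit Euler) integration.

    accel_fn: learned component, (q, v, action) -> dv/dt (hypothetical)
    """
    v_next = v + dt * accel_fn(q, v, a)   # velocity update from predicted acceleration
    q_next = q + dt * v_next              # position obeys q' = v by construction
    return q_next, v_next

# toy check: a frictionless point mass where acceleration equals the action
q, v = np.zeros(2), np.zeros(2)
for _ in range(100):
    q, v = second_order_step(q, v, np.array([1.0, 0.0]),
                             accel_fn=lambda q, v, a: a, dt=0.01)
# after 1s of unit acceleration, v ≈ (1, 0) and q ≈ (0.5, 0)
```

Hard-wiring the position–velocity relation means the network never has to relearn basic kinematics from data, which is one plausible reason such structure helps generalization and training efficiency.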