Neural Motion Simulator
Pushing the Limit of World Models in Reinforcement Learning

CVPR 2025

Chenjie Hao1*, Weyl Lu1*, Yifan Xu2, Yubei Chen1,2†
1UC Davis 2Open Path AI Foundation
*Equal contribution †Corresponding author

Code and checkpoints will be released soon.


30-step prediction results of MoSim on a humanoid robot. The top row shows the ground truth generated by a humanoid walk policy; the bottom row shows MoSim's predictions. Each frame corresponds to four physical steps.

Abstract

An embodied system must not only model the patterns of the external world but also understand its own motion dynamics. A motion dynamics model is essential for efficient skill acquisition and effective planning. In this work, we introduce the Neural Motion Simulator (MoSim), a world model that predicts the future physical state of an embodied system based on current observations and actions. MoSim achieves state-of-the-art performance in physical state prediction and also delivers competitive performance across a range of downstream tasks. It enables embodied systems to perform long-horizon predictions, facilitating efficient skill acquisition in imagined environments and even enabling zero-shot reinforcement learning. Furthermore, MoSim can transform any model-free reinforcement learning (RL) algorithm into a model-based approach, effectively decoupling physical environment modeling from RL algorithm development. This separation allows RL algorithms and world models to advance independently, significantly improving sample efficiency and enhancing generalization. Our findings highlight that world models for motion dynamics are a promising direction for developing more versatile and capable embodied systems.

High Precision Motion Prediction

Prediction MSE Loss
Versus DreamerV3

Environment DreamerV3 MoSim
Humanoid 2.1291 1.2737
Cheetah 0.1925 0.1206
Reacher 0.0972 0.0005
Go 2 0.3685 0.0410
Hopper 0.1114 0.0375
Acrobot 0.1015 0.0001
Panda 0.0434 0.0010

MoSim demonstrates remarkable accuracy in predicting future states. In a 16-step prediction test across robotic simulation environments, MoSim significantly outperforms the previous state-of-the-art world model, DreamerV3.
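As a minimal sketch of how such a comparison can be scored (not the paper's actual evaluation code; the `model` callable interface is a hypothetical stand-in for any world model), an n-step test rolls the model forward open-loop from an initial state, without resetting to the ground truth in between, and averages the squared error against the true trajectory:

```python
import numpy as np

def open_loop_mse(model, s0, actions, true_states):
    """Roll a world model forward open-loop (no ground-truth resets)
    and return the mean squared error against the true trajectory.

    model:       callable (state, action) -> next_state (hypothetical interface)
    s0:          initial state, shape (d,)
    actions:     action sequence, shape (T, d_a)
    true_states: ground-truth states, shape (T, d)
    """
    s, errs = s0, []
    for a, s_true in zip(actions, true_states):
        s = model(s, a)                      # predictions feed back into the model
        errs.append(np.mean((s - s_true) ** 2))
    return float(np.mean(errs))

# toy check: a linear "environment" scored against itself gives zero error
A, B = 0.9, 0.1
env = lambda s, a: A * s + B * a
s0, acts = np.ones(3), np.zeros((16, 3))
traj, s = [], s0
for a in acts:
    s = env(s, a)
    traj.append(s)
print(open_loop_mse(env, s0, acts, np.array(traj)))  # → 0.0
```

Because errors compound over the rollout, open-loop evaluation is a much harder test than one-step prediction, which is why long-horizon gaps between models grow so large.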

Latent Space Prediction MSE Loss
Versus TD-MPC2

Environment TD-MPC2 MoSim
Humanoid 0.00011 0.00009
Cheetah 0.0009 0.0007
Reacher 4.8101e-5 2.9256e-7

Even in latent space, MoSim maintains superior predictive capability. Using the encoder provided by TD-MPC2, MoSim's 3-step prediction in latent space surpasses that of TD-MPC2.

Transforming Any Model-Free Method to Model-Based

Zero-Shot Reinforcement Learning


Leveraging the strong predictive capability of our world model, we achieve zero-shot reinforcement learning: the agent learns solely from imagined data generated by the pretrained world model, without any interaction with the real environment. More importantly, MoSim can transform any model-free RL algorithm into a model-based one, allowing it to fully reap the data and training efficiency of model-based RL. In the figure, the purple dashed line denotes the DreamerV3 score, while the red line denotes the score of the MoSim-based RL agent.
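The key mechanism is that a learned dynamics model can be wrapped behind the same reset/step interface a model-free algorithm already expects, so the algorithm trains in imagination unchanged. A minimal sketch, assuming hypothetical `dynamics` and `reward_fn` callables standing in for the pretrained MoSim model:

```python
import numpy as np

class ImaginedEnv:
    """Gym-style wrapper that lets a model-free agent train entirely
    inside a learned world model (hypothetical interfaces)."""

    def __init__(self, dynamics, reward_fn, s0, horizon=500):
        self.dynamics = dynamics      # (state, action) -> next_state
        self.reward_fn = reward_fn    # (state, action) -> scalar reward
        self.s0, self.horizon = s0, horizon

    def reset(self):
        self.s, self.t = self.s0.copy(), 0
        return self.s

    def step(self, a):
        self.s = self.dynamics(self.s, a)   # imagined transition, no real env
        self.t += 1
        r = self.reward_fn(self.s, a)
        done = self.t >= self.horizon
        return self.s, r, done, {}

# any model-free algorithm can now call reset()/step() as usual;
# here a hand-coded "policy" drives a toy linear system toward the origin
env = ImaginedEnv(dynamics=lambda s, a: s + 0.01 * a,
                  reward_fn=lambda s, a: -float(np.sum(s ** 2)),
                  s0=np.ones(2), horizon=10)
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, _ = env.step(-obs)
    total += r
```

Since the model-free learner never touches the real environment during training, any policy it acquires this way is zero-shot with respect to real-world interaction.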


Inductive Bias for Handling Robotic Dynamics

MoSim Architecture.

We introduce a simple yet effective inductive bias for robotic dynamics. By encoding generalized positions and velocities separately and combining their outputs in a manner analogous to rigid-body dynamics, we model robotic motion more efficiently. We further incorporate a bias-free corrector and a multi-stage training scheme, yielding superior performance, better generalization, and more efficient training.
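To illustrate the flavor of this bias (a sketch only, not the paper's architecture; `accel_fn` is a hypothetical stand-in for the learned network), the network can be restricted to predicting acceleration, while position and velocity are recovered by integration, mirroring the kinematic relation q̇ = v of rigid-body dynamics:

```python
import numpy as np

def second_order_step(q, v, a, accel_fn, dt=0.01):
    """One prediction step with a rigid-body-style inductive bias:
    the learned component only predicts acceleration; position and
    velocity follow by (semi-implicit Euler) integration.

    accel_fn: learned component, (q, v, action) -> dv/dt (hypothetical)
    """
    v_next = v + dt * accel_fn(q, v, a)   # velocity update from predicted acceleration
    q_next = q + dt * v_next              # position obeys q' = v by construction
    return q_next, v_next

# toy check: a frictionless point mass where acceleration equals the action
q, v = np.zeros(2), np.zeros(2)
for _ in range(100):
    q, v = second_order_step(q, v, np.array([1.0, 0.0]),
                             accel_fn=lambda q, v, a: a, dt=0.01)
# after 1s of unit acceleration, v ≈ (1, 0) and q ≈ (0.5, 0)
```

Hard-wiring the position–velocity relation means the network never has to relearn basic kinematics from data, which is one plausible reason such structure helps generalization and training efficiency.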