MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies

Chengbo Yuan1,2    Rui Zhou*5    Mengzhen Liu*3    Yingdong Hu1,2    Shengjie Wang1,2    Li Yi1,2    Chuan Wen4    Shanghang Zhang3    Yang Gao1,2
1 Institute for Interdisciplinary Information Sciences, Tsinghua University   2 Shanghai Qi Zhi Institute
3 State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
4 Shanghai Jiao Tong University   5 Wuhan University
* Equal contribution · Corresponding author

Project Video

Teaser / Project video

Abstract

Scaling real robot data is a key bottleneck in imitation learning, leading to the use of auxiliary data for policy training. While other aspects of robotic manipulation, such as image or language understanding, may be learned from internet-based datasets, acquiring motion knowledge remains challenging. Human data, with its rich diversity of manipulation behaviors, offers a valuable resource for this purpose. While previous works show that using human data can bring benefits, such as improving robustness and training efficiency, it remains unclear whether human data can realize its greatest advantage: enabling robot policies to directly learn new motions for task completion. In this paper, we systematically explore this potential through multi-task human-robot cotraining. We introduce MotionTrans, a framework that includes a data collection system, a human data transformation pipeline, and a weighted cotraining strategy. By cotraining on 30 human and robot tasks simultaneously, we directly transfer more than 10 motions from human data to deployable end-to-end robot policies. Notably, 9 tasks achieve non-trivial success rates in a zero-shot manner. MotionTrans also significantly enhances pretraining–finetuning performance (+40% success rate). Through an ablation study, we identify a key factor for successful motion learning: cotraining with robot data. These findings unlock the potential of motion-level learning from human data, offering insights into its effective use for training robotic manipulation policies. All data, code, and model weights are open-sourced.

Motion-Level Learning from Human Demonstrations

MotionTrans pipeline from VR human data to deployable robot policies

From VR human demonstrations to deployable, zero/few-shot robot skills via a unified state–action space, human→robot data transformation, and weighted multi-task cotraining.
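As a rough illustration of the human→robot transformation step, the sketch below retargets a VR hand observation (wrist pose plus thumb/index fingertips) into a robot-style end-effector state with a normalized gripper width, so that human and robot trajectories share one state–action layout. The keypoint conventions, function names, and pinch-to-width mapping are illustrative assumptions, not the released MotionTrans implementation.

import numpy as np

def retarget_hand_to_eef(wrist_pose: np.ndarray,
                         thumb_tip: np.ndarray,
                         index_tip: np.ndarray,
                         max_gripper_width: float = 0.08) -> np.ndarray:
    """Map a human hand observation to a robot-style end-effector state.

    wrist_pose: (7,) position + quaternion of the wrist in the world frame.
    thumb_tip, index_tip: (3,) fingertip positions in the world frame.
    Returns an (8,) vector: eef position (3), orientation quaternion (4),
    and a normalized gripper width (1), i.e. the same layout as robot states.
    """
    # Treat the wrist as the end-effector pose.
    eef_pos, eef_quat = wrist_pose[:3], wrist_pose[3:]
    # Map the thumb-index pinch distance to a normalized gripper opening.
    pinch = np.linalg.norm(thumb_tip - index_tip)
    gripper = np.clip(pinch / max_gripper_width, 0.0, 1.0)
    return np.concatenate([eef_pos, eef_quat, [gripper]])

def transform_human_trajectory(hand_frames: list[dict]) -> np.ndarray:
    """Convert a whole human demonstration into robot-format states."""
    states = [retarget_hand_to_eef(f["wrist_pose"], f["thumb_tip"], f["index_tip"])
              for f in hand_frames]
    return np.stack(states)  # (T, 8), consumable by the same policy as robot data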

MotionTrans Zero-Shot Inference Results on Real Robot

(Left: Human Data · Middle: MotionTrans-DP · Right: MotionTrans-π0-VLA)

Videos
Quantitative Results
Zero-shot success rate across tasks
Zero-shot Success Rate (SR, %) across tasks. Comparison between MotionTrans-DP (blue) and MotionTrans-π0-VLA (pink).
Zero-shot motion progress score across tasks
Zero-shot Motion Progress Score across tasks. Comparison between MotionTrans-DP (red) and MotionTrans-π0-VLA (yellow).

MotionTrans Few-Shot Inference Results on Real Robot

(Left: Human Data · Right: MotionTrans-DP, few-shot)

Videos
Quantitative Results
Few-shot motion progress score across tasks and average
Few-shot Motion Progress Score (higher is better).
Few-shot success rate across tasks and average
Few-shot Success Rate (SR, %) across tasks and average.

Collecting Human and Robot Demonstrations

Hardware System: VR-based Human Data Capture + Single-arm Robot Platform
MotionTrans hardware overview

We synchronize VR headset/controllers and multi-view cameras to capture 3D hand trajectories, egocentric video, and robot state with precise time alignment.
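As a minimal sketch of this time alignment, the snippet below matches each camera frame to the nearest VR pose sample on the shared clock. The tolerance and stream layout are assumptions chosen for illustration; the actual system may additionally rely on driver- or hardware-level synchronization.

import numpy as np

def align_streams(cam_ts: np.ndarray, vr_ts: np.ndarray,
                  tol_s: float = 0.02) -> list[tuple[int, int]]:
    """For each camera frame, find the VR sample closest in time.

    cam_ts, vr_ts: 1-D arrays of timestamps (seconds) on the shared clock.
    Returns (camera_index, vr_index) pairs within tol_s of each other.
    """
    pairs = []
    for i, t in enumerate(cam_ts):
        j = int(np.argmin(np.abs(vr_ts - t)))   # nearest VR sample in time
        if abs(vr_ts[j] - t) <= tol_s:
            pairs.append((i, j))
    return pairs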

Data Collection Demo

During collection we log hand keypoints, egocentric observations, and textual annotations under a unified clock, making downstream alignment and training straightforward.
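A hypothetical per-frame record matching the description above could look as follows; the field names and array shapes are assumptions for illustration, not the released dataset schema.

from dataclasses import dataclass
import numpy as np

@dataclass
class DemoFrame:
    timestamp: float            # seconds on the shared clock
    hand_keypoints: np.ndarray  # (K, 3) 3D hand keypoints from the VR headset/controllers
    egocentric_rgb: np.ndarray  # (H, W, 3) egocentric camera image
    task_text: str              # textual annotation, e.g. "pour water into the cup"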

MotionTrans Dataset at a Glance

We collected 3,213 demonstrations across 15 human tasks and 15 robot tasks in 10+ real-world scenes. Tasks are grouped by motion-similar skill categories to support cross-embodiment (Human→Robot) cotraining and transfer.
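The weighted cotraining itself can be sketched as sampling each training batch across the human and robot task datasets according to per-task weights. The helper below is illustrative only; its names and the example weight values are assumptions rather than the paper's exact recipe.

import random

def make_weighted_sampler(datasets: dict[str, list], weights: dict[str, float]):
    """datasets: task_name -> list of trajectories; weights: task_name -> sampling weight."""
    names = list(datasets.keys())
    probs = [weights[n] for n in names]
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize to a distribution over tasks

    def sample_batch(batch_size: int) -> list:
        batch = []
        for _ in range(batch_size):
            task = random.choices(names, weights=probs, k=1)[0]  # pick a task by weight
            batch.append(random.choice(datasets[task]))          # then a trajectory from it
        return batch

    return sample_batch

# Example usage (hypothetical task names): upweight the scarcer robot data.
# sampler = make_weighted_sampler(
#     {"human/pour_cup": human_demos, "robot/pour_cup": robot_demos},
#     {"human/pour_cup": 1.0, "robot/pour_cup": 2.0})
# batch = sampler(64)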

Human Tasks

Robot Tasks

Acknowledgments

We would like to express our sincere gratitude to Shuo Wang, Gu Zhang, Enshen Zhou, Haoxu Huang, Jialei Huang, Ruiqian Nai, Zhengrong Xue, Junmin Zhao, and Weirui Ye for their valuable discussions. We are especially grateful to Ruiqian Nai and Fanqi Lin for their assistance with the implementation of Pi0-VLA, and to Yankai Fu for his support with the hardware implementation. Our thanks also extend to the SpiritAI and InspireRobot teams for their assistance.

BibTeX


@inproceedings{yuanmotiontrans,
  title={MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies},
  author={Yuan, Chengbo and Zhou, Rui and Liu, Mengzhen and Hu, Yingdong and Wang, Shengjie and Yi, Li and Zhang, Shanghang and Wen, Chuan and Gao, Yang},
  booktitle={Human to Robot: Workshop on Sensorizing, Modeling, and Learning from Humans}
}