TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction
Sibo Tian1, Minghui Zheng1,*, Xiao Liang2,*
1J. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University, 2Zachry Department of Civil and Environmental Engineering, Texas A&M University, *Corresponding Authors
[IEEE Xplore] | [Code]
Predicting human motion plays a crucial role in ensuring safe and effective human-robot close collaboration in the intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those focusing on accuracy, which predict a single future motion, and those generating diverse predictions based on observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter often produces motion sequences that deviate too far from the ground truth or become unrealistic given the historical context. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction that generates samples that are more likely to happen while maintaining a certain level of diversity. Our model leverages a Transformer backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform to model motion sequences in the frequency domain, thereby improving performance. In contrast to prior diffusion-based models that rely on extra modules such as cross-attention and adaptive layer normalization to condition the prediction on the past observed motion, we treat all inputs, including the conditions, as tokens, yielding a model that is more practical and effective than existing approaches. Extensive experiments on benchmark datasets validate the effectiveness of our human motion prediction model.
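For intuition only, below is a minimal, unofficial sketch (not the authors' implementation) of the two ideas mentioned above: moving motion sequences into the frequency domain with the discrete cosine transform, and treating the observed motion as ordinary tokens that are concatenated with the noisy future tokens so a plain Transformer can attend over both. All tensor shapes are illustrative.

```python
# Minimal sketch, NOT the repository's implementation: DCT over the time axis
# plus condition-as-token concatenation. Shapes are illustrative only.
import math
import torch

def dct_basis(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis of shape (n, n); rows index frequencies."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    t = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # time index
    basis = torch.cos(math.pi * (2.0 * t + 1.0) * k / (2.0 * n))
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

# Illustrative sizes: 25 observed frames, 100 predicted frames, 48 joint coordinates.
T_his, T_pred, D = 25, 100, 48
history = torch.randn(1, T_his, D)      # observed motion (the condition)
future = torch.randn(1, T_pred, D)      # noisy future motion (a diffusion sample)

# Transform both sequences to the frequency domain along the time axis.
hist_freq = torch.einsum('ft,btd->bfd', dct_basis(T_his), history)
pred_freq = torch.einsum('ft,btd->bfd', dct_basis(T_pred), future)

# Condition-as-token: concatenate condition and target into one token sequence,
# so the Transformer attends over both without cross-attention or AdaLN modules.
tokens = torch.cat([hist_freq, pred_freq], dim=1)   # (1, T_his + T_pred, D)
print(tokens.shape)
```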
[2024/05/23]: Code released!
[2024/04/28]: Our work is accepted by IEEE Robotics and Automation Letters (RA-L)!
[2024/03/25]: TransFusion prediction demos released!
The following code is tested on Linux-64 in a cluster environment as well as on Windows 11. If you are using Linux-64 in a cluster environment, please use source activate transfusion instead of conda activate transfusion.
mkdir ./checkpoints
mkdir ./data
mkdir ./inference
mkdir ./results
conda create -n transfusion python=3.8
conda activate transfusion
python -m pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install -r requirement.txt
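After installation, an optional sanity check (not part of the repository) can confirm that the expected PyTorch build is active and that a GPU is visible, since evaluation requires one:

```python
# Optional sanity check (not part of the repo).
import torch

print(torch.__version__)          # expected: 1.7.1+cu110
print(torch.cuda.is_available())  # should print True; a GPU is required for evaluation
```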
Datasets for Human3.6M, HumanEva-I and AMASS:
For Human3.6M and HumanEva-I, we adopt the data preprocessing from GSPS. For AMASS, we carefully adopt the data preprocessing from BeLFusion. We provide all the processed data here for convenience. Download all files into the ./data directory; the final ./data directory structure is shown below:
data
├── data_3d_amass.npz
├── data_3d_amass_test.npz
├── data_3d_h36m.npz
├── data_3d_h36m_test.npz
├── data_3d_humaneva15.npz
├── data_3d_humaneva15_test.npz
├── data_multi_modal
│   ├── data_candi_t_his25_t_pred100_skiprate20.npz
│   └── t_his25_1_thre0.500_t_pred100_thre0.100_filtered_dlow.npz
└── humaneva_multi_modal
    ├── data_candi_t_his15_t_pred60_skiprate15.npz
    └── t_his15_1_thre0.500_t_pred60_thre0.010_index_filterd.npz
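Optionally, a short script such as the following (not part of the repository) can verify that the main processed files are in place before running anything:

```python
# Optional helper (not part of the repo): check that the processed data files exist.
from pathlib import Path

expected = [
    "data/data_3d_h36m.npz",
    "data/data_3d_humaneva15.npz",
    "data/data_3d_amass.npz",
    "data/data_multi_modal/data_candi_t_his25_t_pred100_skiprate20.npz",
    "data/humaneva_multi_modal/data_candi_t_his15_t_pred60_skiprate15.npz",
]
missing = [p for p in expected if not Path(p).exists()]
print("All expected files found." if not missing else f"Missing: {missing}")
```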
We provide the pretrained models for all three datasets here. Download all files into the ./checkpoints directory; the final ./checkpoints directory structure is shown below:
checkpoints
├── humaneva_ckpt.pt
├── h36m_ckpt.pt
└── amass_ckpt.pt
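If a download seems corrupted, a quick way to inspect a checkpoint is shown below (not part of the repository; the exact contents of these files are defined by the training code):

```python
# Optional check (not part of the repo): load a checkpoint on CPU and peek at its keys.
import torch

ckpt = torch.load("./checkpoints/h36m_ckpt.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```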
Evaluate on Human3.6M:
python main.py --cfg h36m --mode eval --ckpt ./checkpoints/h36m_ckpt.pt
Evaluate on HumanEva-I:
python main.py --cfg humaneva --mode eval --ckpt ./checkpoints/humaneva_ckpt.pt
Evaluate on AMASS:
python main.py --cfg amass --mode eval --ckpt ./checkpoints/amass_ckpt.pt --multimodal_threshold 0.4 --seed 6
Note: We change the random seed from 0 to 6 for the AMASS dataset to compare fairly with BeLFusion. A GPU is required for evaluation.
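For reference, making a stochastic sampler reproducible in PyTorch typically means seeding every source of randomness. The generic pattern below only illustrates what a --seed flag usually controls; see main.py for the actual behavior.

```python
# Generic seeding pattern (illustration only; not necessarily what main.py does).
import random
import numpy as np
import torch

def set_seed(seed: int = 6) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```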
To train TransFusion from scratch on each of the three datasets, run the following scripts:
python main.py --cfg h36m --mode train
python main.py --cfg humaneva --mode train
python main.py --cfg amass --mode train --multimodal_threshold 0.4 --seed 6 --milestone [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]
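The --milestone argument takes a list of epochs. Such lists are commonly consumed by a multi-step learning-rate schedule in PyTorch; the snippet below is only a generic illustration of that pattern (how the milestones are actually used is defined by the repository's config and training code).

```python
# Generic multi-step schedule (illustration only; not necessarily how --milestone is used here).
import torch

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[200, 400, 600, 800, 1000, 1200, 1400, 1600,
                1800, 2000, 2200, 2400, 2600, 2800],
    gamma=0.5,  # decay factor shown here is a hypothetical value
)
```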
Run the following scripts for visualization purposes:
python main.py --cfg h36m --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/h36m_ckpt.pt
python main.py --cfg humaneva --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/humaneva_ckpt.pt
python main.py --cfg amass --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/amass_ckpt.pt
More prediction demos can be found in ./assets.
Project structure is borrowed from HumanMAC. We would like to thank the authors for making their code publicly available.
If you find our work useful in your research, please consider citing our paper:
@article{tian2024transfusion,
title={TransFusion: A practical and effective transformer-based diffusion model for 3d human motion prediction},
author={Tian, Sibo and Zheng, Minghui and Liang, Xiao},
journal={IEEE Robotics and Automation Letters},
year={2024},
volume={9},
number={7},
pages={6232-6239},
publisher={IEEE}
}
The software in this repository is freely available for non-commercial use (see the license for further details).