[IEEE RA-L] TransFusion

TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Sibo Tian1, Minghui Zheng1,*, Xiao Liang2,*

1J. Mike Walker ’66 Department of Mechanical Engineering, Texas A&M University, 2Zachry Department of Civil and Environmental Engineering, Texas A&M University, *Corresponding Authors


[IEEE Xplore] | [Code]

Predicting human motion plays a crucial role in ensuring safe and effective close human-robot collaboration in the intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those that focus on accuracy and predict a single future motion, and those that generate diverse predictions based on observations. The former fails to address the uncertainty and multi-modal nature of human motion, while the latter often produces motion sequences that deviate too far from the ground truth or become unrealistic given the historical context. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction that generates samples which are more likely to happen while maintaining a certain level of diversity. Our model leverages a Transformer backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform (DCT) to model motion sequences in the frequency domain, thereby improving performance. In contrast to prior diffusion-based models that rely on extra modules such as cross-attention and adaptive layer normalization to condition the prediction on past observed motion, we treat all inputs, including the conditions, as tokens, yielding a more practical and effective model than existing approaches. Extensive experimental studies on benchmark datasets validate the effectiveness of our human motion prediction model.
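As the abstract notes, TransFusion models motion sequences in the frequency domain via the discrete cosine transform. A minimal, self-contained sketch of that idea (toy dimensions, not the repository's actual preprocessing; keeping only the first few low-frequency DCT coefficients compactly encodes a smooth trajectory):

```python
import numpy as np
from scipy.fft import dct, idct

def to_dct(motion, n_coeff):
    """DCT-II along the time axis; keep the first n_coeff low-frequency terms."""
    coeffs = dct(motion, type=2, axis=0, norm="ortho")
    return coeffs[:n_coeff]

def from_dct(coeffs, n_frames):
    """Zero-pad the truncated coefficients back to n_frames and invert."""
    full = np.zeros((n_frames, coeffs.shape[1]))
    full[: coeffs.shape[0]] = coeffs
    return idct(full, type=2, axis=0, norm="ortho")

rng = np.random.default_rng(0)
T, D = 125, 48  # e.g. 25 observed + 100 predicted frames, 16 joints x 3 coords
motion = np.cumsum(rng.normal(scale=0.01, size=(T, D)), axis=0)  # smooth toy motion

coeffs = to_dct(motion, n_coeff=20)   # compact frequency-domain representation
recon = from_dct(coeffs, n_frames=T)  # low-pass reconstruction, same shape as input
```

With `norm="ortho"`, keeping all `T` coefficients inverts exactly, while truncation acts as a low-pass filter over the motion.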

πŸ“’ News

[2024/05/23]: Code released!

[2024/04/28]: Our work is accepted by IEEE Robotics and Automation Letters (RA-L)!

[2024/03/25]: TransFusion prediction demos released!

πŸ›  Setup

1. Python/Conda Environment

The following code is tested on Linux-64 in a cluster environment as well as on Windows 11. If you are using Linux-64 in a cluster environment, please use source activate transfusion instead of conda activate transfusion.

mkdir ./checkpoints
mkdir ./data
mkdir ./inference
mkdir ./results
conda create -n transfusion python=3.8
conda activate transfusion
python -m pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install -r requirement.txt

2. Datasets

Datasets for Human3.6M, HumanEva-I and AMASS:

For Human3.6M and HumanEva-I, we adopt the data preprocessing from GSPS. For AMASS, we adopt the data preprocessing from BeLFusion. We provide all the processed data here for convenience. Download all files into the ./data directory; the final ./data directory structure is shown below:

data
β”œβ”€β”€ data_3d_amass.npz
β”œβ”€β”€ data_3d_amass_test.npz
β”œβ”€β”€ data_3d_h36m.npz
β”œβ”€β”€ data_3d_h36m_test.npz
β”œβ”€β”€ data_3d_humaneva15.npz
β”œβ”€β”€ data_3d_humaneva15_test.npz
β”œβ”€β”€ data_multi_modal
β”‚   β”œβ”€β”€ data_candi_t_his25_t_pred100_skiprate20.npz
β”‚   └── t_his25_1_thre0.500_t_pred100_thre0.100_filtered_dlow.npz
└── humaneva_multi_modal
    β”œβ”€β”€ data_candi_t_his15_t_pred60_skiprate15.npz
    └── t_his15_1_thre0.500_t_pred60_thre0.010_index_filterd.npz
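The processed files above are standard NumPy `.npz` archives, so their contents can be inspected before training. A toy sketch (the array name `positions_3d` and the shapes are hypothetical; the real key names and layouts come from the GSPS/BeLFusion preprocessing):

```python
import os
import tempfile
import numpy as np

# Write a toy archive standing in for, e.g., ./data/data_3d_h36m.npz.
path = os.path.join(tempfile.mkdtemp(), "data_3d_toy.npz")
np.savez_compressed(path, positions_3d=np.zeros((10, 17, 3)))

# Inspect which arrays an archive contains and their shapes.
with np.load(path, allow_pickle=True) as archive:
    names = archive.files
    shapes = {name: archive[name].shape for name in names}

print(names)   # ['positions_3d']
print(shapes)  # {'positions_3d': (10, 17, 3)}
```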

3. Pretrained Models

We provide pretrained models for all three datasets here. Download all files into the ./checkpoints directory; the final ./checkpoints directory structure is shown below:

checkpoints
β”œβ”€β”€ humaneva_ckpt.pt
β”œβ”€β”€ h36m_ckpt.pt
└── amass_ckpt.pt

πŸ”Ž Evaluation

Evaluate on Human3.6M:

python main.py --cfg h36m --mode eval --ckpt ./checkpoints/h36m_ckpt.pt

Evaluate on HumanEva-I:

python main.py --cfg humaneva --mode eval --ckpt ./checkpoints/humaneva_ckpt.pt

Evaluate on AMASS:

python main.py --cfg amass --mode eval --ckpt ./checkpoints/amass_ckpt.pt --multimodal_threshold 0.4 --seed 6

Note: For the AMASS dataset, we set the random seed to 6 instead of 0 for a fair comparison with BeLFusion. A GPU is required for evaluation.

⏳ Training

To train TransFusion from scratch on each of the three datasets, run the corresponding script:

python main.py --cfg h36m --mode train
python main.py --cfg humaneva --mode train
python main.py --cfg amass --mode train --multimodal_threshold 0.4 --seed 6 --milestone [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]

πŸŽ₯ Visualization

Run the following scripts to visualize predictions:

python main.py --cfg h36m --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/h36m_ckpt.pt
python main.py --cfg humaneva --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/humaneva_ckpt.pt
python main.py --cfg amass --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/amass_ckpt.pt

🎞 Demos of Human Motion Prediction

More prediction demos can be found in ./assets.

Human3.6M -- Walking

Human3.6M -- Walk Together

Human3.6M -- Photo

Human3.6M -- Purchases

HumanEva-I -- Jog

HumanEva-I -- ThrowCatch

HumanEva-I -- Walking

HumanEva-I -- Gestures

AMASS -- DanceDB

AMASS -- DFaust

AMASS -- SSM

AMASS -- Transitions

🌹 Acknowledgment

The project structure is borrowed from HumanMAC. We would like to thank the authors for making their code publicly available.

πŸ“ Citation

If you find our work useful in your research, please consider citing our paper:

@article{tian2024transfusion,
  title={TransFusion: A practical and effective transformer-based diffusion model for {3D} human motion prediction},
  author={Tian, Sibo and Zheng, Minghui and Liang, Xiao},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  volume={9},
  number={7},
  pages={6232--6239},
  publisher={IEEE}
}

πŸ“š License

The software in this repository is freely available for non-commercial use (see the license for further details).
