TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction
Sibo Tian1, Minghui Zheng1,*, Xiao Liang2,*
1J. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University, 2Zachry Department of Civil and Environmental Engineering, Texas A&M University, *Corresponding Authors
[IEEE Xplore] | [Code]
Predicting human motion plays a crucial role in ensuring safe and effective human-robot close collaboration in the intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those focusing on accuracy, which predict a single future motion, and those generating diverse predictions based on observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter often produces motion sequences that deviate too far from the ground truth or become unrealistic given the historical context. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction that generates samples that are more likely to happen while maintaining a certain level of diversity. Our model leverages a Transformer backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform to model motion sequences in the frequency domain, thereby improving performance. In contrast to prior diffusion-based models that rely on extra modules such as cross-attention and adaptive layer normalization to condition the prediction on the past observed motion, we treat all inputs, including the conditions, as tokens, yielding a model that is more practical and effective than existing approaches. Extensive experiments on benchmark datasets validate the effectiveness of our human motion prediction model.
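For intuition only, below is a minimal, unofficial sketch (not the authors' implementation) of the two ideas mentioned above: moving motion sequences into the frequency domain with the discrete cosine transform, and treating the observed motion as ordinary tokens that are concatenated with the noisy future tokens so a plain Transformer can attend over both. All tensor shapes are illustrative.

```python
# Minimal sketch, NOT the repository's implementation: DCT over the time axis
# plus condition-as-token concatenation. Shapes are illustrative only.
import math
import torch

def dct_basis(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis of shape (n, n); rows index frequencies."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    t = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # time index
    basis = torch.cos(math.pi * (2.0 * t + 1.0) * k / (2.0 * n))
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

# Illustrative sizes: 25 observed frames, 100 predicted frames, 48 joint coordinates.
T_his, T_pred, D = 25, 100, 48
history = torch.randn(1, T_his, D)      # observed motion (the condition)
future = torch.randn(1, T_pred, D)      # noisy future motion (a diffusion sample)

# Transform both sequences to the frequency domain along the time axis.
hist_freq = torch.einsum('ft,btd->bfd', dct_basis(T_his), history)
pred_freq = torch.einsum('ft,btd->bfd', dct_basis(T_pred), future)

# Condition-as-token: concatenate condition and target into one token sequence,
# so the Transformer attends over both without cross-attention or AdaLN modules.
tokens = torch.cat([hist_freq, pred_freq], dim=1)   # (1, T_his + T_pred, D)
print(tokens.shape)
```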
[2024/05/23]: Code released!
[2024/04/28]: Our work is accepted by IEEE Robotics and Automation Letters (RA-L)!
[2024/03/25]: TransFusion prediction demos released!
The following code is tested on Linux-64 in a cluster environment as well as on Windows 11. If you are using Linux-64 in a cluster environment, please use source activate transfusion instead of conda activate transfusion.
mkdir ./checkpoints
mkdir ./data
mkdir ./inference
mkdir ./results
conda create -n transfusion python=3.8
conda activate transfusion
python -m pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install -r requirement.txt
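After installation, an optional sanity check (not part of the repository) can confirm that the expected PyTorch build is active and that a GPU is visible, since evaluation requires one:

```python
# Optional sanity check (not part of the repo).
import torch

print(torch.__version__)          # expected: 1.7.1+cu110
print(torch.cuda.is_available())  # should print True; a GPU is required for evaluation
```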
Datasets for Human3.6M, HumanEva-I and AMASS:
For Human3.6M and HumanEva-I, we adopt the data preprocessing from GSPS. For AMASS, we carefully adopt the data preprocessing from BeLFusion. We provide all the processed data here for convenience. Download all files into the ./data directory; the final ./data directory structure is shown below:
data
├── data_3d_amass.npz
├── data_3d_amass_test.npz
├── data_3d_h36m.npz
├── data_3d_h36m_test.npz
├── data_3d_humaneva15.npz
├── data_3d_humaneva15_test.npz
├── data_multi_modal
│   ├── data_candi_t_his25_t_pred100_skiprate20.npz
│   └── t_his25_1_thre0.500_t_pred100_thre0.100_filtered_dlow.npz
└── humaneva_multi_modal
    ├── data_candi_t_his15_t_pred60_skiprate15.npz
    └── t_his15_1_thre0.500_t_pred60_thre0.010_index_filterd.npz
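Optionally, a short script such as the following (not part of the repository) can verify that the main processed files are in place before running anything:

```python
# Optional helper (not part of the repo): check that the processed data files exist.
from pathlib import Path

expected = [
    "data/data_3d_h36m.npz",
    "data/data_3d_humaneva15.npz",
    "data/data_3d_amass.npz",
    "data/data_multi_modal/data_candi_t_his25_t_pred100_skiprate20.npz",
    "data/humaneva_multi_modal/data_candi_t_his15_t_pred60_skiprate15.npz",
]
missing = [p for p in expected if not Path(p).exists()]
print("All expected files found." if not missing else f"Missing: {missing}")
```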
We provide the pretrained models for all three datasets here. Download all files into the ./checkpoints directory; the final ./checkpoints directory structure is shown below:
checkpoints
├── humaneva_ckpt.pt
├── h36m_ckpt.pt
└── amass_ckpt.pt
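If a download seems corrupted, a quick way to inspect a checkpoint is shown below (not part of the repository; the exact contents of these files are defined by the training code):

```python
# Optional check (not part of the repo): load a checkpoint on CPU and peek at its keys.
import torch

ckpt = torch.load("./checkpoints/h36m_ckpt.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```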
Evaluate on Human3.6M:
python main.py --cfg h36m --mode eval --ckpt ./checkpoints/h36m_ckpt.pt
Evaluate on HumanEva-I:
python main.py --cfg humaneva --mode eval --ckpt ./checkpoints/humaneva_ckpt.pt
Evaluate on AMASS:
python main.py --cfg amass --mode eval --ckpt ./checkpoints/amass_ckpt.pt --multimodal_threshold 0.4 --seed 6
Note: We change the random seed from 0 to 6 for the AMASS dataset to compare fairly with BeLFusion. A GPU is required for evaluation.
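For reference, making a stochastic sampler reproducible in PyTorch typically means seeding every source of randomness. The generic pattern below only illustrates what a --seed flag usually controls; see main.py for the actual behavior.

```python
# Generic seeding pattern (illustration only; not necessarily what main.py does).
import random
import numpy as np
import torch

def set_seed(seed: int = 6) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```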
To train TransFusion from scratch on each of the three datasets, run the following scripts:
python main.py --cfg h36m --mode train
python main.py --cfg humaneva --mode train
python main.py --cfg amass --mode train --multimodal_threshold 0.4 --seed 6 --milestone [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]
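The --milestone argument takes a list of epochs. Such lists are commonly consumed by a multi-step learning-rate schedule in PyTorch; the snippet below is only a generic illustration of that pattern (how the milestones are actually used is defined by the repository's config and training code).

```python
# Generic multi-step schedule (illustration only; not necessarily how --milestone is used here).
import torch

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[200, 400, 600, 800, 1000, 1200, 1400, 1600,
                1800, 2000, 2200, 2400, 2600, 2800],
    gamma=0.5,  # decay factor shown here is a hypothetical value
)
```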
Run the following scripts for visualization purposes:
python main.py --cfg h36m --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/h36m_ckpt.pt
python main.py --cfg humaneva --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/humaneva_ckpt.pt
python main.py --cfg amass --mode pred --vis_row 3 --vis_col 10 --ckpt ./checkpoints/amass_ckpt.pt
More prediction demos can be found in ./assets.
Project structure is borrowed from HumanMAC. We would like to thank the authors for making their code publicly available.
If you find our work useful in your research, please consider citing our paper:
@article{tian2024transfusion,
title={TransFusion: A practical and effective transformer-based diffusion model for 3d human motion prediction},
author={Tian, Sibo and Zheng, Minghui and Liang, Xiao},
journal={IEEE Robotics and Automation Letters},
year={2024},
volume={9},
number={7},
pages={6232-6239},
publisher={IEEE}
}
The software in this repository is freely available for non-commercial use (see the license for further details).