This is the official PyTorch implementation of the paper "Realistic Human Motion Generation with Cross-Diffusion Models". Our method leverages intricate 2D motion knowledge and builds a cross-diffusion mechanism to enhance 3D motion generation.
This code has been tested with Python 3.8 and PyTorch 1.11.
conda create -n crossdiff python=3.8
conda activate crossdiff
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
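To quickly verify the environment before downloading data, a minimal sanity check (nothing project-specific) is:

```python
import torch

print(torch.__version__)          # expect 1.11.0+cu113 with the install command above
print(torch.cuda.is_available())  # True only if the CUDA 11.3 build can see a GPU
```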
Execute the following script to download the necessary materials:
mkdir data/
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Run the script below to download the pre-trained model:
bash prepare/download_pretrained_models.sh
HumanML3D - Follow the setup instructions in the HumanML3D repository. Afterward, execute the following command to obtain the corresponding 2D motion:
python prepare/project.py --data_root YOUR_DATA_ROOT
Additionally, set `data_root` in the configuration file `configs/base.yaml` for subsequent training.
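As a quick sanity check that the path is picked up, a minimal sketch (assuming `configs/base.yaml` is plain YAML that PyYAML can parse; only the `data_root` key comes from this README):

```python
import os
import yaml  # assumption: the config file is plain YAML readable by PyYAML

with open("configs/base.yaml") as f:
    cfg = yaml.safe_load(f)

data_root = cfg["data_root"]  # the HumanML3D root you set above
print(data_root, "exists:", os.path.isdir(data_root))
```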
UCF101 - This dataset is used to train the model with real-world 2D motion.
Download the original data from the UCF101 project page. Then, estimate 2D poses with the off-the-shelf ViTPose model and process the resulting 2D data in the same manner as the HumanML3D 2D motion above.
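The authoritative 2D format is whatever `prepare/project.py` produces; the sketch below is only an illustration of the general idea, stacking per-frame ViTPose keypoints into a `(T, J, 2)` array and dropping confidence scores. The joint order, file names, and any filtering or resampling are assumptions and must be matched to the HumanML3D processing.

```python
import numpy as np

def keypoints_to_motion(per_frame_keypoints):
    """Stack per-frame 2D keypoints into one (T, J, 2) float32 array.

    per_frame_keypoints: list of (J, 3) arrays of (x, y, confidence),
    e.g. exported from a ViTPose inference script (hypothetical format).
    """
    frames = [np.asarray(kp)[:, :2] for kp in per_frame_keypoints]  # drop confidence
    return np.stack(frames, axis=0).astype(np.float32)

# Illustrative example: 40 frames of 17 COCO-style joints with random values.
dummy = [np.random.rand(17, 3) for _ in range(40)]
motion_2d = keypoints_to_motion(dummy)
np.save("example_2d_motion.npy", motion_2d)  # file name is illustrative only
```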
For the first stage, execute the following command:
python train.py --cfg configs/crossdiff_pre.yaml
The results will be stored in `./save/crossdiff_pre`. Locate the best checkpoint and set `resume_checkpoint` in `configs/crossdiff_finetune.yaml` accordingly.
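If you are unsure which files were written, a small helper like the following lists the saved checkpoints (assuming they are stored as `*.pt` files somewhere under `./save/crossdiff_pre`; the naming scheme is not documented here):

```python
from pathlib import Path

# List checkpoint files, oldest first, so you can pick one for resume_checkpoint.
ckpts = sorted(Path("save/crossdiff_pre").rglob("*.pt"),
               key=lambda p: p.stat().st_mtime)
for p in ckpts:
    print(p)
```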
For the second stage, run:
python train.py --cfg configs/crossdiff_finetune.yaml
The final results will be saved in `./save/crossdiff_finetune`.
After training, run the following command to test the model:
python test.py --cfg configs/crossdiff_finetune.yaml
By default, the code uses the final model for testing. Alternatively, you can set `test_checkpoint` in the configuration file to test a specific model.
You may also configure the following options:
- `test_mm`: test multimodality.
- `eval_part`: choose from `all`, `upper`, or `lower` to test metrics for different body parts.
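For example, assuming these options accept the same key=value command-line overrides as `test_checkpoint` in the generation command below (and that `test_mm` is a boolean flag), an upper-body multimodality evaluation might look like:

python test.py --cfg configs/crossdiff_finetune.yaml test_mm=true eval_part=upper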
To generate motion from text, use:
python generate.py --cfg configs/crossdiff_finetune.yaml test_checkpoint=./data/checkpoints/pretrain.pt
You can edit the text prompts in the configuration file via the `captions` parameter. The output will be saved in `./save/crossdiff_finetune/eval`. Then, execute:
python fit_smpl.py -f YOUR_KEYPOINT_FILE
This fits the selected `.npy` file of body keypoints and produces the corresponding mesh file ending in `_mesh.npy`.
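For a quick look at the fitted output, a minimal sketch (assuming the `_mesh.npy` file stores a per-frame array of SMPL vertices; the exact contents and shape are not documented here):

```python
import numpy as np

mesh = np.load("YOUR_MESH_FILE", allow_pickle=True)
# For a plain vertex array this prints something like (num_frames, num_vertices, 3).
print(type(mesh), getattr(mesh, "shape", None))
```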
For visualizing SMPL results, refer to MLD-Visualization and TEMOS-Rendering motions for Blender setup.
Run the following command to visualize SMPL:
blender --background --python render_blender.py -- --file=YOUR_MESH_FILE
We express our gratitude to MDM, MLD, T2M-GPT, and TEMOS; our code is partially adapted from their work.
If you find this code useful in your research, please cite:
@article{ren2023realistic,
title={Realistic Human Motion Generation with Cross-Diffusion Models},
author={Ren, Zeping and Huang, Shaoli and Li, Xiu},
journal={arXiv preprint arXiv:2312.10993},
year={2023}
}