- 2025/02/24: We have released both the 256 and 512 model weights and provided inference scripts. Check out our HuggingFace repo for the weights.
- 2025/01/20: Our paper has been published on arXiv.
CatV2TON is a DiT-based method for Vision-Based Virtual Try-On (V2TON) with Temporal Concatenation of Video Frames and Garment Condition.
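To make the temporal-concatenation idea concrete, here is a conceptual sketch; the latent shapes, the garment being encoded as a single extra frame, and all tensor names are illustrative assumptions, not the released model code.

```python
# Conceptual sketch only: person frames and the garment condition are joined
# along the frame (temporal) axis so one DiT backbone attends over both.
import torch

B, C, T, H, W = 1, 4, 8, 64, 48                 # assumed latent-space shapes
person_latents = torch.randn(B, C, T, H, W)     # masked person video latents
garment_latent = torch.randn(B, C, 1, H, W)     # garment image latent, one "frame"

# Temporal concatenation: the garment rides along as an extra frame.
dit_input = torch.cat([garment_latent, person_latents], dim=2)
print(dit_input.shape)                          # torch.Size([1, 4, 9, 64, 48])
```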
We provide an evaluation script for the VITONHD and DressCode datasets. You can download our generated VITONHD and DressCode results to evaluate the performance of our method, or generate your own results by following the Inference section; those may differ slightly from ours due to randomness in the inference process.
```bash
CUDA_VISIBLE_DEVICES=0 python eval_image_metrics.py \
    --gt_folder YOUR_GT_FOLDER \
    --pred_folder YOUR_PRED_FOLDER \
    --batch_size 16 \
    --num_workers 16 \
    --paired
```
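As a quick sanity check outside the provided script, the following is a minimal paired-evaluation sketch built on torchmetrics; the metric choices, preprocessing, and filename matching are assumptions and may not match `eval_image_metrics.py` exactly.

```python
# Minimal paired image evaluation sketch (assumes GT and prediction folders
# contain same-named, same-sized RGB images; torchmetrics + torchvision installed).
import os
import torch
from torchvision.io import read_image
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
from torchmetrics.image.fid import FrechetInceptionDistance

gt_folder, pred_folder = "YOUR_GT_FOLDER", "YOUR_PRED_FOLDER"
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
fid = FrechetInceptionDistance(feature=2048)

with torch.no_grad():
    for name in sorted(os.listdir(gt_folder)):
        gt = read_image(os.path.join(gt_folder, name)).unsqueeze(0)    # uint8 (1,C,H,W)
        pred = read_image(os.path.join(pred_folder, name)).unsqueeze(0)
        gt_f, pred_f = gt.float() / 255.0, pred.float() / 255.0        # [0,1] floats
        ssim.update(pred_f, gt_f)
        lpips.update(pred_f, gt_f)   # normalize=True lets LPIPS take [0,1] inputs
        fid.update(gt, real=True)    # FID consumes uint8 images by default
        fid.update(pred, real=False)

print(f"SSIM:  {ssim.compute():.4f}")
print(f"LPIPS: {lpips.compute():.4f}")
print(f"FID:   {fid.compute():.2f}")
```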
We provide an evaluation script for the ViViD-S-Test and VVT-Test datasets. You can download our generated ViViD-S-Test and VVT-Test results to evaluate the performance of our method, or generate your own results by following the Inference section; those may differ slightly from ours due to randomness in the inference process.
```bash
CUDA_VISIBLE_DEVICES=0 python eval_video_metrics.py \
    --gt_folder YOUR_GT_FOLDER \
    --pred_folder YOUR_PRED_FOLDER \
    --num_workers 16 \
    --paired
```
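For a rough, frame-wise check along the same lines, here is a sketch that reads paired `.mp4` files and accumulates frame-wise SSIM; `eval_video_metrics.py` may additionally report video-level metrics (e.g., VFID), so treat this as an illustration only.

```python
# Minimal paired video evaluation sketch (assumes matching .mp4 filenames and
# frame sizes in both folders; torchmetrics + torchvision installed).
import os
import torch
from torchvision.io import read_video
from torchmetrics.image import StructuralSimilarityIndexMeasure

gt_folder, pred_folder = "YOUR_GT_FOLDER", "YOUR_PRED_FOLDER"
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)

with torch.no_grad():
    for name in sorted(os.listdir(gt_folder)):
        gt, _, _ = read_video(os.path.join(gt_folder, name),
                              output_format="TCHW", pts_unit="sec")
        pred, _, _ = read_video(os.path.join(pred_folder, name),
                                output_format="TCHW", pts_unit="sec")
        t = min(gt.shape[0], pred.shape[0])          # align frame counts
        ssim.update(pred[:t].float() / 255.0, gt[:t].float() / 255.0)

print(f"Frame-wise SSIM: {ssim.compute():.4f}")
```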
- `YOUR_GT_FOLDER` is the path to the ground-truth video folder, which should contain only `.mp4` files.
- `YOUR_PRED_FOLDER` is the path to the predicted video folder, which should contain only `.mp4` files.
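For reference, an illustrative layout (filenames are placeholders; presumably the two folders need matching names for paired evaluation):

```
YOUR_GT_FOLDER/
├── 0001.mp4
├── 0002.mp4
└── ...
YOUR_PRED_FOLDER/
├── 0001.mp4
├── 0002.mp4
└── ...
```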
We provide the inference script for the VITONHD and DressCode datasets. The datasets can be downloaded from VITONHD and DressCode. Run the following command (pass either vitonhd or dresscode to --dataset), editing the remaining parameters to match your own setup.
```bash
CUDA_VISIBLE_DEVICES=0 python eval_image_try_on.py \
    --dataset vitonhd | dresscode \
    --data_root_path YOUR_DATASET_PATH \
    --output_dir OUTPUT_DIR_TO_SAVE_RESULTS \
    --dataloader_num_workers 8 \
    --batch_size 8 \
    --seed 42 \
    --mixed_precision bf16 \
    --allow_tf32 \
    --repaint \
    --eval_pair
```
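The --repaint flag points at a paste-back step that is common in try-on pipelines: keep the original pixels outside the garment mask and the generated pixels inside it. This is an assumption about the flag's behavior rather than the repo's exact implementation; a minimal sketch:

```python
# Repaint/paste-back sketch: only the masked try-on region comes from the
# generated image; everything else is copied from the original person image.
import torch

def repaint(original: torch.Tensor, generated: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """original/generated: (B, C, H, W) in [0, 1]; mask: (B, 1, H, W), 1 = try-on region."""
    return mask * generated + (1.0 - mask) * original
```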
Two video try-on test datasets are provided: ViViD-S-Test and VVT. Run the following command (pass either vivid or vvt to --dataset), editing the remaining parameters to match your own setup.
```bash
CUDA_VISIBLE_DEVICES=0 python eval_video_try_on.py \
    --dataset vivid | vvt \
    --data_root_path YOUR_DATASET_PATH \
    --output_dir OUTPUT_DIR_TO_SAVE_RESULTS \
    --dataloader_num_workers 8 \
    --batch_size 8 \
    --seed 42 \
    --mixed_precision bf16 \
    --allow_tf32 \
    --repaint \
    --eval_pair
```
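For context, --seed, --mixed_precision bf16, and --allow_tf32 correspond to standard PyTorch reproducibility and precision settings; the sketch below shows the usual mapping, though the scripts may wire them up differently.

```python
# Typical PyTorch equivalents of the CLI flags above (an assumption, not a
# guarantee of how the scripts implement them).
import torch

torch.manual_seed(42)                             # --seed 42
torch.backends.cuda.matmul.allow_tf32 = True      # --allow_tf32
torch.backends.cudnn.allow_tf32 = True

# --mixed_precision bf16: run the denoising loop under bfloat16 autocast.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    pass  # model forward passes go here
```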