cd framebridge-cogvideox
conda create -n framebridge-cogvideox python=3.10
conda activate framebridge-cogvideox
pip install -r requirements.txt
First, download the CogVideoX-2B model. For I2V fine-tuning, we slightly modify the CogVideoX transformer (doubling the number of input channels of the input layer so that it can receive the image input). You can directly download our modified version, CogVideoX-2B-modified.
Download the metadata file `results_2M_train.csv` of WebVid-2M and put all videos in the folder `2M_train` with the following structure:
2M_train/
├── 00001.mp4
├── 00002.mp4
| ......
└── *****.mp4
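For a quick sanity check of the layout, you can count the clips and peek at the metadata (a minimal sketch; adjust the paths to wherever you placed the files):

```bash
# Sanity-check the WebVid-2M layout (paths are placeholders)
find 2M_train -name "*.mp4" | wc -l   # number of downloaded video clips
head -n 3 results_2M_train.csv        # metadata rows should match the clips above
```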
Modify the following path arguments in `finetune_single_rank_i2v_bridge_2b.sh` (a sketch of how they might appear in the script is given after this list):
- `MODEL_PATH`: path to `CogVideoX-2B-modified`
- `DATASET_PATH`: path to `results_2M_train.csv`
- `--video_folder`: path to `2M_train`
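Inside the script, these arguments typically show up as shell variables and a flag forwarded to the training entry point. The excerpt below is only a hypothetical sketch with placeholder paths; the actual layout of `finetune_single_rank_i2v_bridge_2b.sh` may differ:

```bash
# Hypothetical excerpt of finetune_single_rank_i2v_bridge_2b.sh (placeholder paths)
MODEL_PATH="/path/to/CogVideoX-2B-modified"     # modified base model
DATASET_PATH="/path/to/results_2M_train.csv"    # WebVid-2M metadata file
# ...the launch command then receives the video folder, e.g.
#   --video_folder /path/to/2M_train
```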
Run the script:
bash finetune_single_rank_i2v_bridge_2b.sh
The fine-tuned FrameBridge-CogVideoX model can be downloaded from the Google Drive link. (Unfortunately, due to limited computational resources and dataset quality, the performance of the FrameBridge model is not as satisfactory as that of the official I2V version of CogVideoX.)
Before running inference, update the following arguments in `sample.sh` (a hypothetical invocation is sketched after this list):
- `--model_path`: path to the original CogVideoX-2B model or `CogVideoX-2B-modified` (either option is ok, as the transformer will be reloaded with the fine-tuned bridge model)
- `--image_or_video_path`: path to the image prompt
- `--transformer_path`: path to the FrameBridge-CogVideoX model (or the `transformer` subfolder from fine-tuned models)
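For reference, the resulting command inside `sample.sh` might look like the hypothetical sketch below; the entry-point name and placeholder paths are assumptions, so check the actual script:

```bash
# Hypothetical inference command (entry point and paths are assumptions)
python sample.py \
  --model_path /path/to/CogVideoX-2B-modified \
  --image_or_video_path /path/to/prompt_image.png \
  --transformer_path /path/to/FrameBridge-CogVideoX/transformer
```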
Run the script:
bash sample.sh
cd framebridge-latte
conda create -n framebridge-latte python=3.9
conda activate framebridge-latte
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
Download the UCF-101 dataset.
Download the VAE model stabilityai/sd-vae-ft-ema and put the files into the folder `checkpoints/vae`.
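One way to fetch the VAE is with the Hugging Face CLI (a minimal sketch, assuming `huggingface_hub` is installed; downloading the files manually works just as well):

```bash
# Download sd-vae-ft-ema into checkpoints/vae (requires: pip install huggingface_hub)
huggingface-cli download stabilityai/sd-vae-ft-ema --local-dir checkpoints/vae
```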
We have three different config files for training different models:
- `configs/ucf101/ucf101_train_bridge.yaml`: training vanilla FrameBridge
- `configs/ucf101/ucf101_train_nwarp.yaml`: training the neural prior model
- `configs/ucf101/ucf101_train_bridge_nwarp.yaml`: training FrameBridge with neural prior
Choose the corresponding config file based on the model you want to train, and set the path arguments inside:
- `data_path`: path to the extracted UCF-101 folder
- `pretrained_model_path`: path to the `checkpoints` folder which includes the downloaded VAE
If you want to train FrameBridge with neural prior, you also need to set (see the sketch after this list):
- `nwarp_config`: path to `configs/ucf101/ucf101_sample_nwarp.yaml`
- `nwarp_ckpt`: path to the checkpoint of the trained neural prior model
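As a concrete illustration, the path fields could be patched in place with `sed`; this is only a sketch with placeholder paths and assumes the fields are top-level keys in the YAML config (verify against the actual file):

```bash
# Sketch: fill in the path fields of the neural-prior training config (placeholder paths)
CFG=configs/ucf101/ucf101_train_bridge_nwarp.yaml
sed -i 's|^data_path:.*|data_path: /path/to/UCF-101|' "$CFG"
sed -i 's|^pretrained_model_path:.*|pretrained_model_path: ./checkpoints|' "$CFG"
sed -i 's|^nwarp_config:.*|nwarp_config: configs/ucf101/ucf101_sample_nwarp.yaml|' "$CFG"
sed -i 's|^nwarp_ckpt:.*|nwarp_ckpt: /path/to/nwarp_checkpoint.pt|' "$CFG"
```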
# Train vanilla FrameBridge
bash train_scripts/ucf101_bridge_train.sh
# Train neural prior model
bash train_scripts/ucf101_nwarp_train.sh
# Train FrameBridge with neural prior
bash train_scripts/ucf101_bridge_nwarp_train.sh
You can download FrameBridge checkpoints from the Google Drive link:
There are also different config files for sampling with different models:
- `configs/ucf101/ucf101_sample_bridge.yaml`: sampling with vanilla FrameBridge
- `configs/ucf101/ucf101_sample_bridge_nwarp.yaml`: sampling with FrameBridge with neural prior
Similarly,
- set `pretrained_model_path` to the path of the `checkpoints` folder containing the VAE
- set `data_path` to the UCF-101 folder (the UCF-101 dataset is needed to obtain image prompts during sampling).
To use FrameBridge with neural prior, you also need to set:
- `nwarp_config`: path to `configs/ucf101/ucf101_sample_nwarp.yaml`
- `nwarp_ckpt`: path to the checkpoint of the trained neural prior model

in the config file `configs/ucf101/ucf101_sample_bridge_nwarp.yaml` (a quick way to verify these fields is sketched below).
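Before launching, a quick way to confirm these fields are set is to grep them in the config (a minimal sketch that assumes the fields are top-level keys):

```bash
# Print the path fields of the sampling config for a final check
grep -E '^(pretrained_model_path|data_path|nwarp_config|nwarp_ckpt):' \
  configs/ucf101/ucf101_sample_bridge_nwarp.yaml
```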
Set the `--ckpt` argument in the corresponding script to the downloaded or trained checkpoint (see the sketch after the commands below), and run the script:
# vanilla FrameBridge
bash sample/ucf101_bridge.sh
# FrameBridge with neural prior
bash sample/ucf101_bridge_neural_prior.sh
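How `--ckpt` is wired up depends on the script; as a hypothetical illustration, it could be exposed as a variable near the top of `sample/ucf101_bridge.sh` (placeholder path; the real script layout may differ):

```bash
# Hypothetical excerpt of sample/ucf101_bridge.sh (placeholder checkpoint path)
CKPT="/path/to/framebridge_ucf101_checkpoint.pt"
# ...the sampling command then receives it, e.g.
#   --ckpt "$CKPT"
```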
This repository is built upon the excellent work of CogVideoX, cogvideox-factory, Latte, and DynamiCrafter. We sincerely thank the authors and contributors of these projects.