In this work, we examined the capability of current video generation models to produce unsafe content. We compiled a dataset of 2,112 unsafe videos generated from unsafe prompts. Based on this dataset, we developed a defense approach, the Latent Variable Defense (LVD), to mitigate these risks.
This repository contains:
- Instructions for generating a training dataset
- Code for training the detection model
- Code for evaluating LVD performance
We plan to release our unsafe video dataset soon. If you are interested in the dataset, please feel free to contact us.
Our defense approach is designed to detect unsafe content during the model's inference process. This requires collecting the model's predictions of the denoised output (pred_x0) at each denoising step.
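For intuition, the sketch below shows one way such per-step predictions could be collected during sampling. The hook, the `scheduler`/`latents`/`noise_pred` names, and the saving step are illustrative assumptions, not the actual API of the generation pipeline used in this repository:

```python
import torch

# Hypothetical sketch: estimate and record pred_x0 at every denoising step.
# `scheduler`, `latents`, and `noise_pred` are placeholders for objects that
# the actual sampling loop of the video generation model would provide.
collected_pred_x0 = []

def record_pred_x0(scheduler, timestep, latents, noise_pred):
    """Recover the model's current estimate of the clean output x0 from the
    noisy latents and predicted noise (standard DDPM reparameterization)."""
    alpha_prod_t = scheduler.alphas_cumprod[timestep]
    pred_x0 = (latents - (1 - alpha_prod_t) ** 0.5 * noise_pred) / alpha_prod_t ** 0.5
    collected_pred_x0.append(pred_x0.detach().cpu())

# Saving all per-step estimates after sampling would produce a file in the
# spirit of the pred_x0.pth used below.
# torch.save(collected_pred_x0, "pred_x0.pth")
```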
In addition to configuring the model's environment, we also need to set up a base environment for training the detection model.
pip install -r requirements.txt
This section first describes how to generate training data for the detection model and then how to use this data for training.
First, you need to prepare a `prompt.txt` file and have access to the `config` file. In the config file, specify the path to your `prompt.txt` file and the desired path for saving `pred_x0.pth`. The `pred_x0.pth` file contains the data required for the detection model.
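If you want to sanity-check the generated file, a minimal inspection sketch follows; the exact layout of the saved object (tensor, list, or dict) is an assumption here, not a documented format:

```python
import torch

# Hypothetical sanity check for pred_x0.pth; the exact layout of the saved
# object depends on how the generation script stores the per-step predictions.
data = torch.load("pred_x0.pth", map_location="cpu")
print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        print(key, getattr(value, "shape", value))
elif isinstance(data, (list, tuple)):
    print(f"{len(data)} entries; first entry shape: {data[0].shape}")
else:
    print(data.shape)  # assume a single stacked tensor
```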
After obtaining `pred_x0.pth`, you can use `build_x0.py` to generate a directory (named `train_detector_data` by default, under the MagicTime directory) containing all predicted x0 data for training the detection model.
Run:
python build_x0.py --config build_x0.yaml \
--label_file_dir "Your label file directory"
Note: The label file divides the generated unsafe videos according to their index. In the dataset we will share, this is a five-column CSV file; each column contains the indices of the videos belonging to one category of unsafe content.
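For illustration, here is one way such a label file could be parsed; the file name and column headers are hypothetical, and columns may have different lengths since categories need not be balanced:

```python
import csv

# Hypothetical parser for the five-column label CSV described above.
# Each column holds the indices of videos belonging to one unsafe category.
index_to_category = {}
with open("labels.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # assumed: one header cell per unsafe category
    for row in reader:
        for col, cell in enumerate(row):
            if cell.strip():  # some columns can be shorter than others
                index_to_category[int(cell)] = header[col]

print(index_to_category)  # maps video index -> unsafe category
```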
Before training, we need to generate an evaluation dataset to assess the defense effectiveness of the trained LVD. Run the `gen_eval_set.py` script to generate the `eval.pth` file. This file is saved in the MagicTime directory by default.
python gen_eval_set.py
Then run `build_eval_set.py` to generate the `eval_set` folder under the MagicTime directory, which stores the evaluation data.
python build_eval_set.py
After completing all the above steps, we can start training the detection model.
python train_mae.py --data_dir "Your train_detector_data directory" \
--normal_video_dir "Your normal video directory" \
--eval_dir "Your eval.pth directory" \
--group_num 1 \
--train \
--epoch 10 \
--step_num 0 \
--gpu_num 0 \
--save_dir "Where you want to save the model checkpoint"
Note: Passing `eval.pth` during training ensures that the evaluation set data is excluded from the training process. Each run trains a detection model for one unsafe category at one denoising step. Our work defines five unsafe categories, and the default sampling schedule has 50 denoising steps, so we trained 5 × 50 = 250 detection models for each video generation model. The unsafe videos serve as class 1 in training; you also need to generate the same number of normal videos and use them as class 0. In our work, we used InternVid captions to synthesize the normal videos.
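Because each run covers a single (category, step) pair, training all 250 detectors is easy to script. A minimal sketch, assuming `--group_num` selects the unsafe category (1-5) and `--step_num` selects the denoising step (0-49); the paths are placeholders:

```python
import subprocess

# Sketch: train one detector per (unsafe category, denoising step) pair,
# 5 x 50 = 250 runs in total. Flag semantics and paths are assumptions.
for group in range(1, 6):      # five unsafe categories
    for step in range(50):     # fifty denoising steps
        subprocess.run([
            "python", "train_mae.py",
            "--data_dir", "train_detector_data",
            "--normal_video_dir", "normal_videos",
            "--eval_dir", "eval.pth",
            "--group_num", str(group),
            "--train",
            "--epoch", "10",
            "--step_num", str(step),
            "--gpu_num", "0",
            "--save_dir", f"checkpoints/group{group}_step{step}",
        ], check=True)
```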
In this section, we evaluate the effectiveness of our LVD against unsafe video generation in two steps.
In the first step, we load the files from the `eval_set` folder and feed this data into our defense system. For each data point, we generate a list that records the detection results at all denoising steps (see the layout sketch after the command below).
python embed_model.py --model_dir "Your saved model directory" \
--eval_dir "Your eval_set directory" \
--data_dir "Your normal video directory" \
--save_dir "Where to save the evaluation results"
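Conceptually, the record built in this step has one list of per-step detection results for each evaluation sample. The following stand-in sketch only illustrates that layout; the real script loads the trained per-step detectors and the `eval_set` tensors instead of the dummies used here:

```python
import random

NUM_STEPS = 50  # default number of denoising steps in our setup

def dummy_detector(step, sample):
    """Stand-in for the detection model trained for this denoising step."""
    return random.random()

eval_samples = [object() for _ in range(3)]  # stand-in for loaded eval_set data

results = []
for sample in eval_samples:
    per_step_scores = [dummy_detector(step, sample) for step in range(NUM_STEPS)]
    results.append(per_step_scores)  # one list of 50 detection results per sample

print(len(results), len(results[0]))  # 3 samples x 50 denoising steps
```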
Then run `test_accuracy.py` with different parameter settings to see the LVD performance.
python test_accuracy.py --data_dir "Your evaluation results directory" \
--set_lambda 0.6 \
--eta 20
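The exact decision rule lives in `test_accuracy.py`; purely as an assumption for illustration, one plausible reading of the two parameters is that `set_lambda` thresholds the per-step detection score and `eta` bounds how many early denoising steps are consulted:

```python
# Hypothetical aggregation rule, for illustration only; the actual semantics
# of --set_lambda and --eta are defined in test_accuracy.py.
def flag_unsafe(per_step_scores, set_lambda=0.6, eta=20):
    """Flag a video as unsafe if any of the first `eta` per-step scores
    exceeds the confidence threshold `set_lambda`."""
    return any(score > set_lambda for score in per_step_scores[:eta])

print(flag_unsafe([0.1] * 10 + [0.9] + [0.1] * 39))  # True: step 10 fires
print(flag_unsafe([0.1] * 50))                        # False: no step fires
```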
If you find our work helpful, please consider citing our paper:

@misc{pang2024understandingunsafevideogeneration,
title={Towards Understanding Unsafe Video Generation},
author={Yan Pang and Aiping Xiong and Yang Zhang and Tianhao Wang},
year={2024},
eprint={2407.12581},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2407.12581},
}