This is a PyTorch demo that implements several visualization methods for video classification networks. The goal is to provide a toolkit (similar to TorchRay for images) for interpreting commonly used video classification networks such as I3D, R(2+1)D, and TSM. This task is also called attribution, i.e., the problem of determining which part of the input video is responsible for the value computed by a neural network. More details can be found in our paper Towards Visually Explaining Video Understanding Networks With Perturbation.
The current version supports the following video classification models and attribution methods:
- Pretrained on Kinetics-400: I3D, R(2+1)D, R3D, MC3, TSM;
- Pretrained on EPIC-Kitchens (noun & verb): TSM.
- Backprop-based: Gradients, Gradients x Inputs, Integrated Gradients;
- Activation-based: GradCAM (TSM is not supported yet);
- Perturbation-based:
- 2D-EP: An extended version of Extremal Perturbations for video inputs that perturbs each frame separately and regularizes the perturbed area of every frame toward the same target ratio.
- 3D-EP: An extended version of Extremal Perturbations for video inputs that perturbs across all frames jointly and regularizes the total perturbed area over all frames toward the target ratio.
- STEP: Spatio-Temporal Extremal Perturbations, which adds a regularization term encouraging spatiotemporal smoothness in the video attribution results (a minimal sketch of the shared mask-optimization idea follows this list).
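To make the shared idea behind 2D-EP, 3D-EP, and STEP concrete, here is a minimal, simplified sketch of optimizing a preservation mask with an area regularizer. It is not the implementation used in this repo: the function name, the low-resolution mask size, the all-zero baseline, and the loss weights are illustrative assumptions, and the real methods additionally use smooth masks and, for STEP, a spatiotemporal smoothness term.

```python
import torch
import torch.nn.functional as F

def perturbation_attribution(model, video, target_class, target_area=0.1,
                             num_iter=2000, lr=0.05, area_weight=10.0):
    """Simplified extremal-perturbation-style attribution for one video clip.

    video: tensor of shape (1, C, T, H, W); model: maps a clip to class logits.
    Returns a soft mask in [0, 1] of shape (1, 1, T, H, W) marking preserved regions.
    """
    model.eval()
    # Low-resolution mask parameters, upsampled to the input size at every step.
    mask_param = torch.zeros(1, 1, video.shape[2], 16, 16, requires_grad=True)
    optimizer = torch.optim.Adam([mask_param], lr=lr)
    baseline = torch.zeros_like(video)  # perturbed (blacked-out) reference input

    for _ in range(num_iter):
        mask = torch.sigmoid(mask_param)
        mask_up = F.interpolate(mask, size=video.shape[2:], mode='trilinear',
                                align_corners=False)
        # Keep masked regions from the original video, fill the rest with the baseline.
        perturbed = mask_up * video + (1 - mask_up) * baseline
        score = model(perturbed)[0, target_class]
        # Push the preserved area toward the target ratio while keeping the score high.
        area_loss = (mask_up.mean() - target_area) ** 2
        loss = -score + area_weight * area_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        final_mask = F.interpolate(torch.sigmoid(mask_param), size=video.shape[2:],
                                   mode='trilinear', align_corners=False)
    return final_mask
```

Under this view, 2D-EP constrains the area per frame, 3D-EP constrains the total area over all frames, and STEP adds a smoothness penalty on the mask across space and time.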
- Python 3.6.5 or greater
- PyTorch 1.2.0 or greater
- matplotlib==2.2.3
- numpy==1.14.3
- opencv_python==4.1.2.30
- torchvision==0.4.0a0
- torchray==1.0.0.2
- tqdm==4.45.0
- pandas==0.23.3
- scikit_image==0.15.0
- Pillow==7.1.2
- scikit_learn==0.22.2.post1
- Input frames: Testing frames are provided in the directory ./test_data/$dataset_name$/sampled_frames.
- Outputs: Results are saved by default to the directory ./visual_res/$vis_method$/$model$/$save_label$/ (see the path sketch below).
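For concreteness, a small sketch of how these input and output paths fit together. The helper below is illustrative only and not part of the repo; the argument names simply mirror the options described next.

```python
import os

def build_paths(dataset_name, vis_method, model, save_label):
    """Illustrative helper mirroring the directory layout described above."""
    videos_dir = os.path.join('test_data', dataset_name, 'sampled_frames')
    output_dir = os.path.join('visual_res', vis_method, model, save_label)
    os.makedirs(output_dir, exist_ok=True)
    return videos_dir, output_dir

# Example: Kinetics test frames, STEP results for R(2+1)D under the label 'demo'.
frames_dir, results_dir = build_paths('kinetics', 'step', 'r2plus1d', 'demo')
```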
- videos_dir: Directory of video frames. Frames belonging to one video should be placed in one folder under this directory, and the first part of the folder name (split by '-') is taken as the label name.
- model: Name of the test model. Default is R(2+1)D; current choices are R(2+1)D, R3D, MC3, I3D, and TSM.
- pretrain_dataset: Name of the dataset the test model was pretrained on. Choices are 'kinetics', 'epic-kitchens-verb', and 'epic-kitchens-noun'.
- vis_method: Name of the visualization method. Choices are 'grad', 'grad*input', 'integrated_grad', 'grad_cam', '2d_ep', '3d_ep', and 'step'.
- save_label: Extra label for saving results. If given, visualization results will be saved in ./visual_res/$vis_method$/$model$/$save_label$.
- no_gpu: If set, the demo runs on the CPU; otherwise it runs on a single GPU.
Arguments for perturbation-based methods:
- num_iter: Number of iterations to get the perturbation results. Default is set to 2000 for better convergence.
- perturb_area: Target ratio of the area preserved on the input. Default is 0.1 (10% of the input). Choices are [0.01, 0.02, 0.05, 0.1, 0.15, 0.2].
Arguments for gradient methods:
- polarity: Polarity of the gradients to visualize. Default is 'positive', which means negative gradients are set to 0 before visualization (see the sketch after this list).
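As an illustration of the polarity option, here is a minimal, simplified sketch of plain gradient attribution in which negative gradients are clamped to zero when polarity is 'positive'. The function name and the way other polarity choices would be handled are assumptions for illustration, not the repo's actual code.

```python
import torch

def gradient_attribution(model, video, target_class, polarity='positive'):
    """Plain-gradient attribution for a video clip of shape (1, C, T, H, W)."""
    model.eval()
    video = video.detach().clone().requires_grad_(True)
    score = model(video)[0, target_class]
    score.backward()
    grad = video.grad.detach()
    if polarity == 'positive':
        grad = grad.clamp(min=0)  # zero out negative gradients, as described above
    # (other polarity choices would be handled here)
    # Collapse the channel dimension to one saliency value per pixel per frame.
    return grad.abs().sum(dim=1, keepdim=True)
```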
$ python main.py --videos_dir VideoVisual/test_data/kinetics/sampled_frames --model r2plus1d --pretrain_dataset kinetics --vis_method step --num_iter 2000 --perturb_area 0.1
$ python main.py --videos_dir VideoVisual/test_data/epic-kitchens-noun/sampled_frames --model tsm --pretrain_dataset epic-kitchens-noun --vis_method 3d_ep --num_iter 2000 --perturb_area 0.05
$ python main.py --videos_dir VideoVisual/test_data/kinetics/sampled_frames --model i3d --pretrain_dataset kinetics --vis_method integrated_grad
'Perturbation' denotes 3D-EP here.
@article{li2020towards,
title={Towards Visually Explaining Video Understanding Networks with Perturbation},
author={Li, Zhenqiang and Wang, Weimin and Li, Zuoyue and Huang, Yifei and Sato, Yoichi},
journal={arXiv preprint arXiv:2005.00375},
year={2020}
}
or
@InProceedings{Li_2021_WACV,
author = {Li, Zhenqiang and Wang, Weimin and Li, Zuoyue and Huang, Yifei and Sato, Yoichi},
title = {Towards Visually Explaining Video Understanding Networks With Perturbation},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2021},
pages = {1120-1129}
}
@inproceedings{sundararajan2017axiomatic,
title={Axiomatic attribution for deep networks},
author={Sundararajan, Mukund and Taly, Ankur and Yan, Qiqi},
booktitle={ICML},
year={2017},
}
@inproceedings{selvaraju2017grad,
title={Grad-cam: Visual explanations from deep networks via gradient-based localization},
author={Selvaraju, Ramprasaath R and Cogswell, Michael and Das, Abhishek and Vedantam, Ramakrishna and Parikh, Devi and Batra, Dhruv},
booktitle={ICCV},
year={2017}
}
@inproceedings{fong2019understanding,
title={Understanding deep networks via extremal perturbations and smooth masks},
author={Fong, Ruth and Patrick, Mandela and Vedaldi, Andrea},
booktitle={ICCV},
year={2019}
}