https://aistudio.baidu.com/aistudio/education/group/info/1340
This is a PaddlePaddle implementation of the paper:
Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh, "Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs", arXiv preprint, arXiv:2004.04968, 2020.
- PaddlePaddle (1.8.3 required)
python -m pip install paddlepaddle-gpu==1.8.3.post97 -i https://mirror.baidu.com/pypi/simple
- FFmpeg, FFprobe
- Python 3
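Before going further, it can help to confirm that the command-line tools listed above are actually reachable. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `missing_tools` is made up for this example and is not part of the repository.

```python
import shutil
import sys


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Report the interpreter version and any tool that is not installed.
print("Python", sys.version_info.major, "detected")
missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe found")
```

This does not check the PaddlePaddle version; that still needs to be verified separately after installation.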
Pre-trained models are available here.
pretrain.pdparams: --model resnet --model_depth 50 --n_pretrain_classes 1039
- Download videos and train/test splits here.
- Convert from avi to jpg files using util_scripts/generate_video_jpgs.py:
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
- Generate an annotation file in JSON format, similar to ActivityNet, using util_scripts/ucf101_json.py
  - annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path
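To make the annotation step above more concrete, here is a rough sketch of how the UCF-101 split files could be turned into an ActivityNet-like dictionary. The input line formats follow the standard UCF-101 split files. The output layout (`labels`, `database`, `subset`, `annotations`) is an assumption made for illustration and may differ from what `util_scripts/ucf101_json.py` actually writes; `build_annotation` is a hypothetical helper.

```python
import json


def build_annotation(class_lines, train_lines, test_lines):
    """Build an ActivityNet-like dict from UCF-101 split file lines.

    class_lines: lines of classInd.txt, e.g. "1 ApplyEyeMakeup"
    train_lines: lines of trainlist0X.txt, e.g. "Class/v_Class_g08_c01.avi 1"
    test_lines:  lines of testlist0X.txt,  e.g. "Class/v_Class_g01_c01.avi"
    The output keys are assumed, not taken from the repository.
    """
    labels = [line.split()[1] for line in class_lines if line.strip()]
    database = {}
    for subset, lines in (("training", train_lines), ("validation", test_lines)):
        for line in lines:
            if not line.strip():
                continue
            rel_path = line.split()[0]              # "Class/v_Class_g01_c01.avi"
            label, file_name = rel_path.split("/")
            video_id = file_name.rsplit(".", 1)[0]  # drop the ".avi" extension
            database[video_id] = {"subset": subset, "annotations": {"label": label}}
    return {"labels": labels, "database": database}


example = build_annotation(
    ["1 ApplyEyeMakeup", "2 ApplyLipstick"],
    ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"],
    ["ApplyLipstick/v_ApplyLipstick_g01_c01.avi"],
)
print(json.dumps(example, indent=2))
```

The test list is mapped to a `validation` subset here because the evaluation command in this README selects `--subset validation`.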
Assume the structure of data directories is the following:
~/
  data/
    UCF-jpg/
      .../ (directories of class names)
        .../ (directories of video names)
          ... (jpg files)
    UCF_annotation/
      ucf101_01.json
  results/
    val.json
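One way to sanity-check the layout above before training is to count the extracted frames per video directory. This is only a sketch: `count_frames` is a hypothetical helper, and it assumes the class/video/jpg nesting described above.

```python
from pathlib import Path


def count_frames(jpg_root):
    """Map "class/video" to the number of .jpg files found under jpg_root."""
    counts = {}
    for video_dir in sorted(Path(jpg_root).glob("*/*")):
        if video_dir.is_dir():
            key = f"{video_dir.parent.name}/{video_dir.name}"
            counts[key] = sum(1 for _ in video_dir.glob("*.jpg"))
    return counts


def empty_videos(counts):
    """Return the videos for which no frames were extracted."""
    return [name for name, n in counts.items() if n == 0]
```

For example, `empty_videos(count_frames(Path.home() / "data" / "UCF-jpg"))` would list video directories that contain no frames, which usually points to a failed conversion.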
Confirm all options. The following command fine-tunes the pre-trained model on UCF-101 and then runs inference:
python main.py --root_path ~/ --video_path data/UCF-jpg --annotation_path data/UCF_json/ucf101_01.json \
--result_path results --dataset ucf101 --model resnet --n_pretrain_classes 1039 \
--pretrain_path data/pretrain --model_depth 50 --n_classes 101 --batch_size 128 \
--checkpoint 5 --n_epochs 20 --learning_rate 0.003 --train_crop 'random' --lr_scheduler multistep \
--inference --inference_batch_size 1
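The command above selects a multistep learning-rate schedule starting at 0.003. As a rough illustration of what such a schedule does, the sketch below multiplies the rate by a fixed factor each time a milestone epoch is reached. The milestones `[10, 15]` and the factor `0.1` are assumptions chosen for the example; the values actually used by `main.py` are not stated here.

```python
def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate at `epoch` (0-based) under a multistep schedule.

    The rate is multiplied by `gamma` once for every milestone that
    has been reached. Milestones and gamma are illustrative values.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Print the rate at a few epochs of a 20-epoch run.
for epoch in (0, 9, 10, 15, 19):
    print(epoch, multistep_lr(0.003, epoch, milestones=[10, 15]))
```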
Evaluate top-1 video accuracy of a recognition result (~/results/val.json).
# Compute top-1 accuracy
python -m util_scripts.eval_accuracy --ground_truth_path data/UCF_json/ucf101_01.json \
--result_path results/val_random.json --subset validation --k 1 --ignore --save
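For readers who want to see what top-k video accuracy means in practice, here is a small self-contained sketch. It assumes each video maps to a list of `{"label", "score"}` predictions, and that skipping videos absent from the results is what the `--ignore` flag is for; both are assumptions, and `top_k_accuracy` is a hypothetical helper rather than the code in `util_scripts/eval_accuracy.py`.

```python
def top_k_accuracy(ground_truth, results, k=1, ignore_missing=True):
    """Fraction of videos whose true label is among the k best-scored predictions.

    ground_truth: {video_id: label}
    results:      {video_id: [{"label": str, "score": float}, ...]} (assumed format)
    ignore_missing: skip videos with no prediction instead of counting them as wrong
    """
    hits, total = 0, 0
    for video_id, label in ground_truth.items():
        predictions = results.get(video_id)
        if predictions is None:
            if not ignore_missing:
                total += 1  # counted as a miss
            continue
        total += 1
        ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
        if label in [p["label"] for p in ranked[:k]]:
            hits += 1
    return hits / total if total else 0.0


truth = {"v1": "A", "v2": "B", "v3": "A"}
preds = {
    "v1": [{"label": "A", "score": 0.9}, {"label": "B", "score": 0.1}],
    "v2": [{"label": "A", "score": 0.6}, {"label": "B", "score": 0.4}],
}
print(top_k_accuracy(truth, preds, k=1))
```

In this toy example `v3` has no prediction, so it is left out of the denominator unless `ignore_missing=False` is passed.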