The two-stream ConvNet has been recognized as one of the most successful deep ConvNet architectures for video understanding, specifically human action recognition. However, it suffers from insufficient temporal data for training.
This repository implements temporal-segment RNNs for training on videos with temporal augmentation. The implementation is based on the example code from fb.resnet.torch and was largely modified to work with frame-level features.
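The segment-based temporal augmentation can be sketched as follows: each video is divided into equal-length temporal segments, and one frame is sampled at random from each segment per training pass. This is a minimal illustration only; it assumes each video is stored as an nFrames x featDim tensor of frame-level features, and the helper name `sampleSegmentIndices` is hypothetical, not from this repository.

```lua
-- Minimal sketch of temporal-segment sampling (hypothetical helper,
-- assuming nFrames >= nSegments).
require 'torch'

local function sampleSegmentIndices(nFrames, nSegments)
   local indices = torch.LongTensor(nSegments)
   local segLen = nFrames / nSegments
   for s = 1, nSegments do
      local segStart = math.floor((s - 1) * segLen) + 1
      local segEnd = math.floor(s * segLen)
      indices[s] = math.random(segStart, segEnd) -- random frame within segment s
   end
   return indices
end

-- e.g. draw 25 frame indices from a 180-frame video, one per segment
local idx = sampleSegmentIndices(180, 25)
-- local sampled = features:index(1, idx) -- features: nFrames x featDim tensor
```

Resampling the indices every epoch yields a different frame combination per video each time, which is the temporal augmentation effect.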
Pre-saved features generated from ResNet-101 are provided.
The starter code provided here should be relatively easy to adapt to other datasets. For example:
I re-trained the two-stream ConvNet using a pre-trained ResNet-101 on the UCF101 dataset. Please download the frame-level features from the links below.
The features are coming soon.
UCF-101 split 1
You can also generate features for splits 2 and 3 by rearranging the features according to the split lists provided by UCF101.
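For reference, consuming the downloaded features in Torch might look like the sketch below. The file name and the on-disk layout (a Torch `.t7` file holding a table of per-video tensors) are assumptions for illustration, not the actual download format.

```lua
-- Hypothetical sketch of loading pre-saved frame-level features.
require 'torch'

local data = torch.load('UCF101_split1_resnet101_features.t7') -- assumed file name
print(#data)           -- number of videos (assumed table layout)
print(data[1]:size())  -- nFrames x 2048 per video for ResNet-101 pool5 features
```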
Specify the path to the downloaded features and the type of RNN model you would like to use in opt.lua.
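As an illustration, the relevant options might look something like the following. The key names here are hypothetical placeholders; check opt.lua in the repository for the actual keys it expects.

```lua
-- Illustrative only: hypothetical option names, not the real opt.lua keys.
local opt = {
   featurePath = '/path/to/UCF101_split1_resnet101_features.t7', -- downloaded features
   model       = 'TS-LSTM',  -- e.g. TS-LSTM or Temporal-Inception
   nSegments   = 25,         -- temporal segments per video
   hiddenSize  = 512,        -- RNN hidden dimension
}
return opt
```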
```bash
th main.lua
```
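For intuition, the simplest kind of RNN model over frame-level features can be sketched as below. This is a generic LSTM baseline using the Element-Research rnn package, shown only to convey the idea; it is not the exact TS-LSTM or Temporal-Inception architecture from the paper.

```lua
-- Rough sketch of an RNN over frame-level features (generic baseline,
-- not this repository's exact architecture).
require 'nn'
require 'rnn' -- Element-Research rnn package

local seqLen, featDim, hiddenSize, nClasses = 25, 2048, 512, 101

local model = nn.Sequential()
model:add(nn.SeqLSTM(featDim, hiddenSize)) -- expects seqLen x batch x featDim
model:add(nn.Select(1, seqLen))            -- keep the last time step
model:add(nn.Linear(hiddenSize, nClasses)) -- classify into 101 UCF101 classes
model:add(nn.LogSoftMax())
```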
If you find the code useful, please cite our paper:
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
```
@article{ma2017tslstm,
  title={TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition},
  author={Ma, Chih-Yao and Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={arXiv preprint arXiv:1703.10667},
  year={2017}
}
```