
PVCD-Pytorch

This repo includes implementations of neural networks for partial video copy detection.

[1] Temporal Relational Reasoning Network: http://relation.csail.mit.edu/

Model input

The protocol for generating the CNN input:

A video -> total_frames -> num_segments -> d_frames -> k * d_frames, where d_frames denotes the frame tuples (e.g., 2-frames, 3-frames) and k_random is the number of subsampled d_frame tuples.

In particular, a VideoRecord = [image_list_path, total_frames, class_idx]. For example, the VideoRecord [v_CricketShot_g08_c01, 75, 0] means that the video 'v_CricketShot_g08_c01' has 75 frames in total and belongs to class 0. We then select num_segments = 30 evenly spaced frames from the video. From these 30 frames, we generate combinations of d-frames (e.g., 2-frames, 3-frames) such as (0, 1), (0, 2), (0, 3), ... For instance, the total number of 2-frame combinations is 30! / (2! * (30 - 2)!) = 435. To train efficiently, we select only k random subsamples from all combinations. These steps are repeated for the other d-frames.
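The subsampling step above can be sketched in a few lines of Python. The helper name and the fixed seed are assumptions for illustration; only the enumerate-then-subsample logic follows the protocol described here.

```python
import itertools
import random

def sample_d_frame_tuples(num_segments=30, d=2, k=3, seed=0):
    """Enumerate all d-frame index combinations over the sampled segments,
    then keep only k random ones (hypothetical helper, seed fixed for demo)."""
    rng = random.Random(seed)
    all_tuples = list(itertools.combinations(range(num_segments), d))
    # For num_segments=30, d=2 this enumerates C(30, 2) = 435 combinations.
    return rng.sample(all_tuples, k)

pairs = sample_d_frame_tuples()
print(pairs)  # k=3 tuples, each holding d=2 frame indices
```

The same call with d=3 would subsample 3-frame tuples from the C(30, 3) combinations.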

As a result, the input fed to the model is formatted as [batch_size, num_segments, num_channels, img_height, img_width]. The k * d_frames tuples are then generated by the TRN module inside the model.

VCDB - Dataset

[1] https://fvl.fudan.edu.cn/dataset/vcdb/list.htm

The original dataset [1] contains 528 videos across 28 classes. For each video, we used GUI software to cut short segments (e.g., 2-4 seconds) that serve as positive videos for training. However, some videos within the same class contain different visual content, so we removed those inconsistent videos and obtained 454 short videos across 28 classes.

Dataset preparation

We split these 454 videos into three parts for training, validation, and testing. In particular, we used 70% of the data for training and divided the remaining 30% equally between validation and testing (15% each). More precisely, the numbers of videos used for training, validation, and testing are 317, 68, and 69, respectively. The script 'prepare_vcdb_dataset.py' illustrates the pre-processing steps.
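A minimal sketch of this 70/15/15 split, reproducing the 317/68/69 counts; the function name and seed are assumptions, and the repo's prepare_vcdb_dataset.py may differ in details.

```python
import random

def split_dataset(video_ids, train_frac=0.7, seed=42):
    """Shuffle and split into train / validation / test parts
    (hypothetical helper sketching the split described above)."""
    rng = random.Random(seed)
    ids = list(video_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)      # 70% for training
    rest = ids[n_train:]                      # remaining 30%
    n_val = len(rest) // 2                    # split the rest in half
    return ids[:n_train], rest[:n_val], rest[n_val:]

train, val, test = split_dataset(range(454))
print(len(train), len(val), len(test))  # 317 68 69
```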

Extract frames

The frames of all videos are extracted with the FFmpeg library, so each video's 'total_frames' equals the number of extracted frame images ('num_frames').

Training

The image features were extracted with a pre-trained ResNet-50 model (2048-D).

We trained the TRN network with the following parameters:

num_segments=30, d_frames=8, k_random=3, num_classes=28, image_size=224,

batch_size=4, num_workers=2, learning_rate=1e-3, momentum=0.9, weight_decay=5e-4, step_size=10, num_epochs=50, dropout=0.6, img_feature_dim=2048
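The step_size=10 parameter implies a step-decay learning-rate schedule. The sketch below assumes a decay factor of 0.1 (the default gamma of PyTorch's StepLR); the repo may use a different value.

```python
def step_lr(base_lr=1e-3, step_size=10, gamma=0.1, num_epochs=50):
    """Learning rate per epoch under step decay: the rate is multiplied
    by gamma every step_size epochs (gamma=0.1 is an assumption)."""
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(num_epochs)]

lrs = step_lr()
print(lrs[0], lrs[10], lrs[49])  # 0.001, then 1e-4 at epoch 10, 1e-7 by epoch 49
```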

Result

The VCDB results (figure)

On the test set, the accuracy reached only 4/69, which suggests the model was overfitting.

SVD - Dataset

[2] https://svdbase.github.io/

To update git after ignoring redundant files:

```shell
git rm -rf --cached .
git add .
git commit -m ".gitignore is now working"
```
