
are there any preprocessing to the input video clips? #35

Closed
FesianXu opened this issue Apr 29, 2020 · 10 comments

Comments

@FesianXu

Hi, great work, and it has helped me a lot! However, I still need some help.
I am really not familiar with Caffe2 and could not find out whether the Caffe2 version of the IG65M model applies any preprocessing to the input video clips.
In my experiment, I simply normalized the pixels to [0, 1], but the performance did not look very good (about 92% on UCF101 with the IG65M pretrained model after some fine-tuning on UCF101; without that, the performance was even worse). So I wonder whether we need to apply some specific preprocessing to the video clips, such as subtracting the means or something else?
Thanks for your attention and kind help :)

@daniel-j-h
Member

Yep, check the extract tool:

Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
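To make the constants concrete, here is a minimal pure-Python sketch of what this Normalize step does per channel, assuming the RGB values have already been scaled to [0, 1] (the helper name is illustrative, not from the extract tool):

```python
# Per-channel means and stds quoted from the extract tool above.
MEAN = (0.43216, 0.394666, 0.37645)
STD = (0.22803, 0.22145, 0.216989)

def normalize_rgb(pixel):
    """(r, g, b) floats in [0, 1] -> mean-subtracted, std-divided values."""
    return tuple((c - m) / s for c, m, s in zip(pixel, MEAN, STD))
```

A pixel that equals the channel means maps to (0, 0, 0), which is the usual sanity check for this kind of normalization.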

@FesianXu
Author

@daniel-j-h Thanks so much for your rapid reply :) Have a nice day!

@daniel-j-h
Member

You too! 🤗

@FesianXu
Author

@daniel-j-h Hi Daniel, sorry for bothering you again. I tried normalizing the input video clips following the code you showed me yesterday, but it did not work when I evaluated on the Kinetics 400 dataset. I wonder whether I got the channel order wrong: is it RGB or BGR? Thanks.

@FesianXu
Author

FesianXu commented Apr 30, 2020

Also, I tried using torchvision.transforms.Normalize frame by frame with the same means and stds you provided, instead of the code you provided (simply dropping your code into my project caused some problems, so I used the library method instead). Could that be the root of the problem? BTW, I would like to know whether you have evaluated the results on Kinetics 400, and whether you can reach the accuracy the paper claims. Thanks.

PS: To provide more detail on my preprocessing, the method looks like:

from torchvision import transforms

self.transform = transforms.Compose(
    [
        transforms.ToPILImage(),   # the later Resize expects a PIL image
        transforms.Resize(size=(128, 171)),
        transforms.CenterCrop((112, 112)),
        transforms.ToTensor()
    ]
)

The model I was using is r2plus1d_34_8_kinetics; I think it was fine-tuned on Kinetics 400, and I just wanted to evaluate on Kinetics 400. :)

PPS: I used torchvision.io.read_video() to decode the .mp4 videos in the dataset, but I am not sure whether it matters that I did not use OpenCV to decode them. (I think both of them use FFmpeg.)
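For reference on the channel-order question: torchvision.io.read_video returns RGB frames as a uint8 tensor shaped (T, H, W, C), so no BGR-to-RGB swap should be needed (OpenCV, by contrast, decodes frames in BGR order). A sketch of the layout conversion a 3D model expects, with a random tensor standing in for the decoded video:

```python
import torch

# Stand-in for torchvision.io.read_video(path)[0]: 8 RGB frames,
# uint8, shaped (T, H, W, C). A real clip would come from the decoder.
frames = torch.randint(0, 256, (8, 112, 112, 3), dtype=torch.uint8)

# Rearrange to the (C, T, H, W) layout video models expect
# and scale pixel values to [0, 1] before normalization.
clip = frames.permute(3, 0, 1, 2).float() / 255.0
```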

@FesianXu
Author

FesianXu commented May 2, 2020

I have solved this problem. Thanks for your attention.

@yushuinanrong

@FesianXu
Could you share how you solved the problem? I encountered a similar one. Also, could you share your validation results on Kinetics 400?

@FesianXu
Author

@yushuinanrong Check this link for the validation results on Kinetics 400: #2 (comment)

I just normalize the RGB clips (in the correct RGB channel order) to pixel values in [0, 1], then subtract the means and divide by the stds. The means and stds are:

Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),

@yushuinanrong

@FesianXu
Thank you!

@Yueeeeee-1

Hi, did you fine-tune on the UCF101 dataset?
