Hello
Thank you for sharing your work and knowledge. I'm sorry for asking these questions, but I am not familiar with TensorFlow at all.
Could you please clarify the following points:
1- During downstream-task (action recognition) training, did you sample one clip from each training video using a random starting index? If yes, then at each epoch the total number of training clips would equal the size of the training split.
Or
Did you use temporal jittering during training? If yes, how many clips did you sample from each training video?
What is the size of one epoch then?
2- For downstream-task evaluation, you mentioned in the paper that you used all the sub-sequences of each testing video in the test split to get the video-level prediction.
What if the testing video's length is not divisible by the clip length? Then there would be extra frames left over that are not enough to sample one clip. What is your approach to overcoming this issue?
For example: when a testing video has 173 frames and the clip length is 16 frames, 10 non-overlapping clips can be sampled, and 13 extra frames that are not enough to form another clip are left over.
Thanks for your help
The size of one epoch is defined by the number of videos in the training set. I apply random temporal cropping during preprocessing, so each video contributes one randomly cropped clip per epoch.
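A minimal sketch of what random temporal cropping looks like (this is illustrative NumPy code, not the repository's actual preprocessing; the function name and shapes are assumptions):

```python
import numpy as np

def random_temporal_crop(video, clip_len=16, rng=None):
    """Sample one clip from a video at a random start index.

    `video` is a (num_frames, H, W, C) array; clip_len=16 matches the
    clip length used in the example above. Hypothetical helper for
    illustration only.
    """
    rng = rng or np.random.default_rng()
    num_frames = video.shape[0]
    # Any start index that leaves room for a full clip is valid.
    start = rng.integers(0, num_frames - clip_len + 1)
    return video[start:start + clip_len]

video = np.zeros((173, 8, 8, 3))   # dummy video with 173 frames
clip = random_temporal_crop(video)
print(clip.shape)                  # (16, 8, 8, 3)
```

Because a fresh random start index is drawn each time the video is preprocessed, successive epochs see different clips from the same video.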
In the code I actually use a maximum of 32 clips from each video (see the parameter num_test_seq=32 in the Preprocessor). All clips are uniformly sampled over the duration of the video, so they often overlap.
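The uniform-sampling scheme also answers the leftover-frames question: because the last clip is anchored so that it ends on the final frame, every frame is covered and no remainder is discarded. A sketch of how the start indices could be computed (an assumed helper, not the repository's code):

```python
import numpy as np

def uniform_clip_starts(num_frames, clip_len=16, num_clips=32):
    """Start indices of `num_clips` clips spread evenly over the video.

    When num_clips * clip_len > num_frames, neighbouring clips overlap,
    matching the behaviour described for num_test_seq=32. The first clip
    starts at frame 0 and the last ends exactly at the final frame, so
    leftover frames (e.g. the 13 extra frames in the 173-frame example)
    are still covered.
    """
    return np.linspace(0, num_frames - clip_len, num_clips).astype(int)

starts = uniform_clip_starts(173)      # the 173-frame example above
print(len(starts))                     # 32
print(starts[0], starts[-1])           # 0 157 -> last clip spans frames 157..172
```

The per-clip predictions from these 32 clips are then aggregated into the video-level prediction.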