When using temporal context network, my labeled data is the first or last frame, is this okay? #203

Open
wyclearnpy opened this issue Oct 11, 2024 · 6 comments

@wyclearnpy

When I use the temporal context network, some of my labeled frames are the first or last frame of the video, so they cannot satisfy the [t-2, t-1, t, t+1, t+2] context window. Do I need to modify my data, or is there another solution?
The terminal prints the following error:
RuntimeError: Not enough valid frames to make a context representation.

@themattinthehatt
Collaborator

@wyclearnpy there shouldn't be any issue with your labeled data. If you are using frame 0, for example, the code will automatically load frames [0, 0, 0, 1, 2]. If you are using the last frame (say n), the code will automatically load frames [n-2, n-1, n, n, n].
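
To illustrate, here's a rough sketch of that boundary handling (not the actual library code; the function name is made up for illustration):

```python
def context_frame_indices(t, n_frames, width=2):
    # Build [t-2, t-1, t, t+1, t+2] and clamp out-of-range indices to the
    # first/last valid frame, so boundary frames are simply repeated.
    return [min(max(i, 0), n_frames - 1) for i in range(t - width, t + width + 1)]

print(context_frame_indices(0, 100))   # frame 0  -> [0, 0, 0, 1, 2]
print(context_frame_indices(99, 100))  # frame 99 -> [97, 98, 99, 99, 99]
```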

The error you're getting appears to come from the unlabeled data - are you fitting a semi-supervised model? If so, I would first recommend fitting a supervised context model to make sure the above error does not appear (you don't need to train it fully). If that works, then the issue is that your unlabeled batch size is too small. Since the context model requires two frames before and after the frame being processed, you'll get this error if your unlabeled batch size (in the config file under dali.context.train.batch_size) is <= 4. So it needs to be at least 5.
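
As a quick sanity check, you could validate that constraint before launching training - a rough sketch, where "config.yaml" is just a placeholder for your own config file:

```python
import yaml

CONTEXT_WIDTH = 2  # two frames on each side of the processed frame

with open("config.yaml") as f:  # path to your config file
    cfg = yaml.safe_load(f)

batch_size = cfg["dali"]["context"]["train"]["batch_size"]
min_batch = 2 * CONTEXT_WIDTH + 1  # [t-2, t-1, t, t+1, t+2] -> 5
if batch_size < min_batch:
    raise ValueError(
        f"dali.context.train.batch_size={batch_size} is too small for a "
        f"context model; it must be at least {min_batch}."
    )
```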

I'll make that error message more descriptive, thanks for the flag. Let me know how it goes!

@themattinthehatt
Collaborator

@wyclearnpy just wanted to check in on this and see if you were able to train with an increased unlabeled batch size?

@wyclearnpy
Author

Yes, increasing it to 5 allowed training to run successfully; the problem was mainly insufficient device memory. I'm really looking forward to multi-GPU training support. Additionally, the semi-supervised model seems to perform slightly worse than the supervised one. What could be the reasons for this? In the semi-supervised setup my image size is 256, while in the supervised setup it is 384.
[attached image: labeled_img]

@themattinthehatt
Collaborator

@wyclearnpy glad you were able to get the semi-supervised model training. One comment: to really compare the semi-supervised and supervised models you should set the resizing to be the same, otherwise the comparison will be confounded by that (very important) factor. So maybe you could try supervised vs semi-supervised on 256x256 frames first, just to get an idea of how they compare?

I have a few other questions:

  • how big are your original images?
  • how much GPU memory do you have?
  • how many labeled frames do you have?

@wyclearnpy
Author

My original image size is 1280x1024. For GPU memory I have two 11 GB 1080 Ti cards, but I can only train with a single GPU. I have 420 labeled frames in total.

@wyclearnpy
Author

Also, I'm now trying semi-supervised training at 384x384 resolution.
