How to increase inference speed? #44
Thanks for your nice work!

I am confused about the inference speed. In your paper, you claim that the inference time of MonST3R on an A6000 is about 90 seconds. I ran a practical test with 94 images on an A100 and found that the whole process takes more than 1 hour. I want to know why it is so slow and how I can improve inference speed, even by sacrificing some video memory.

Comments
Hi @kszpxxzmc,

Thanks for the feedback. As far as I can see, most of the latency comes from the initialization of the dynamic mask (here). Since this process runs on the CPU, it can vary greatly across hardware. One simple way to speed it up is to turn off the flow loss for the optimization (by adding the corresponding flag).

I also noticed that the latency of your feed-forward inference (5:37 for 890 pairs) is unusual. Based on my experience (and reports from other users, e.g., #10 (comment)), this should finish in less than one minute. You could probably try setting a larger batch size for the feed-forward inference.

Best.
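As a rough illustration of the batch-size suggestion, here is a minimal PyTorch sketch of running the feed-forward pass over image pairs in configurable batches. All names here (`model`, `pairs`, `batch_size`) are placeholders for illustration, not the repo's actual API:

```python
import torch

@torch.no_grad()
def run_pairs(model, pairs, batch_size=64, device="cuda"):
    """Hypothetical sketch: feed-forward inference over image pairs.

    `model` and `pairs` are placeholders; `pairs` is assumed to be a
    list of (view1, view2) CxHxW tensors. A larger `batch_size` keeps
    the GPU busier and reduces per-batch overhead.
    """
    outputs = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i : i + batch_size]
        # Stack into a (B, 2, C, H, W) tensor and move to GPU once per batch.
        views = torch.stack([torch.stack(p) for p in batch]).to(device)
        outputs.append(model(views).cpu())  # move results off-GPU promptly
    return outputs
```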
Thanks for your help on making it faster! Could you comment on why you used SAM2 to refine the mask, rather than simply using it to initialize the mask? Is it better that way?
Hi @huddyyeo,

Because SAM2 requires a prompt as input (point / box / mask), we use our initialized mask as the prompt for SAM2 to refine. You could definitely use a "click" (point prompt) to get the SAM2 mask for initialization, though that would not be a fully automated pipeline. Another possible way is to use an off-the-shelf motion segmentation method (e.g., https://github.com/TonyLianLong/RCF-UnsupVideoSeg) to get the initial mask, or even to use its output as the prompt for SAM2. Thanks.
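For reference, a minimal sketch of the mask-as-prompt idea using SAM2's image predictor API. The checkpoint name follows the sam2 README; `image` and `coarse_logits` (a low-resolution mask prompt) are assumptions for illustration, and the actual integration in this repo may differ:

```python
import numpy as np
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Sketch: refine a coarse motion mask by passing it to SAM2 as a mask prompt.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(image)  # image: HxWx3 uint8 RGB array (assumed)

masks, scores, _ = predictor.predict(
    mask_input=coarse_logits[None],  # assumed (1, 256, 256) low-res mask logits
    multimask_output=False,
)
refined_mask = masks[0].astype(bool)  # refined binary mask at image resolution
```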
Thanks @Junyi42 for the quick reply 🙏 Just to clarify, what did you mean then by passing the SAM2 mask in for the initialization?
Hi @huddyyeo,

Sorry for the confusion. What I meant is that if one already has a better motion segmentation mask (via a "click" prompt for SAM2, or an off-the-shelf motion segmentation method), then you can load that segmentation mask into the corresponding variable directly, instead of relying on the automatic initialization.
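For example, here is a minimal sketch of loading a precomputed segmentation mask from disk and binarizing it before handing it to the pipeline. The file name and threshold are illustrative, and the exact target variable depends on the codebase:

```python
import numpy as np
from PIL import Image

# Illustrative only: load an externally computed motion-segmentation mask
# (e.g., from a "click"-prompted SAM2 run) and binarize it. The resulting
# boolean array would replace the automatically initialized dynamic mask.
mask = np.array(Image.open("frame_0001_mask.png").convert("L"))
dynamic_mask = mask > 127  # True = dynamic (moving) pixel
```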