
How to increase inference speed? #44

Open
kszpxxzmc opened this issue Nov 13, 2024 · 5 comments


@kszpxxzmc commented Nov 13, 2024

Thanks for your nice work!
I'm confused about the inference speed. In your paper, you state that the inference time of MonST3R on an A6000 is about 90 seconds. I ran a practical test with 94 images on an A100 and found that the whole process takes more than an hour. I'd like to know why it is so slow and how I can improve the inference speed, even at the cost of some GPU memory.
[screenshot: timing log from the run]

@Junyi42 (Owner) commented Nov 13, 2024

Hi @kszpxxzmc,

Thanks for the feedback. As far as I can tell, most of the latency is due to the initialization of the dynamic mask (here). Since this process runs on the CPU, it can vary greatly across hardware. One simple option is to turn off the flow loss during optimization (by adding --flow_loss_weight=0.0), though this may degrade performance. You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.
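
For the first option, the flag just goes on the same command you already run; a sketch, assuming demo.py (the script discussed below) parses it, with the rest of the arguments elided since they depend on your setup:

```
# same invocation as before, plus the flag that zeroes out the flow loss
python demo.py ... --flow_loss_weight=0.0
```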

I also noticed that the latency of the feed-forward inference (5:37 for 890 pairs) is unusual. Based on my experience (and reports from other users, e.g., #10 (comment)), this should take less than one minute. You could probably try setting a larger batch size in demo.py. Hope this helps!
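
For context, the pairwise feed-forward pass goes through the DUSt3R-style inference helper that MonST3R builds on, and batch_size is the knob that trades GPU memory for throughput. A minimal sketch (argument names follow the DUSt3R API; check your checkout):

```python
from dust3r.inference import inference

# pairs: the list of image pairs built in demo.py; model: the MonST3R checkpoint.
# Raising batch_size uses more VRAM but cuts wall-clock time for many pairs.
output = inference(pairs, model, device, batch_size=16)
```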

Best.

@huddyyeo

Thanks for your help on making it faster! Could you comment on why you used SAM2 to refine the mask, rather than simply using it to initialize the mask? Is it better that way?

@Junyi42 (Owner) commented Nov 14, 2024

> Thanks for your help on making it faster! Could you comment on why you used SAM2 to refine the mask, rather than simply using it to initialize the mask? Is it better that way?

Hi @huddyyeo,

SAM2 requires a prompt as input (a point, box, or mask), so we use our initialized mask as the prompt for SAM2 to refine. You could certainly use a "click" (point prompt) to get a SAM2 mask for initialization, though that would not be fully automated. Another option is to use an off-the-shelf motion segmentation method (e.g., https://github.com/TonyLianLong/RCF-UnsupVideoSeg) to get the initial mask, or even to use its output as the prompt for SAM2.
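
For the point-prompt ("click") route, a minimal sketch using the SAM2 image predictor; the checkpoint name, frame filename, and click coordinates are placeholders:

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("frame_000.png").convert("RGB"))
predictor.set_image(image)

# One foreground click on the moving object (placeholder coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
)
dynamic_mask = masks[scores.argmax()]  # keep the highest-scoring proposal
```

The resulting mask could then serve as the motion mask initialization discussed above.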

Thanks.

@huddyyeo

Thanks @Junyi42 for the quick reply 🙏 Just to clarify: what did you mean by passing the SAM2 mask to self.dynamic_masks here, since we cannot just initialize the mask via SAM2?

> You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.

@Junyi42 (Owner) commented Nov 14, 2024

> Thanks @Junyi42 for the quick reply 🙏 Just to clarify: what did you mean by passing the SAM2 mask to self.dynamic_masks here, since we cannot just initialize the mask via SAM2?
>
> > You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.

Hi @huddyyeo,

Sorry for the confusion. What I meant is that if you already have a better motion segmentation mask (obtained via a "click" prompt to SAM2, or from an off-the-shelf motion segmentation method), you can load that mask into the self.dynamic_masks variable. Thanks.
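
Concretely, if the per-frame masks are already saved to disk (e.g., PNGs exported from SAM2 or a motion segmentation method), loading them could look like the sketch below; the file layout, dtype, and the exact shape self.dynamic_masks expects are assumptions to check against the optimizer code:

```python
import glob
import numpy as np
import torch
from PIL import Image

def load_dynamic_masks(mask_dir, device="cuda"):
    """Load per-frame binary masks (True = dynamic pixel), sorted by filename."""
    masks = []
    for path in sorted(glob.glob(f"{mask_dir}/*.png")):
        m = np.array(Image.open(path).convert("L")) > 127  # binarize grayscale
        masks.append(torch.from_numpy(m).to(device))
    return masks

# Hypothetical hook: replace the flow-based initialization with your own masks
# before the global optimization starts.
# scene.dynamic_masks = load_dynamic_masks("sam2_masks/")
```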
