
How to increase inference speed? #44

Open
kszpxxzmc opened this issue Nov 13, 2024 · 5 comments


@kszpxxzmc commented Nov 13, 2024

Thanks for your nice work!
I'm confused about the inference speed. In your paper, you state that the inference time of MonST3R on an A6000 is about 90 seconds. I ran a practical test with 94 images on an A100 and found that the whole process takes more than an hour. I'd like to know why it is so slow and how I can improve the inference speed, even at the cost of some GPU memory.
[screenshot: timing log from the run]

@Junyi42 (Owner) commented Nov 13, 2024

Hi @kszpxxzmc,

Thanks for the feedback. As far as I can tell, most of the latency is due to the initialization of the dynamic mask (here). Since this process runs on the CPU, it can vary greatly across hardware. One simple option is to turn off the flow loss during optimization (by adding --flow_loss_weight=0.0), though this may degrade performance. You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.
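
For the first option, the flag just goes on the same command you already run; a sketch, assuming demo.py (the script discussed below) parses it, with the rest of the arguments elided since they depend on your setup:

```
# same invocation as before, plus the flag that zeroes out the flow loss
python demo.py ... --flow_loss_weight=0.0
```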

I also noticed that the latency of the feed-forward inference (5:37 for 890 pairs) is unusual. Based on my experience (and reports from other users, e.g., #10 (comment)), this should take less than one minute. You could probably try setting a larger batch size in demo.py. Hope this helps!
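
For context, the pairwise feed-forward pass goes through the DUSt3R-style inference helper that MonST3R builds on, and batch_size is the knob that trades GPU memory for throughput. A minimal sketch (argument names follow the DUSt3R API; check your checkout):

```python
from dust3r.inference import inference

# pairs: the list of image pairs built in demo.py; model: the MonST3R checkpoint.
# Raising batch_size uses more VRAM but cuts wall-clock time for many pairs.
output = inference(pairs, model, device, batch_size=16)
```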

Best.

@huddyyeo

Thanks for your help on making it faster! Could you comment on why you used SAM2 to refine the mask, rather than simply using it to initialize the mask? Is it better that way?

@Junyi42 (Owner) commented Nov 14, 2024

> Thanks for your help on making it faster! Could you comment on why you used SAM2 to refine the mask, rather than simply using it to initialize the mask? Is it better that way?

Hi @huddyyeo,

SAM2 requires a prompt as input (a point, box, or mask), so we use our initialized mask as the prompt for SAM2 to refine. You could certainly use a "click" (point prompt) to get a SAM2 mask for initialization, though that would not be fully automated. Another option is to use an off-the-shelf motion segmentation method (e.g., https://github.com/TonyLianLong/RCF-UnsupVideoSeg) to get the initial mask, or even to use its output as the prompt for SAM2.
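
For the point-prompt ("click") route, a minimal sketch using the SAM2 image predictor; the checkpoint name, frame filename, and click coordinates are placeholders:

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("frame_000.png").convert("RGB"))
predictor.set_image(image)

# One foreground click on the moving object (placeholder coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
)
dynamic_mask = masks[scores.argmax()]  # keep the highest-scoring proposal
```

The resulting mask could then serve as the motion mask initialization discussed above.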

Thanks.

@huddyyeo

Thanks @Junyi42 for the quick reply 🙏 Just to clarify: what did you mean by passing the SAM2 mask to self.dynamic_masks here, since we cannot just initialize the mask via SAM2?

> You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.

@Junyi42 (Owner) commented Nov 14, 2024

> Thanks @Junyi42 for the quick reply 🙏 Just to clarify: what did you mean by passing the SAM2 mask to self.dynamic_masks here, since we cannot just initialize the mask via SAM2?
>
> > You could also try to use the mask from the SAM2 model for this motion mask initialization, by passing the SAM2 mask to self.dynamic_masks.

Hi @huddyyeo,

Sorry for the confusion. What I meant is that if you already have a better motion segmentation mask (obtained via a "click" prompt to SAM2, or from an off-the-shelf motion segmentation method), you can load that mask into the self.dynamic_masks variable. Thanks.
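
Concretely, if the per-frame masks are already saved to disk (e.g., PNGs exported from SAM2 or a motion segmentation method), loading them could look like the sketch below; the file layout, dtype, and the exact shape self.dynamic_masks expects are assumptions to check against the optimizer code:

```python
import glob
import numpy as np
import torch
from PIL import Image

def load_dynamic_masks(mask_dir, device="cuda"):
    """Load per-frame binary masks (True = dynamic pixel), sorted by filename."""
    masks = []
    for path in sorted(glob.glob(f"{mask_dir}/*.png")):
        m = np.array(Image.open(path).convert("L")) > 127  # binarize grayscale
        masks.append(torch.from_numpy(m).to(device))
    return masks

# Hypothetical hook: replace the flow-based initialization with your own masks
# before the global optimization starts.
# scene.dynamic_masks = load_dynamic_masks("sam2_masks/")
```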
