Segmenting Video Processing and Manually Setting Camera Intrinsics #22

Open
Rosetta-Leong opened this issue Oct 23, 2024 · 4 comments
Comments

@Rosetta-Leong
Hi Junyi!

First, thank you for your outstanding work on MonST3R! I ran into GPU memory limits when processing videos with a larger number of frames. Currently, MonST3R only runs successfully for me with up to 65 frames, which requires approximately 33GB of VRAM. When I try to process videos with more frames, I get out-of-memory errors.

To work around this, I was considering processing the video in segments. However, I am concerned that independently processing different segments could lead to misalignment between the estimated outputs across segments.

To mitigate this, I am wondering if there is a way to manually specify and fix the camera intrinsics (for example, using camera intrinsics obtained from COLMAP) across different segments. This would help ensure consistency and alignment between the outputs for the entire video, regardless of segment size.

Could you provide any guidance on how to implement this or whether this feature is supported?

Thank you for your time, and I appreciate any insights you can offer.

@Junyi42
Owner

Junyi42 commented Oct 23, 2024

Hi @Rosetta-Leong,

Thanks for your interest in our work!

Yes, you could preset the camera intrinsics (focal) for the optimization (here is a reference for setting the focals).
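One detail worth checking before presetting a focal taken from COLMAP: it is expressed in pixels of the original image, so it needs rescaling if the frames are resized before inference. A minimal sketch of that rescaling follows. The function name `rescale_focal` and the example sizes are hypothetical and not part of the repository; check the actual resize and crop the pipeline applies.

```python
def rescale_focal(focal_px, orig_size, new_size):
    """Rescale a focal length (in pixels) from the original image size
    to the resized image size. Sizes are (width, height) tuples.

    Assumption: the image is only resized (no cropping). A crop would
    shift the principal point but leave the focal length unchanged.
    """
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    fx = focal_px * new_w / orig_w  # horizontal focal after resize
    fy = focal_px * new_h / orig_h  # vertical focal after resize
    return fx, fy


# Illustrative values only: a 1920x1080 video resized to 512x288.
fx, fy = rescale_focal(1500.0, (1920, 1080), (512, 288))
print(fx, fy)
```

If the resize preserves the aspect ratio, `fx` and `fy` agree and a single focal value can be shared by every segment.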

However, connecting separately predicted segments could still lead to error accumulation. I would suggest implementing a window-wise optimization here instead of the current implementation, which batches all pairs together.
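As a rough illustration of the windowing step only (not of the optimization itself), the frames can be split into overlapping windows as sketched below. The helper `make_windows` and its parameters are hypothetical names; the window size and overlap would need tuning against the available VRAM.

```python
def make_windows(num_frames, window_size, overlap):
    """Return inclusive, 1-indexed (start, end) frame ranges that cover
    frames 1..num_frames, with consecutive windows sharing `overlap` frames.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    stride = window_size - overlap  # how far each window advances
    windows = []
    start = 1
    while True:
        end = min(start + window_size - 1, num_frames)
        windows.append((start, end))
        if end == num_frames:  # last frame reached, stop
            break
        start += stride
    return windows


print(make_windows(num_frames=65, window_size=20, overlap=5))
```

Each window would then be optimized on its own set of pairs, so peak memory depends on the window size rather than on the full video length.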

Hope this helps!

Best.

@npmhung

npmhung commented Dec 12, 2024

@Junyi42
First, thank you for your excellent work!

I have a question regarding the implementation of window-wise optimization.

For instance, consider two windows, each containing the following video frames: [1–5] and [3–7] (inclusive). After optimizing all the depth maps and camera poses for frames 1–5 in the first window, should we freeze the parameters for frames 3–5 when optimizing the graph in the second window? Otherwise, I’m concerned that the parameters for these overlapping frames might drift, potentially resulting in an inconsistent global view.
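To make the question concrete, here is a toy sketch of the freezing scheme described above, using plain Python values in place of real depth maps and poses. The function names are hypothetical, and this is not how the repository implements its optimizer; it only shows which frames would stay fixed and which would be updated.

```python
def split_frozen_and_free(prev_window, cur_window):
    """Given two inclusive (start, end) frame ranges, return the frames of
    the current window that were already optimized (frozen) and the
    frames that are new (free).
    """
    prev_start, prev_end = prev_window
    cur_start, cur_end = cur_window
    frozen, free = [], []
    for frame in range(cur_start, cur_end + 1):
        if prev_start <= frame <= prev_end:
            frozen.append(frame)
        else:
            free.append(frame)
    return frozen, free


def update_free_only(params, grads, free, lr=0.1):
    """One gradient step that leaves frozen frames untouched.
    `params` and `grads` map frame index -> scalar (a stand-in for the
    per-frame depth and pose parameters).
    """
    free_set = set(free)
    return {
        frame: value - lr * grads[frame] if frame in free_set else value
        for frame, value in params.items()
    }


frozen, free = split_frozen_and_free((1, 5), (3, 7))
print("frozen:", frozen, "free:", free)
```

With frames 3 to 5 held fixed, the second window is anchored to the coordinate frame of the first, at the cost of not letting later evidence correct those frames.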

@YunjieYu

Has anyone implemented window-wise optimization? If so, should we freeze the parameters for overlapping frames when optimizing the graph in the second window?

@YunjieYu

YunjieYu commented Mar 5, 2025

@npmhung @Rosetta-Leong Hi everyone, I just submitted a pull request for window-wise optimization (#72), which is waiting for the author's review. With it, one can directly optimize a long video with a larger number of frames and obtain the expected results. Please enjoy these changes!


4 participants