
Question on the sequence of reference images #48

Closed
gengrui1983 opened this issue Jan 8, 2019 · 2 comments
@gengrui1983
Hi Clement

Thanks for your prompt answer to my last question, but I have another one here which is not related to the code.

As mentioned in the paper and the code, the reference images are [t-n, t-n+1, ..., t-1, t+1, ..., t+n], but what if I use [t-n, ..., t-1] only? What do you suppose the impact on the result would be? Thank you.

Cheers,
Rui

@ClementPinard (Owner)

The network can still converge, but the fact that in KITTI the camera is almost always moving forward will restrict your network from getting useful information.

Restricting the reference frames to anterior frames will make warping always do the same thing, which is zooming out. It's better than restricting to posterior frames, which feature a whole set of out-of-bounds pixels, but you will lack precision for the pixels closer to the center of the image, where the optical flow is very low because it's close to the focus of expansion.

A quick analysis of which translation between reference frame and target frame (excluding rotations) is best suggests:

  • forward translation is good for pixels close to the focus of expansion
  • backward translation (your case) is good for pixels close to the boundaries
  • lateral translation is good for every pixel, which is partly why monodepth is better than sfmlearner, but you then have to be careful with occlusions, which are more prominent with this kind of translation.

All in all, the best is probably a mix of everything, which is helped by the posterior/anterior mix that you get from having the target frame in the center of the sequence.
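The effect of each translation direction can be checked with a small numerical sketch (not from the repo; all values are hypothetical): project a pixel grid through a pinhole camera at a constant depth, translate the camera, and measure the induced rigid flow. Forward motion gives near-zero flow at the focus of expansion and larger flow at the boundaries, while lateral motion gives a uniform flow everywhere.

```python
import numpy as np

# Hypothetical pinhole intrinsics; the principal point is the focus of
# expansion for a purely forward translation.
fx = fy = 100.0
cx, cy = 32.0, 32.0
Z = 10.0  # constant scene depth (assumption, for illustration only)

u, v = np.meshgrid(np.arange(64, dtype=float), np.arange(64, dtype=float))
# Back-project every pixel to a 3-D point at depth Z
X = (u - cx) / fx * Z
Y = (v - cy) / fy * Z

def flow_magnitude(t):
    """Per-pixel flow magnitude after the camera translates by t = (tx, ty, tz)."""
    tx, ty, tz = t
    Xn, Yn, Zn = X - tx, Y - ty, Z - tz   # points in the new camera frame
    un = fx * Xn / Zn + cx                # re-project
    vn = fy * Yn / Zn + cy
    return np.hypot(un - u, vn - v)

fwd = flow_magnitude((0.0, 0.0, 0.5))     # forward motion
lat = flow_magnitude((0.5, 0.0, 0.0))     # lateral motion

# Forward: flow vanishes at the center (focus of expansion), grows outward.
print(fwd[32, 32], fwd[0, 0])
# Lateral: flow has the same magnitude at every pixel.
print(lat.std())
```

This is why pixels near the focus of expansion get almost no photometric signal from forward/backward motion, while lateral motion informs every pixel equally.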

You may be thinking of online learning, where posterior frames are not available yet, but even in that case, I think it would be better to place the target frame a bit back in time so that it can be compared to more recent frames, giving you more translation heterogeneity.

@gengrui1983 (Author)

Thank you Clement! ;)
