
ref_distance, ref_normal returning NANs #5

Closed

fasogbon opened this issue Nov 21, 2023 · 4 comments

Comments

fasogbon commented Nov 21, 2023

No description provided.

gard-n commented Apr 17, 2024

Hello,

I am currently encountering the same issue.
I've traced it to the "OmnidataNormalPredictor", which returns a tensor consisting entirely of NaN values (pano_joint_predictor.py, line 171):

pred_normals = self.normal_predictor.predict_normal(pers_imgs[i: i+1])
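A quick way to verify this, as a sketch, assuming pred_normals from the line above:

```python
import torch

# Check whether the predictor output is entirely NaN, not just partially corrupted.
print(torch.isnan(pred_normals).all())  # tensor(True) in the failing case
print(torch.isnan(pred_normals).any())  # tensor(True) as well
```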

Additionally, I discovered that the output of "layer_4" of the vision_transformer is NaN (dpt_depth.py, line 70):

layer_1, layer_2, layer_3, layer_4 = forward_vit(self.pretrained, x)
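To narrow down where the NaNs first appear, one option is a forward-hook sweep over the backbone. A minimal sketch, assuming a standard PyTorch module; here model stands in for self.pretrained and x for the input batch:

```python
import torch

def find_first_nan(model, x):
    """Report the first submodule whose output contains NaNs."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                # Hooks fire in execution order, so this is the earliest offender.
                raise RuntimeError(f"NaNs first observed in: {name}")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    except RuntimeError as e:  # a sketch: this may also catch unrelated errors
        print(e)
    finally:
        for h in handles:
            h.remove()
```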

Loading the checkpoint for normal prediction appears to be functioning correctly.

Could you please let me know how this issue was resolved?

Best regards,
Niklas

manurare commented Jul 17, 2024

@fasogbon @gard-n I am having the same issue. Did you manage to solve it?

gard-n commented Jul 23, 2024

@manurare

I was able to solve it by reconfiguring the backbone, as I observed that the features become NaN in the deep layers of the encoder.

In dpt_depth.py (line 42), I changed the backbone hooks from "vitb_rn50_384": [0, 1, 8, 11] to "vitb_rn50_384": [0, 1, 7, 8] to omit those deep layers. I also set use_pretrained=True in the subsequent call to _make_encoder, but I can't remember whether that was related to the problem; it was a while ago. In the end, I was able to train the NeRF and get results, though it is possible that the change affects the quality of the normal prediction.
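For reference, a sketch of that change, assuming the hooks dict matches the upstream DPT layout (exact line numbers may differ between versions):

```python
# dpt_depth.py — hooks select which transformer blocks feed the decoder.
# Replacing the last two indices [8, 11] with [7, 8] skips the deep
# encoder blocks whose features came out as NaN.
hooks = {
    "vitb_rn50_384": [0, 1, 7, 8],   # previously [0, 1, 8, 11]
    "vitb16_384": [2, 5, 8, 11],
    "vitl16_384": [5, 11, 17, 23],
}[backbone]
```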

manurare commented

@gard-n Thanks!

Sorry, I forgot to update with my solution. Maybe it is no longer useful to you, but I will write it here as it might help others.

I realised the NaNs came from the fused attention path (fused_attn) defined in timm, which is enabled by default. You can disable it with the environment variable TIMM_FUSED_ATTN=0. This fixed the NaNs for me.
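For anyone applying this: timm reads the variable at import time, so it must be set before timm (or anything that imports it) is loaded. A minimal sketch:

```python
import os

# Disable timm's fused attention path; must happen before timm is imported.
os.environ["TIMM_FUSED_ATTN"] = "0"

import timm  # noqa: E402  (deliberate late import, after the env var is set)
```

Equivalently, set it in the shell when launching your entry script, e.g. TIMM_FUSED_ATTN=0 python your_script.py (your_script.py being a placeholder for whatever script you run).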
