
[FLAKY] test_onnx_nodes[llvm-test_nllloss_NCd1d2d3_none_no_weight_negative_ii] #8918

Closed
masahi opened this issue Sep 3, 2021 · 5 comments · Fixed by #8971
Comments

@masahi
Member

masahi commented Sep 3, 2021

This test is failing intermittently on main:
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/1630/pipeline
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/1629/pipeline

This test appears to have been introduced in #8872.

@AndrewZhaoLuo

@AndrewZhaoLuo
Contributor

#8919

@AndrewZhaoLuo
Contributor

Thanks for bringing this to my attention. I'll dig deeper into the cause, but for now the PR above (#8919) should unblock CI.

@AndrewZhaoLuo
Contributor

This is quite a weird bug! The NaNs seem to come from an elementwise multiplication of two tensors that themselves contain no NaN values!

https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/onnx.py#L3541

This operation seems to be the cause. Interestingly, loss and mask_tensor don't appear to contain NaNs, but their product does! Specifically, the NaNs occur only where mask_tensor = 0 (the intention is to set the loss to 0 at every index where mask_tensor = 0).
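One reason a multiply-by-mask can still produce NaN: under IEEE 754, 0 * NaN and 0 * inf both evaluate to NaN, so multiplicative masking only zeroes a position if the value there is finite. The NumPy sketch below is not the TVM frontend code and does not reproduce this issue (the thread reports that loss appeared NaN-free); the tensor values are made up for illustration. It contrasts multiplying by the mask with selecting via np.where.

```python
import numpy as np

# Made-up loss values: the entries at masked positions are non-finite,
# as could happen if they were computed from invalid data.
loss = np.array([0.5, np.nan, 1.25, np.inf], dtype=np.float32)
mask = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)

# Multiplicative masking: IEEE 754 gives 0 * NaN = NaN and 0 * inf = NaN,
# so the masked positions are NOT zeroed (NumPy may also warn about 0 * inf).
with np.errstate(invalid="ignore"):
    masked_mul = loss * mask

# Selection-based masking: np.where picks 0 outright at masked positions,
# regardless of what value the loss holds there.
masked_where = np.where(mask != 0, loss, np.float32(0))

print(masked_mul)    # NaN at indices 1 and 3
print(masked_where)  # 0 at indices 1 and 3
```

Selecting rather than multiplying makes the masked result independent of whatever the loss contains at ignored positions.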

@AndrewZhaoLuo
Contributor

AndrewZhaoLuo commented Sep 3, 2021

Even more interesting: the inputs are fixed rather than random, so the nondeterminism must arise somewhere in compilation or execution. Since the Relay graph is fixed, it must come from something in how the graph is executed :0

@AndrewZhaoLuo
Contributor

This appears to be because gather does not handle negative indices correctly, even though it is documented as supporting them.
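A gather that accepts negative indices has to map each negative index i to i + dim (the size of the gathered axis) before addressing memory. If that step is skipped, a negative index reads outside the tensor and returns whatever happens to be in memory, which would be consistent with nondeterministic results despite fixed inputs. The NumPy sketch below illustrates that normalization only; gather_normalized is a hypothetical helper, not TVM's gather implementation or the change made in #8971.

```python
import numpy as np

def gather_normalized(data, axis, indices):
    """Hypothetical reference gather that accepts negative indices.

    Each negative index i along `axis` is mapped to i + data.shape[axis],
    mirroring Python/NumPy semantics. Indices outside [-dim, dim) raise
    IndexError instead of reading past the end of the tensor.
    """
    dim = data.shape[axis]
    indices = np.asarray(indices)
    if np.any(indices < -dim) or np.any(indices >= dim):
        raise IndexError("gather index out of range")
    # Wrap negative indices into [0, dim) before the actual lookup.
    normalized = np.where(indices < 0, indices + dim, indices)
    return np.take_along_axis(data, normalized, axis=axis)

data = np.arange(12, dtype=np.float32).reshape(3, 4)
idx = np.array([[-1], [0], [-4]])  # -1 -> column 3, -4 -> column 0
out = gather_normalized(data, 1, idx)
print(out)  # [[3.], [4.], [8.]]
```

Rejecting out-of-range indices explicitly keeps the failure deterministic rather than dependent on memory contents.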

masahi linked a pull request on Sep 14, 2021 that will close this issue