[Onnx] Fix NLL Loss tests #8971
Conversation
@AndrewZhaoLuo Maybe we can remove this function? (tvm/python/tvm/relay/frontend/onnx.py, lines 1510 to 1511 in cf439ec)
This is ready for a full review now.
Another topic of discussion is whether we should just allow negative indices to be always on. There is overhead, but I believe it is very small. I did it with an indexmod operation instead of a branch, and it increased the runtime of gather by 3% across 100 iterations with a relatively small input size of [3, 3, 3].
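For context, here is a minimal sketch of the two normalization strategies being compared, written against TVM's Python TIR API. The variable names are illustrative placeholders, not the PR's actual code:

```python
# Sketch: branch vs. indexmod normalization of a possibly-negative index.
from tvm import te, tir

idx = te.var("idx", dtype="int32")    # index that may be negative
size = te.var("size", dtype="int32")  # extent of the gathered axis

# Branch-based: add `size` only when the index is negative.
branch = tir.if_then_else(idx < 0, idx + size, idx)

# Branchless: indexmod maps any index in [-size, size) into [0, size),
# at the cost of a mod on every element (the ~3% overhead mentioned above).
wrapped = tir.indexmod(idx + size, size)
```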
@AndrewZhaoLuo, can you try profiling that with a larger input?
This failed the dynamic gather test in test_any.
Should be fixed now. In general, it seems people are inconsistent with using kwargs in the codebase.
include/tvm/topi/transform.h
Outdated
@@ -1242,6 +1244,8 @@ inline Tensor gather(const Tensor& data, int axis, const Tensor& indices,
    out_shape.push_back(indices->shape[i]);
  }

+  PrimExpr axis_size = data->shape[axis];
remove it
include/tvm/topi/transform.h
Outdated
@@ -1252,12 +1256,13 @@ inline Tensor gather(const Tensor& data, int axis, const Tensor& indices,
   Array<PrimExpr> real_indices;
   for (size_t i = 0; i < ndim_i; ++i) {
     if (i == static_cast<size_t>(axis)) {
-      real_indices.push_back(indices(indices_position));
+      PrimExpr index = indices(indices_position);
+      real_indices.push_back(index);
remove this diff
src/te/tensor.cc
Outdated
for (size_t i = 0; i < shape.size(); i++) {
  PrimExpr new_index = if_then_else(indices[i] < make_const(indices[i]->dtype, 0),
                                    indices[i] + shape[i], indices[i]);
  indices.Set(i, new_index);
Negative indices handling is also done in
tvm/python/tvm/relay/op/transform.py, lines 926 to 927 in d9fe672:
begin = _make.where(begin < cast_like(const(0), begin), begin + ishape_slice, begin)
begin = _make.where(begin >= ishape_slice, ishape_slice, begin)
tvm/include/tvm/topi/detail/strided_slice.h, lines 45 to 48 in cbe3dca:
int64_t end_range = stride < 0 ? extent - 1 : extent;
if (index < 0) {
  index += extent;
}
tvm/include/tvm/topi/detail/strided_slice.h, line 105 in cbe3dca:
PrimExpr b = begin[i] < 0 ? b_expr + idim : b_expr;
I believe there are other cases like this spread across the codebase. Maybe we should revisit all index-taking ops and centralize negative indices handling. Generally, I think people prefer not making changes down the stack.
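To illustrate the kind of centralization being suggested, here is a hedged sketch of a shared Relay-side helper. The name normalize_negative_indices and its placement are assumptions for illustration, not existing TVM API:

```python
from tvm import relay

def normalize_negative_indices(indices, extent):
    """Map indices in [-extent, extent) into [0, extent).

    Hypothetical shared helper; today each op repeats this pattern inline.
    """
    indices = relay.cast(indices, "int64")
    zero = relay.const(0, dtype="int64")
    extent = relay.const(extent, dtype="int64")
    # Where an index is negative, shift it by the axis extent.
    return relay.where(relay.less(indices, zero),
                       relay.add(indices, extent),
                       indices)
```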
Hmm, this is a good point. Personally, I think pushing down the stack is the right choice, since I expect the most basic indexing op to work with negative indices. All other operations use these basic indexing ops, so they would get this behavior for free. In our case, we add a flag to a basic indexing operation which turns on this feature.
Otherwise we'll end up with many copies of the same code everywhere.
Yeah, I agree that implementation-wise this is more convenient. Since this is a fundamental data structure change, how about we open a separate PR for negative indexing support in te::Tensor, to get opinions from more people?
I think that's a fair point. I'll refactor this to use normalize_gather_indices() in the meantime and do as you say.
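For readers following along, a frontend-level workaround of this kind conceptually looks something like the following. This is a sketch under the assumption that normalization happens in the ONNX importer before lowering; it is not the PR's verbatim code:

```python
from tvm import relay

def normalize_gather_indices(data, indices, axis):
    """Sketch: make negative gather indices non-negative in the frontend."""
    # Extent of the gathered axis, as a scalar expression.
    axis_size = relay.take(relay.shape_of(data), relay.const(axis, "int64"))
    axis_size = relay.cast(axis_size, "int64")
    indices = relay.cast(indices, "int64")
    # indices < 0  ->  indices + axis_size
    return relay.where(
        relay.less(indices, relay.const(0, "int64")),
        relay.add(indices, axis_size),
        indices,
    )
```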
OK folks, I've removed the controversial changes and implemented an alternate workaround. PTAL when you have time.
#9023 <-- discussion about simplifying negative indices handling
@masahi, can you land this one if you are OK with it?
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
* support negative indices in gather
* move check to Tensor level indexing, gathernd
* add test, update transform.h
* remove unneeded gather
* missing gather nd change
* update tests
* proper tensor comparison
* blacking
* lint
* fix error
* turn on test
* missing test case
* revert changes
* add normalize_gather_indices
* undo change
* update
* more removing diffs
* more undoing

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
Supports negative indices for the gather and gathernd ops.
This is what caused issues #8918 and #8964. Finally, the onnx tests feed in invalid indices but mask them with ignore_index; this was not accounted for and sometimes caused failures too.
This should fix the flaky onnx tests and allow the remaining non-expanded onnx tests to be turned on.
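As a reference point, here is a small NumPy sketch (not TVM code) of the ignore_index semantics described above: targets equal to ignore_index contribute nothing to the loss, so even out-of-range target values are harmless once masked.

```python
import numpy as np

def nll_loss(logprobs, targets, ignore_index):
    """Toy NLL loss: rows are samples, columns are class log-probabilities."""
    mask = targets != ignore_index
    # Replace masked (possibly invalid) targets with a safe in-range index.
    safe_targets = np.where(mask, targets, 0)
    picked = logprobs[np.arange(len(targets)), safe_targets]
    # Masked entries contribute zero loss.
    return -(picked * mask).sum() / max(mask.sum(), 1)
```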
Testing:
Ran
pytest tests/python/frontend/onnx/test_forward.py::test_onnx_nodes -k test_nllloss_NCd1d2d3_none_no_weight_negative_ii --count=100
and confirmed no failures.
Ran
pytest tests/python/frontend/onnx/test_forward.py::test_onnx_nodes -k test_nllloss_NCd1d2d3_sum_weight_high_ii --count=100
and confirmed no failures.
Ran
pytest tests/python/frontend/onnx/test_forward.py::test_onnx_nodes -k test_gather --count=100
and confirmed no failures.