Resolve GH issue 12706 #12815

Merged
merged 5 commits into from
Sep 2, 2022

@hariharans29 (Member) commented Sep 1, 2022

Description:
Only the CPU Resize kernel handles NHWC input. Adjust the Transpose optimizer's Resize handler accordingly. Without this, the CUDA Resize kernel runs the NCHW logic on NHWC input and produces garbage output.
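
To illustrate why NCHW logic applied to NHWC data yields garbage, here is a minimal sketch of the two flat-offset computations. The `Dims` struct and the helper names are hypothetical and are not taken from the ONNX Runtime sources; the point is only that the same logical coordinate maps to different buffer positions under the two layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration; not ONNX Runtime code.
struct Dims {
  std::size_t n, c, h, w;
};

// Flat offset of logical element (n, c, h, w) in a buffer laid out as NCHW.
std::size_t OffsetNCHW(const Dims& d, std::size_t n, std::size_t c,
                       std::size_t h, std::size_t w) {
  assert(n < d.n && c < d.c && h < d.h && w < d.w);
  return ((n * d.c + c) * d.h + h) * d.w + w;
}

// Flat offset of the same logical element in a buffer laid out as NHWC.
std::size_t OffsetNHWC(const Dims& d, std::size_t n, std::size_t c,
                       std::size_t h, std::size_t w) {
  assert(n < d.n && c < d.c && h < d.h && w < d.w);
  return ((n * d.h + h) * d.w + w) * d.c + c;
}

// Builds an NHWC buffer in which each element stores c*100 + h*10 + w,
// so a read can be checked against the coordinate it was meant to fetch.
std::vector<int> MakeNHWCBuffer(const Dims& d) {
  std::vector<int> buf(d.n * d.c * d.h * d.w);
  for (std::size_t n = 0; n < d.n; ++n)
    for (std::size_t h = 0; h < d.h; ++h)
      for (std::size_t w = 0; w < d.w; ++w)
        for (std::size_t c = 0; c < d.c; ++c)
          buf[OffsetNHWC(d, n, c, h, w)] = static_cast<int>(c * 100 + h * 10 + w);
  return buf;
}
```

With a 1x3x2x2 tensor, reading element (c=1, h=0, w=1) through `OffsetNHWC` returns the value stored for that coordinate, while reading the same buffer through `OffsetNCHW` lands on a different channel's data. A kernel that assumes NCHW makes this kind of misread for most elements.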

Motivation and Context
Fix regression in 1.12 (Resolve #12706)
Relevant PR - #10824

@hariharans29 hariharans29 changed the title Resolve #12706 WIP: Resolve #12706 Sep 1, 2022
// Adjust this restriction once the other EPs' Resize
// kernel(s) supports NHWC input.
if (args.node.GetExecutionProviderType() != "CPUExecutionProvider") {
return false;
@hariharans29 (Member, Author) commented Sep 1, 2022

This needs some more thought: the TransposeOptimizer is an L1 optimizer, and there is no partitioning info at this point. But we do need to handle the CUDA EP case, as the CUDA Resize kernel is implemented assuming the provided input is NCHW.
CC: @yihonglyu. Any comments? This causes a regression for CUDA EP users in 1.12. Ideally, a fix similar to the CPU Resize op should be applied to the CUDA kernel as well.

EDIT: Since the EP info can't be ascertained at runtime, the Resize handler has to be temporarily dropped from the handler map in CUDA builds until the CUDA Resize kernel gets a fix similar to the one the CPU Resize kernel got in #10824.
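
A rough sketch of the two gating strategies discussed in this comment follows. All types and names here (`Node`, `Handler`, `HandlerMap`, `TryOptimize`) are hypothetical stand-ins rather than the actual optimizer API, and the sketch assumes that a node not yet assigned to an EP reports an empty EP type.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the optimizer's types; not the real API.
struct Node {
  std::string op_type;
  std::string ep_type;  // assumed to be empty until partitioning assigns an EP
};

using Handler = bool (*)(const Node&);

// Per-node gating, as in the first revision of the patch. Before
// partitioning the EP type is not known, so this check rejects every node.
bool HandleResizeRuntimeGated(const Node& node) {
  if (node.ep_type != "CPUExecutionProvider") {
    return false;
  }
  return true;  // a real handler would push the Transpose through Resize here
}

bool HandleResize(const Node&) { return true; }
bool HandleAdd(const Node&) { return true; }

// Build-time gating: the Resize entry is simply absent from the handler map
// in CUDA and ROCM builds, so no EP information is needed at optimization time.
const std::unordered_map<std::string, Handler>& HandlerMap() {
  static const std::unordered_map<std::string, Handler> handlers = {
      {"Add", &HandleAdd},
#if !defined(USE_CUDA) && !defined(USE_ROCM)
      {"Resize", &HandleResize},
#endif  // !defined(USE_CUDA) && !defined(USE_ROCM)
  };
  return handlers;
}

// Looks up the handler for the node's op type and runs it if one exists.
bool TryOptimize(const Node& node) {
  const auto& handlers = HandlerMap();
  auto it = handlers.find(node.op_type);
  return it != handlers.end() && it->second(node);
}
```

The trade-off in this sketch is that the build-time switch also disables the handler for nodes that end up on the CPU EP in a CUDA-enabled build. That is the price of not having partitioning info at this stage.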

@hariharans29 hariharans29 changed the title WIP: Resolve #12706 Resolve GH issue 12706 Sep 1, 2022
// Per tests included in #10824, the ROCM EP also generates
// incorrect results when this handler is used, so the Resize
// handler is not enabled even for those builds.
#if !defined(USE_CUDA) && !defined(USE_ROCM)
@yihonglyu (Contributor) commented Sep 1, 2022

Is it possible to give a warning or skip the tests instead of just commenting out the tests for CUDA or ROCM?

@hariharans29 (Member, Author) commented Sep 1, 2022

Given that the Resize handler is not part of such builds (CUDA or ROCM), we will have to skip it (we can't even continue with a warning). But I am curious: what value would that add over this approach? Can you elaborate, please?

@@ -498,6 +509,7 @@
/*opset_version*/ 13);
}

#endif
TEST(TransposeOptimizerTests, TestAdd) {
Contributor

nit: #endif // !defined(USE_CUDA) && !defined(USE_ROCM) would be nicer

Member, Author

Thanks. I will include the change in another PR to avoid running all the CIs again.

@skottmckay (Contributor) left a comment

:shipit:

@hariharans29 hariharans29 merged commit 931c8b0 into main Sep 2, 2022
@hariharans29 hariharans29 deleted the hari/gh_issue branch September 2, 2022 01:30

Successfully merging this pull request may close these issues.

CUDA onnxruntime-gpu=1.12.1 inference result differs from onnxruntime-gpu=1.11.0
4 participants