Incorrect result for converted FP16 model with Conv Op when run on arm64 Linux with onnxruntime >= 1.15.0 #18992
Comments
This problem has been fixed in the latest main branch. Please install a nightly version from this page.

Closing for now. Feel free to re-open. Thanks.
I have tried my code snippet above. The following are the installed Python package versions:
From my observation, it looks like when running an fp16 model on arm64 Linux with onnxruntime >= 1.15.0 (even with a nightly build), the bias of the Conv op gets ignored. If I export the model with Conv2's bias set to 0, the computation result of the fp16 model matches the fp32 one.
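To make the "bias ignored" symptom concrete, here is a tiny hand-computed sketch (numpy only; the shapes and values are my own illustration, not from the reported model): a 1x1 Conv over a single pixel reduces to `w * x + b`, so a runtime output that matches the bias-free product is a sign the bias term was dropped.

```python
import numpy as np

# Illustrative check (not the original model): a 1x1 Conv over a single
# pixel is just w * x + b, so the correct fp16 result can be computed by
# hand and compared against what a runtime returns.
x = np.float16(0.5)   # input activation
w = np.float16(2.0)   # Conv weight
b = np.float16(0.25)  # Conv bias

with_bias = np.float16(w * x + b)   # what a correct Conv should produce
without_bias = np.float16(w * x)    # what a Conv with a dropped bias produces

print(with_bias, without_bias)      # 1.25 vs 1.0

# If a runtime returned `without_bias` for this Conv, that would point to
# the bias being ignored, matching the fp16 arm64 observation above.
assert with_bias != without_bias
```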
FYI: I built PyTorch and ORT locally with a commit from two days ago.
Just want to confirm: is your testing environment arm64/aarch64 Linux? There is no issue on Intel CPUs.

Actually, I am testing in Docker; you should be able to reproduce my environment by using Docker and the following requirements.txt:
@jasonkit, I tried to reproduce your issue on Windows 11 ARM64 but cannot reproduce it. Here are my package versions:

Could you reproduce the issue on Windows 11 ARM64?
Sorry, I don't have access to a Windows 11 ARM64 machine. Actually, my reported issue happens on Linux ARM64, not on Windows 11, as I mentioned in the issue description.

I suspect that the same issue might not appear on a Windows 11 ARM64 machine and might only be reproducible on a Linux ARM64 machine. The environment I used to reproduce the issue is mentioned in #18992 (comment). I have just re-run the test with the following packages, and the issue still exists. If you don't have access to Docker on your Windows 11 ARM64 machine, could you try to reproduce the issue on WSL?
- Improved accuracy for face-detection, image-classification, and object-detection in the GeekBench ML benchmark on ARM64.
- Fixed issue #18992
@jasonkit This issue has been resolved.
Confirmed this issue has been resolved in the latest build, thanks!
Describe the issue
An ONNX model which is exported from PyTorch with nn.Conv2d and converted to FP16 does not give correct results during inference.
This issue is not observed on the original exported FP32 ONNX model.
This issue is also not observed on onnxruntime 1.13 or 1.14; I first observed it on onnxruntime >= 1.15.0.
Also, this issue is only observed on arm64 Linux (actually I observe this issue in Docker running on M1 macOS).
It works fine on macOS with an M1 CPU, or on Linux with an Intel CPU.
To reproduce
On arm64 Linux (or using the python:3.10-bullseye docker image), run the following code with onnxruntime >= 1.15.0.
It prints
However, the expected output should be
It gives the correct output when downgrading onnxruntime to 1.14.1.
Urgency
This seems to be a regression in onnxruntime, as it worked before 1.15.0.
I can work around the issue by adding Conv to op_block_list when converting the model to fp16.

Platform
Linux
OS Version
Debian Bullseye
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
>= 1.15.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response