
Incorrect result for converted FP16 model with Conv Op when run on arm64 Linux with onnxruntime >= 1.15.0 #18992

Closed
jasonkit opened this issue Jan 3, 2024 · 10 comments
Labels
ep:ArmNN (issues related to Arm NN execution provider)

Comments


jasonkit commented Jan 3, 2024

Describe the issue

An ONNX model exported from PyTorch with nn.Conv2d and converted to FP16 does not give the correct result during inference.

This issue is not observed on the original exported FP32 ONNX model.
This issue is also not observed on onnxruntime 1.13 or 1.14; I first observed it on onnxruntime >= 1.15.0.
This issue is only observed on arm64 Linux (I actually observe it in Docker running on M1 macOS).
It works fine on macOS with an M1 CPU, or on Linux with an Intel CPU.

To reproduce

On arm64 Linux (or using the python:3.10-bullseye Docker image),
run the following code with onnxruntime >= 1.15.0:

import torch
from torch import nn

import onnx
from onnxconverter_common import float16
import onnxruntime as ort
import numpy as np


class ModelUnderTest(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Conv2d(1, 1, 1)
        nn.init.constant_(self.model.weight.data, 0.5)
        if self.model.bias is not None:
            # It works fine for this test case if bias is initialised to 0
            nn.init.constant_(self.model.bias.data, 0.5)

    def forward(self, x):
        return self.model(x)


if __name__ == "__main__":
    m = ModelUnderTest()
    x = torch.ones(1, 1, 1)
    torch.onnx.export(m, x, "m1.onnx", export_params=True)

    model = onnx.load("m1.onnx")
    m_16 = float16.convert_float_to_float16(
        model,
        keep_io_types=True,
        # It works fine if we block Conv Op
        # op_block_list=float16.DEFAULT_OP_BLOCK_LIST + ["Conv"],
    )
    onnx.save(m_16, "m1_fp16.onnx")

    # ---

    session_option = ort.SessionOptions()
    session_option.log_severity_level = 3
    session_option.enable_cpu_mem_arena = False
    session_option.enable_mem_pattern = False
    session_option.enable_mem_reuse = False

    x = np.ones((1, 1, 1))
    session_fp32 = ort.InferenceSession("m1.onnx", session_option)
    y1 = session_fp32.run(None, {"input": x.astype(np.float32)})[0]
    print("fp32 output")
    print(y1)
    session_fp16 = ort.InferenceSession("m1_fp16.onnx", session_option)
    y2 = session_fp16.run(None, {"input": x.astype(np.float32)})[0]
    print("fp16 output")
    print(y2)

    y_diff = y1 - y2
    y_diff_2 = y_diff * y_diff
    print("SSD")
    print(np.sum(y_diff_2))

It prints

fp32 output
[[[1.]]]
fp16 output
[[[0.5]]]
SSD
0.25

However, the expected output should be

fp32 output
[[[1.]]]
fp16 output
[[[1.]]]
SSD
0.0
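
(With weight and bias both initialised to 0.5 and an input of 1, the Conv should compute 0.5 × 1 + 0.5 = 1.0; the observed FP16 value of 0.5 is exactly 0.5 × 1, i.e. the result with the bias term dropped.)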

It gives the correct output when downgrading onnxruntime to 1.14.1.

Urgency

This seems to be a regression in onnxruntime, as it works before 1.15.0.
I can work around the issue by adding Conv to op_block_list when converting the model to FP16, as sketched below.
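
A minimal sketch of the workaround, reusing the conversion call from the reproduction script above (the output filename is arbitrary):

import onnx
from onnxconverter_common import float16

model = onnx.load("m1.onnx")
# Adding Conv to the block list keeps Conv nodes in FP32
# while the rest of the model is still converted to FP16.
m_16 = float16.convert_float_to_float16(
    model,
    keep_io_types=True,
    op_block_list=float16.DEFAULT_OP_BLOCK_LIST + ["Conv"],
)
onnx.save(m_16, "m1_fp16_conv_blocked.onnx")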

Platform

Linux

OS Version

Debian Bullseye

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

>= 1.15.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@wschin wschin added the quantization (issues related to quantization) label Jan 3, 2024

wschin commented Jan 3, 2024

This problem has been fixed in the latest main branch. Please install the nightly version from this page.
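
For reference, a nightly build can be installed from the ORT-Nightly feed that jasonkit uses later in this thread; the exact dev version available will vary:

pip install ort-nightly --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/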


wschin commented Jan 3, 2024

Closing for now. Feel free to re-open. Thanks.

@wschin wschin closed this as completed Jan 3, 2024

jasonkit commented Jan 4, 2024

@wschin

I have tried my code snippet above with ort-nightly==1.17.0.dev20240103001.
However, I am still getting the same incorrect output.

The following are the installed Python package versions:

root@a68dcd6fb452:/app# pip freeze
coloredlogs==15.0.1
filelock==3.13.1
flatbuffers==23.5.26
fsspec==2023.12.2
humanfriendly==10.0
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.3
onnx==1.15.0
onnxconverter-common==1.14.0
ort-nightly==1.17.0.dev20240103001
packaging==23.2
protobuf==3.20.2
sympy==1.12
torch==2.1.2
typing_extensions==4.9.0

@jasonkit jasonkit changed the title Incorrect result for converted FP16 model with Conv Op whenrun on arm64 Linux with onnxruntime >= 1.15.0 Incorrect result for converted FP16 model with Conv Op when run on arm64 Linux with onnxruntime >= 1.15.0 Jan 4, 2024

jasonkit commented Jan 4, 2024

@wschin

From my observation, it looks like when running an FP16 model on arm64 Linux with onnxruntime >= 1.15.0 (even with the nightly build), the bias of the Conv op gets ignored.

If I export the model with the Conv2d bias set to 0, the computation result of the FP16 model matches the FP32 one.
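
A minimal sketch of that check, assuming the m1_fp16.onnx produced by the reproduction script above:

import numpy as np
import onnxruntime as ort

x = np.ones((1, 1, 1), dtype=np.float32)
y = ort.InferenceSession("m1_fp16.onnx").run(None, {"input": x})[0]

# With weight = bias = 0.5, a correct Conv gives 0.5 * x + 0.5 = 1.0,
# while a Conv that drops its bias gives only 0.5 * x = 0.5.
print("matches weight + bias:", np.allclose(y, 0.5 * x + 0.5))
print("matches weight only:", np.allclose(y, 0.5 * x))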

@wschin wschin reopened this Jan 4, 2024

wschin commented Jan 4, 2024

My pip freeze output:

astunparse==1.6.3
attrs==23.2.0
black==23.10.1
Cerberus==1.3.5
certifi==2023.11.17
charset-normalizer==3.3.2
clang-format==17.0.4
click==8.1.7
expecttest==0.2.1
filelock==3.13.1
flatbuffers==23.5.26
fsspec==2023.12.2
h5py==3.10.0
hypothesis==6.92.2
idna==3.6
isort==5.12.0
Jinja2==3.1.2
lintrunner==0.11.0
lintrunner-adapters==0.12.1
MarkupSafe==2.1.3
mpi4py @ file:///work/ci_py311/mpi4py_1676858691457/work
mpmath==1.3.0
mypy-extensions==1.0.0
networkx==3.2.1
numpy==1.26.2
onnx==1.15.0
onnxconverter-common==1.14.0
onnxruntime-training @ Debug/dist/onnxruntime_training-1.17.0%2Bcu121-cp311-cp311-linux_x86_64.whl
onnxscript==0.1.0.dev20240103
optree==0.10.0
packaging==23.2
pathspec==0.12.1
platformdirs==4.1.0
protobuf==3.20.2
psutil==5.9.7
PyYAML==6.0.1
requests==2.31.0
ruff==0.1.4
six==1.16.0
sortedcontainers==2.4.0
sympy==1.12
-e git+https://github.com/pytorch/pytorch.git@b18d8d4595aa6e0768eedd5fc7d4a4402c567181#egg=torch
types-dataclasses==0.6.6
typing_extensions==4.9.0
urllib3==2.1.0

FYI: I built PyTorch and ORT locally from a commit two days ago.


jasonkit commented Jan 5, 2024

@wschin

Just want to confirm: is your testing environment arm64/aarch64 Linux? There is no issue on Intel CPUs.

My uname -a is Linux 78b33872b873 6.4.16-linuxkit #1 SMP PREEMPT Thu Nov 16 10:49:20 UTC 2023 aarch64 GNU/Linux

I am actually testing in Docker; you should be able to reproduce my environment with:

docker run --platform linux/arm64 --rm -it python:3.10-bullseye bash

and the following requirements.txt:

--extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
coloredlogs==15.0.1
filelock==3.13.1
flatbuffers==23.5.26
fsspec==2023.12.2
humanfriendly==10.0
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.3
onnx==1.15.0
onnxconverter-common==1.14.0
packaging==23.2
protobuf==3.20.2
sympy==1.12
torch==2.1.2
typing_extensions==4.9.0
ort-nightly==1.17.0.dev20240103001
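
Inside the container, a sketch of the remaining setup, assuming the list above is saved as requirements.txt:

pip install -r requirements.txt

Then run the reproduction script from the issue description.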

@wschin wschin added the ep:ArmNN (issues related to Arm NN execution provider) label and removed the quantization (issues related to quantization) label Jan 5, 2024

yihonglyu commented Jul 2, 2024

@jasonkit, I tried to reproduce your issue on Windows 11 ARM64 but could not. Here are my package versions:

(1.15.0) >conda list
# packages in environment at C:\Users\yilyu\.conda\envs\1.15.0:
#
# Name                    Version                   Build  Channel
black                     24.4.2                   pypi_0    pypi
ca-certificates           2024.3.11            haa95532_0
cerberus                  1.3.5                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
coloredlogs               15.0.1                   pypi_0    pypi
coverage                  7.5.4                    pypi_0    pypi
exceptiongroup            1.2.1                    pypi_0    pypi
filelock                  3.15.4                   pypi_0    pypi
flatbuffers               24.3.25                  pypi_0    pypi
fsspec                    2024.6.1                 pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
intel-openmp              2021.4.0                 pypi_0    pypi
isort                     5.13.2                   pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
libffi                    3.4.4                hd77b12b_1
markupsafe                2.1.5                    pypi_0    pypi
mkl                       2021.4.0                 pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
networkx                  3.1                      pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
onnx                      1.16.1                   pypi_0    pypi
onnxconverter-common      1.14.0                   pypi_0    pypi
onnxmltools               1.12.0                   pypi_0    pypi
onnxruntime               1.15.0                   pypi_0    pypi
openssl                   3.0.14               h827c3e9_0
packaging                 24.1                     pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
parameterized             0.9.0                    pypi_0    pypi
pathspec                  0.12.1                   pypi_0    pypi
pip                       24.0             py38haa95532_0
platformdirs              4.2.2                    pypi_0    pypi
pluggy                    1.5.0                    pypi_0    pypi
protobuf                  3.20.2                   pypi_0    pypi
pydocstyle                6.3.0                    pypi_0    pypi
pyreadline3               3.4.1                    pypi_0    pypi
pytest                    8.2.2                    pypi_0    pypi
pytest-cov                5.0.0                    pypi_0    pypi
python                    3.8.19               h1aa4202_0
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
scikit-learn              1.3.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                69.5.1           py38haa95532_0
six                       1.16.0                   pypi_0    pypi
snowballstemmer           2.2.0                    pypi_0    pypi
sqlite                    3.45.3               h2bbff1b_0
sympy                     1.12.1                   pypi_0    pypi
tbb                       2021.13.0                pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
torch                     2.3.1                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
vc                        14.2                 h2eaa2aa_4
vs2015_runtime            14.29.30133          h43f2093_4
wheel                     0.43.0           py38haa95532_0

Could you reproduce the issue on Windows 11 ARM64?


jasonkit commented Jul 9, 2024

@yihonglyu

Sorry, I don't have access to a Windows 11 ARM64 machine.

My reported issue happens on Linux ARM64, not on Windows 11.

As I mentioned in the issue description:

It works fine on macOS with an M1 CPU, or on Linux with an Intel CPU.

I suspect the same issue might not appear on a Windows 11 ARM64 machine, and might only be reproducible on Linux ARM64.

The environment I used to reproduce the issue is described in #18992 (comment); you may try to reproduce it with that environment.

I have just re-run the test with the following packages:

torch==2.3.1
numpy==1.26.4
onnx==1.16.1
onnxconverter-common==1.14.0
onnxruntime==1.18.1

And the issue still exists.

If you don't have access to Docker on your Windows 11 ARM64 machine, could you try to reproduce the issue on WSL?

yihonglyu added a commit that referenced this issue Jul 30, 2024
- Improved accuracy for face-detection, image-classification, and
object-detection in the GeekBench ML benchmark on ARM64.
- Fixed issue #18992
@yihonglyu

@jasonkit This issue has been resolved.


jasonkit commented Aug 1, 2024

@yihonglyu

Confirmed this issue has been resolved in the latest build, thanks!

@jasonkit jasonkit closed this as completed Aug 1, 2024