Update IREE onnx import to be in sync with Torch-MLIR #17476

Merged: 8 commits merged into main from users/saienduri/sync_iree-import-onnx on May 23, 2024

Conversation

@saienduri (Collaborator) commented on May 23, 2024:

This commit updates `iree-import-onnx` so that it behaves the same as torch-mlir's version (https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/tools/import_onnx/__main__.py).
Specifically, enabling data propagation improves shape inference and leads to more models passing.

Related to #17021
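
For context, a minimal sketch of the behavior this enables (editorial illustration, not code from the PR; `model.onnx` is a placeholder path):

```python
import onnx

# Shape inference with data propagation enabled, matching the importer's
# new behavior. data_prop propagates constant values through ops such as
# Shape/Gather so downstream shapes can be resolved statically.
model = onnx.load("model.onnx")
inferred = onnx.shape_inference.infer_shapes(model, data_prop=True)
onnx.save(inferred, "model.inferred.onnx")
```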

@ScottTodd (Member) commented on lines 71 to 84:

```python
# Make a temp dir for all the temp files we'll be generating as a side
# effect of inferring shapes. For now, the only file is a new .onnx holding
# the revised model with shapes.
#
# TODO: If the program temp_dir is None, we should be using an ephemeral
# temp directory instead of a hard-coded path in order to avoid data races
# by default.
input_dir = os.path.dirname(os.path.abspath(args.input_file))
temp_dir = (
    Path(input_dir if args.temp_dir is None else args.temp_dir)
    / "onnx-importer-temp"
)
shutil.rmtree(temp_dir, ignore_errors=True)
temp_dir.mkdir(exist_ok=True)
```

This is sketchy code to be copying as-is. If the `--temp-dir` path is not specified, this looks like it will `rm -rf` the directory that the input file is in??

I don't see why https://docs.python.org/3/library/tempfile.html wouldn't work here? No need to have the user create their own temp dir and pass it in via a flag, and then no risk of deleting an existing directory.

Since this is a user tool for IREE, I also don't see much value in adding options around configurable temp paths that mostly facilitate torch-mlir debugging.

@saienduri (Collaborator, Author) replied:

If the user does not specify `--temp-dir`, I think it is creating an `onnx-importer-temp` directory in the directory that the input file is in.

@ScottTodd (Member) replied:

Ah. I'd prefer to use tempfile. That's what the TODO there is suggesting anyways. It should be less code :P

@saienduri (Collaborator, Author) replied on May 23, 2024:

I don't think `tempfile` works here, because our one temporary file is generated by `onnx.shape_inference.infer_shapes_path(args.input_file, <path_to_output_file>, data_prop=args.data_prop)`. So we have to pass in a path where the file will be stored; we can't change the file generated by onnx into a tempfile. But if we don't want to deal with `temp_dir` in IREE, I can just remove that arg, create the file at a fixed path in the input dir, and delete it at the end when the remove-temp-files arg is true, so we don't have to deal with any directories being deleted. What do you think?
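
For reference, the call being described looks like this (a sketch; the file names are placeholders):

```python
import onnx.shape_inference

# infer_shapes_path writes its result to a caller-supplied output path
# instead of returning a ModelProto, so the importer has to manage where
# that file lives.
onnx.shape_inference.infer_shapes_path(
    "model.onnx",           # input file (placeholder)
    "model.inferred.onnx",  # output file (placeholder)
    data_prop=True,
)
```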

@ScottTodd (Member) replied:

won't this work?

```python
import tempfile

# create a temporary directory using the context manager
with tempfile.TemporaryDirectory() as tmpdirname:
    print('created temporary directory', tmpdirname)
```

@saienduri (Collaborator, Author) replied on May 23, 2024:

Yeah, a temp directory would work. But at the moment it's only one file. Do we want to create a directory for it, or just delete the file created by the onnx call at the end? Either way works.

@ScottTodd (Member) replied:

Keep it simple - use standard library functions with predictable behavior (create a file or directory with tempfile, use that for scratch space). I don't want to risk a bug like https://www.theregister.com/2015/01/17/scary_code_of_the_week_steam_cleans_linux_pcs/ :P

@saienduri (Collaborator, Author) replied:

Got it, went with `tempfile.TemporaryDirectory`.
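
A minimal sketch of that resolution (the function name and structure are illustrative, not the exact PR code):

```python
import tempfile
from pathlib import Path

import onnx


def infer_shapes_via_temp_dir(input_file: str, data_prop: bool = True):
    # The directory, and the inferred .onnx inside it, is deleted when the
    # context manager exits, so no pre-existing path is ever rmtree'd.
    with tempfile.TemporaryDirectory() as tmp:
        inferred_path = Path(tmp) / "inferred.onnx"
        onnx.shape_inference.infer_shapes_path(
            input_file, str(inferred_path), data_prop=data_prop
        )
        return onnx.load(str(inferred_path))
```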

@ScottTodd (Member) commented on lines 173 to 182:

```python
parser.add_argument(
    "--data-dir",
    help="Path between CWD and the base directory of the data,"
    " excluding the directories given in the 'location' argument of "
    " convert_model_to_external_data. For example, if 'location' was"
    ' "data/data.bin" and the relative path from CWD to that .bin file is'
    ' a/b/data/data.bin, then set data-dir to "a/b".'
    " Defaults to the directory of the input file.",
    type=Path,
)
```

This docstring is confusing. Why not just take a regular path?

https://onnx.ai/onnx/api/external_data_helper.html#convert-model-to-external-data takes a relative path, sure, but that's for creating external data files. All we need to do here is load from an external path, which has no requirements on being in the same directory tree as the base model file: https://onnx.ai/onnx/api/external_data_helper.html#load-external-data-for-model
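
A sketch of the suggested simplification, based on the onnx docs linked above (paths are placeholders):

```python
import onnx
from onnx.external_data_helper import load_external_data_for_model

# Load the graph without its external tensors, then attach the tensor
# data from an arbitrary directory; no CWD-relative path is required.
model = onnx.load("model.onnx", load_external_data=False)
load_external_data_for_model(model, "/abs/path/to/external/data")
```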

@saienduri (Collaborator, Author) replied:

Yeah, I don't see anything that suggests this has to be a relative path either. Made the change.

@saienduri requested a review from @ScottTodd on May 23, 2024 at 19:10
@saienduri requested a review from @ScottTodd on May 23, 2024 at 20:28
@ScottTodd (Member) commented on lines 105 to 108:

```python
# Load the temp file and the external data.
inferred_model = onnx.load(temp_inferred_file, load_external_data=False)
data_dir = Path(input_dir if args.temp_dir is None else args.data_dir)
onnx.load_external_data_for_model(inferred_model, data_dir)
```

This `data_dir` code looks wrong. The `temp_dir` arg is no longer used, so what is this actually trying to do?

I'm not really following the logic here. New test cases would help: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/tools/import_onnx_test.py.

(Note that those aren't running on CI: #16624.)

@saienduri (Collaborator, Author) replied on May 23, 2024:

I think this is a typo from whoever wrote the torch-mlir importer. They probably meant `args.data_dir`. I don't see how these two could be related.
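
The suspected fix would look like this (a sketch; the helper name is hypothetical and the logic mirrors the snippet quoted above):

```python
from pathlib import Path


def resolve_data_dir(input_dir: str, data_dir_arg) -> Path:
    # Key the fallback off the data-dir argument, not the temp-dir one.
    return Path(input_dir if data_dir_arg is None else data_dir_arg)
```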

@saienduri (Collaborator, Author) replied:

Ah I see, I can add CI for this in a follow-up PR. I've been testing locally that it works. Also, for all e2eshark testing we don't use the shape inference with a file; might have to figure out which cases require this. Might get more models passing.

@ScottTodd (Member) replied:

Yeah no need to get the CI running as part of this. I'm just hesitant to accept clearly buggy code from upstream blindly :P. If the shape inference with a file path hasn't been exercised yet, we could also remove it until needed... smaller surface area that way.

@saienduri (Collaborator, Author) replied on May 23, 2024:

[screenshot of the other place `data_dir` is used]

From the other place `data_dir` is used, it looks like there's no relation to `temp_dir`.

@saienduri (Collaborator, Author) replied:

Okay, I'll just remove the shape inference with a file path then. I haven't seen it used anywhere.

@saienduri (Collaborator, Author) replied:

Or actually, I'll leave it in case it is required somewhere. Logic seems to be correct from what I can tell, but I'll try to get some tests for all these cases.

@saienduri requested a review from @ScottTodd on May 23, 2024 at 21:09
@saienduri enabled auto-merge (squash) on May 23, 2024 at 21:26
@saienduri merged commit abdf550 into main on May 23, 2024. 63 checks passed.
@saienduri deleted the users/saienduri/sync_iree-import-onnx branch on May 23, 2024 at 21:48.
@ScottTodd (Member) commented on lines +81 to +90:

```python
# Run the checker to test whether the file is above the threshold for
# in-memory shape inference. If not, go ahead and do the shape inference.
try:
    onnx.checker.check_model(raw_model)
    inferred_model = onnx.shape_inference.infer_shapes(
        raw_model, data_prop=args.data_prop
    )
    return inferred_model
except ValueError:
    pass
```

I saw 159 new test import failures on Windows after pulling this code in.

```
D:\dev\projects\SHARK-TestSuite\iree_tests (main -> upstream)
(nightly_pip.venv) λ iree-import-onnx D:\dev\projects\SHARK-TestSuite\third_party\onnx\onnx\backend\test\data\node\test_cast_DOUBLE_to_FLOAT\model.onnx -o D:\dev\projects\iree-tmp\double_to_float.mlir

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Scripts\iree-import-onnx.exe\__main__.py", line 7, in <module>
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 140, in _cli_main
    sys.exit(main(parse_arguments()))
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 44, in main
    model_proto = load_onnx_model(args)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 84, in load_onnx_model
    onnx.checker.check_model(raw_model)
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\onnx\checker.py", line 148, in check_model
    C.check_model(protobuf_string, full_check, skip_opset_compatibility_check)
onnx.onnx_cpp2py_export.checker.ValidationError: Your model ir_version 10 is higher than the checker's (9).
```

Updating my onnx Python package from 1.15.0 to 1.16.1 helped. We should update some of these version pins, or make this code tolerate different versions:

```
"onnx>=1.15.0",
pip install onnx>=1.15.0
```

With the newer onnx Python package, there are now 2 new import successes and 0 new import failures.
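
One possible shape for such a fallback (a hedged sketch; not code from this PR or from torch-mlir):

```python
import onnx
import onnx.checker
import onnx.shape_inference


def try_infer_shapes(raw_model, data_prop=True):
    # If the installed onnx package's checker rejects the model (e.g. an
    # ir_version newer than it understands), return the raw model instead
    # of crashing the importer.
    try:
        onnx.checker.check_model(raw_model)
    except (ValueError, onnx.checker.ValidationError):
        return raw_model
    return onnx.shape_inference.infer_shapes(raw_model, data_prop=data_prop)
```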

ScottTodd added a commit to nod-ai/SHARK-TestSuite that referenced this pull request May 28, 2024
This pulls in iree-org/iree#17476 which now runs
shape inference: `onnx.shape_inference.infer_shapes(model,
data_prop=True)`.

Generated from an already configured venv with:
```
python -m pip install --upgrade --find-links https://iree.dev/pip-release-links.html iree-compiler
python -m pip install --upgrade onnx
python .\onnx\import_tests.py
```
ScottTodd added a commit that referenced this pull request May 28, 2024
As discovered on
#17476 (comment), this
is needed to avoid an error running `iree-import-onnx` on certain test
files:

> onnx.onnx_cpp2py_export.checker.ValidationError: Your model ir_version
10 is higher than the checker's (9).

If we care about supporting older onnx package versions, we could add
some fallback code to `iree-import-onnx`.

We may also want to update the version used in torch-mlir?
https://github.com/llvm/torch-mlir/blob/e0a5adb1db38b0072c44b87570bc530eb3b324ad/setup.py#L264
@ScottTodd added the integrations/onnx label on Aug 19, 2024