Update IREE onnx import to be in sync with Torch-MLIR #17476

Merged: 8 commits merged into main from users/saienduri/sync_iree-import-onnx on May 23, 2024

Conversation

@saienduri (Collaborator) commented on May 23, 2024:

This commit updates `iree-import-onnx` so that it behaves the same as torch-mlir's version (https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/tools/import_onnx/__main__.py).
Specifically, enabling data propagation improves shape inference and leads to more models passing.

Related to #17021
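
For context, a minimal sketch of the behavior this enables (editorial illustration, not code from the PR; `model.onnx` is a placeholder path):

```python
import onnx

# Shape inference with data propagation enabled, matching the importer's
# new behavior. data_prop propagates constant values through ops such as
# Shape/Gather so downstream shapes can be resolved statically.
model = onnx.load("model.onnx")
inferred = onnx.shape_inference.infer_shapes(model, data_prop=True)
onnx.save(inferred, "model.inferred.onnx")
```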

@ScottTodd (Member) commented on lines 71 to 84:

```python
# Make a temp dir for all the temp files we'll be generating as a side
# effect of inferring shapes. For now, the only file is a new .onnx holding
# the revised model with shapes.
#
# TODO: If the program temp_dir is None, we should be using an ephemeral
# temp directory instead of a hard-coded path in order to avoid data races
# by default.
input_dir = os.path.dirname(os.path.abspath(args.input_file))
temp_dir = (
    Path(input_dir if args.temp_dir is None else args.temp_dir)
    / "onnx-importer-temp"
)
shutil.rmtree(temp_dir, ignore_errors=True)
temp_dir.mkdir(exist_ok=True)
```

This is sketchy code to be copying as-is. If the `--temp-dir` path is not specified, this looks like it will `rm -rf` the directory that the input file is in??

I don't see why https://docs.python.org/3/library/tempfile.html wouldn't work here? No need to have the user create their own temp dir and pass it in via a flag, and then no risk of deleting an existing directory.

Since this is a user tool for IREE, I also don't see much value in adding options around configurable temp paths that mostly facilitate torch-mlir debugging.

@saienduri (Collaborator, Author) replied:

If the user does not specify `--temp-dir`, I think it is creating an `onnx-importer-temp` directory in the directory that the input file is in.

@ScottTodd (Member) replied:

Ah. I'd prefer to use tempfile. That's what the TODO there is suggesting anyways. It should be less code :P

@saienduri (Collaborator, Author) replied on May 23, 2024:

I don't think `tempfile` works here, because our one temporary file is generated by `onnx.shape_inference.infer_shapes_path(args.input_file, <path_to_output_file>, data_prop=args.data_prop)`. So we have to pass in a path where the file will be stored; we can't change the file generated by onnx into a tempfile. But if we don't want to deal with `temp_dir` in IREE, I can just remove that arg, create the file at a fixed path in the input dir, and delete it at the end when the remove-temp-files arg is true, so we don't have to deal with any directories being deleted. What do you think?
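
For reference, the call being described looks like this (a sketch; the file names are placeholders):

```python
import onnx.shape_inference

# infer_shapes_path writes its result to a caller-supplied output path
# instead of returning a ModelProto, so the importer has to manage where
# that file lives.
onnx.shape_inference.infer_shapes_path(
    "model.onnx",           # input file (placeholder)
    "model.inferred.onnx",  # output file (placeholder)
    data_prop=True,
)
```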

@ScottTodd (Member) replied:

won't this work?

```python
import tempfile

# create a temporary directory using the context manager
with tempfile.TemporaryDirectory() as tmpdirname:
    print('created temporary directory', tmpdirname)
```

@saienduri (Collaborator, Author) replied on May 23, 2024:

Yeah, a temp directory would work. But at the moment it's only one file. Do we want to create a directory for it, or just delete the file created by the onnx call at the end? Either way works.

@ScottTodd (Member) replied:

Keep it simple - use standard library functions with predictable behavior (create a file or directory with tempfile, use that for scratch space). I don't want to risk a bug like https://www.theregister.com/2015/01/17/scary_code_of_the_week_steam_cleans_linux_pcs/ :P

@saienduri (Collaborator, Author) replied:

Got it, went with `tempfile.TemporaryDirectory`.
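
A minimal sketch of that resolution (the function name and structure are illustrative, not the exact PR code):

```python
import tempfile
from pathlib import Path

import onnx


def infer_shapes_via_temp_dir(input_file: str, data_prop: bool = True):
    # The directory, and the inferred .onnx inside it, is deleted when the
    # context manager exits, so no pre-existing path is ever rmtree'd.
    with tempfile.TemporaryDirectory() as tmp:
        inferred_path = Path(tmp) / "inferred.onnx"
        onnx.shape_inference.infer_shapes_path(
            input_file, str(inferred_path), data_prop=data_prop
        )
        return onnx.load(str(inferred_path))
```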

@ScottTodd (Member) commented on lines 173 to 182:

```python
parser.add_argument(
    "--data-dir",
    help="Path between CWD and the base directory of the data,"
    " excluding the directories given in the 'location' argument of "
    " convert_model_to_external_data. For example, if 'location' was"
    ' "data/data.bin" and the relative path from CWD to that .bin file is'
    ' a/b/data/data.bin, then set data-dir to "a/b".'
    " Defaults to the directory of the input file.",
    type=Path,
)
```

This docstring is confusing. Why not just take a regular path?

https://onnx.ai/onnx/api/external_data_helper.html#convert-model-to-external-data takes a relative path, sure, but that's for creating external data files. All we need to do here is load from an external path, which has no requirements on being in the same directory tree as the base model file: https://onnx.ai/onnx/api/external_data_helper.html#load-external-data-for-model
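
A sketch of the suggested simplification, based on the onnx docs linked above (paths are placeholders):

```python
import onnx
from onnx.external_data_helper import load_external_data_for_model

# Load the graph without its external tensors, then attach the tensor
# data from an arbitrary directory; no CWD-relative path is required.
model = onnx.load("model.onnx", load_external_data=False)
load_external_data_for_model(model, "/abs/path/to/external/data")
```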

@saienduri (Collaborator, Author) replied:

Yeah, I don't see anything that suggests this has to be a relative path either. Made the change.

@saienduri requested a review from @ScottTodd on May 23, 2024 at 19:10
@saienduri requested a review from @ScottTodd on May 23, 2024 at 20:28
@ScottTodd (Member) commented on lines 105 to 108:

```python
# Load the temp file and the external data.
inferred_model = onnx.load(temp_inferred_file, load_external_data=False)
data_dir = Path(input_dir if args.temp_dir is None else args.data_dir)
onnx.load_external_data_for_model(inferred_model, data_dir)
```

This `data_dir` code looks wrong. The `temp_dir` arg is no longer used, so what is this actually trying to do?

I'm not really following the logic here. New test cases would help: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/tools/import_onnx_test.py.

(Note that those aren't running on CI: #16624.)

@saienduri (Collaborator, Author) replied on May 23, 2024:

I think this is a typo from whoever wrote the torch-mlir importer. They probably meant `args.data_dir`. I don't see how these two could be related.
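
The suspected fix would look like this (a sketch; the helper name is hypothetical and the logic mirrors the snippet quoted above):

```python
from pathlib import Path


def resolve_data_dir(input_dir: str, data_dir_arg) -> Path:
    # Key the fallback off the data-dir argument, not the temp-dir one.
    return Path(input_dir if data_dir_arg is None else data_dir_arg)
```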

@saienduri (Collaborator, Author) replied:

Ah I see, I can add CI for this in a follow-up PR. I've been testing locally that it works. Also, for all e2eshark testing we don't use the shape inference with a file; might have to figure out which cases require this. Might get more models passing.

@ScottTodd (Member) replied:

Yeah no need to get the CI running as part of this. I'm just hesitant to accept clearly buggy code from upstream blindly :P. If the shape inference with a file path hasn't been exercised yet, we could also remove it until needed... smaller surface area that way.

@saienduri (Collaborator, Author) replied on May 23, 2024:

[screenshot of the other place `data_dir` is used]

From the other place `data_dir` is used, it looks like there's no relation to `temp_dir`.

@saienduri (Collaborator, Author) replied:

Okay, I'll just remove the shape inference with a file path then. I haven't seen it used anywhere.

@saienduri (Collaborator, Author) replied:

Or actually, I'll leave it in case it is required somewhere. Logic seems to be correct from what I can tell, but I'll try to get some tests for all these cases.

@saienduri requested a review from @ScottTodd on May 23, 2024 at 21:09
@saienduri enabled auto-merge (squash) on May 23, 2024 at 21:26
@saienduri merged commit abdf550 into main on May 23, 2024. 63 checks passed.
@saienduri deleted the users/saienduri/sync_iree-import-onnx branch on May 23, 2024 at 21:48.
@ScottTodd (Member) commented on lines +81 to +90:

```python
# Run the checker to test whether the file is above the threshold for
# in-memory shape inference. If not, go ahead and do the shape inference.
try:
    onnx.checker.check_model(raw_model)
    inferred_model = onnx.shape_inference.infer_shapes(
        raw_model, data_prop=args.data_prop
    )
    return inferred_model
except ValueError:
    pass
```

I saw 159 new test import failures on Windows after pulling this code in.

```
D:\dev\projects\SHARK-TestSuite\iree_tests (main -> upstream)
(nightly_pip.venv) λ iree-import-onnx D:\dev\projects\SHARK-TestSuite\third_party\onnx\onnx\backend\test\data\node\test_cast_DOUBLE_to_FLOAT\model.onnx -o D:\dev\projects\iree-tmp\double_to_float.mlir

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Scripts\iree-import-onnx.exe\__main__.py", line 7, in <module>
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 140, in _cli_main
    sys.exit(main(parse_arguments()))
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 44, in main
    model_proto = load_onnx_model(args)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\iree\compiler\tools\import_onnx\__main__.py", line 84, in load_onnx_model
    onnx.checker.check_model(raw_model)
  File "D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Lib\site-packages\onnx\checker.py", line 148, in check_model
    C.check_model(protobuf_string, full_check, skip_opset_compatibility_check)
onnx.onnx_cpp2py_export.checker.ValidationError: Your model ir_version 10 is higher than the checker's (9).
```

Updating my onnx Python package from 1.15.0 to 1.16.1 helped. We should update some of these version pins, or make this code tolerate different versions:

```
"onnx>=1.15.0",
pip install onnx>=1.15.0
```

With the newer onnx Python package, there are now 2 new import successes and 0 new import failures.
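
One possible shape for such a fallback (a hedged sketch; not code from this PR or from torch-mlir):

```python
import onnx
import onnx.checker
import onnx.shape_inference


def try_infer_shapes(raw_model, data_prop=True):
    # If the installed onnx package's checker rejects the model (e.g. an
    # ir_version newer than it understands), return the raw model instead
    # of crashing the importer.
    try:
        onnx.checker.check_model(raw_model)
    except (ValueError, onnx.checker.ValidationError):
        return raw_model
    return onnx.shape_inference.infer_shapes(raw_model, data_prop=data_prop)
```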

ScottTodd added a commit to nod-ai/SHARK-TestSuite that referenced this pull request May 28, 2024
This pulls in iree-org/iree#17476 which now runs
shape inference: `onnx.shape_inference.infer_shapes(model,
data_prop=True)`.

Generated from an already configured venv with:
```
python -m pip install --upgrade --find-links https://iree.dev/pip-release-links.html iree-compiler
python -m pip install --upgrade onnx
python .\onnx\import_tests.py
```
ScottTodd added a commit that referenced this pull request May 28, 2024
As discovered on
#17476 (comment), this
is needed to avoid an error running `iree-import-onnx` on certain test
files:

> onnx.onnx_cpp2py_export.checker.ValidationError: Your model ir_version
10 is higher than the checker's (9).

If we care about supporting older onnx package versions, we could add
some fallback code to `iree-import-onnx`.

We may also want to update the version used in torch-mlir?
https://github.com/llvm/torch-mlir/blob/e0a5adb1db38b0072c44b87570bc530eb3b324ad/setup.py#L264
@ScottTodd added the integrations/onnx label on Aug 19, 2024