
Fixes for PyTorch/XLA functionalization integration #88787

Closed
wants to merge 11 commits

Conversation

wonjoolee95
Collaborator

Picking up #88506

@pytorch-bot

pytorch-bot bot commented Nov 10, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88787

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 1f72367:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wonjoolee95 force-pushed the functionalization branch 2 times, most recently from 612fc0a to df3bf42 on November 18, 2022 at 19:26
@wonjoolee95 force-pushed the functionalization branch 2 times, most recently from cfc2922 to 86f8b11 on December 6, 2022 at 23:30
@alanwaketan
Collaborator

@wonjoolee95 You probably want to hide your debug hints while running the CI here; they make the log too large to parse in the browser. Also, a rebase would be appreciated.

@alanwaketan
Collaborator

I guess it's better to log that information with proper log-level control, so that you can still use it locally for debugging without dramatically increasing the test log size.
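
A minimal sketch of the kind of log-level gating being suggested, using Python's standard logging module; the logger name and the XLA_DEBUG_LOG_LEVEL environment variable below are hypothetical, not part of this PR:

import logging
import os

# Debug hints stay available locally but are silenced by default in CI.
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("torch_xla.functionalization")
logger.setLevel(os.environ.get("XLA_DEBUG_LOG_LEVEL", "WARNING"))

def debug_hint(msg):
    # Emitted only when a developer opts in, e.g. XLA_DEBUG_LOG_LEVEL=DEBUG.
    logger.debug(msg)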

@wonjoolee95 force-pushed the functionalization branch 2 times, most recently from bd0a882 to 2197973 on December 14, 2022 at 09:48
@wonjoolee95
Collaborator Author

Hmm, it seems like some functorch tests are still failing even after applying the diff generated by EXPECTTEST_ACCEPT=1. I'll try it one more time; if they're still failing, I'll fix the rest manually.
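
For reference, a sketch of that regeneration step, assuming the standard expecttest workflow where EXPECTTEST_ACCEPT=1 rewrites the expected outputs in place (equivalent to running EXPECTTEST_ACCEPT=1 python test/functorch/test_aotdispatch.py TestAOTAutograd from a shell):

import os
import subprocess

# Re-run the failing suite with EXPECTTEST_ACCEPT=1 so expecttest rewrites
# the expected outputs in the test file; review the resulting diff afterwards.
env = dict(os.environ, EXPECTTEST_ACCEPT="1")
subprocess.run(
    ["python", "test/functorch/test_aotdispatch.py", "TestAOTAutograd"],
    env=env,
    check=True,
)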

@alanwaketan
Collaborator

It looks like those machines have GPU devices. Do you have a GPU environment? I guess you need to run those tests in a GPU environment; otherwise, they will be skipped.

@wonjoolee95
Collaborator Author

Please correct me if I'm wrong, but it seems like only the first two failing tests need GPU devices? I'll wait for the rest of the CI to complete and then fix all the non-GPU tests first.

@wonjoolee95
Collaborator Author

With the latest commit, all the TestAOTAutograd tests should succeed:

(base) jenkins@26d7adccbc26:/workspace/pytorch$ python test/functorch/test_aotdispatch.py TestAOTAutograd
/opt/conda/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libc10_cuda.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
ss2022-12-15 00:26:27.721580: W 2153590 tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-12-15 00:26:27.721700: W 2153590 tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
./opt/conda/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py:919: UserWarning: Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. Please wrap it with functorch.compile.make_boxed_func or handle the boxed arguments yourself. See https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670 for rationale.
  "Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. "
...test/functorch/test_aotdispatch.py:241: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /workspace/pytorch/build/aten/src/ATen/core/TensorBody.h:485.)
  grads = [inp.grad for inp in pytree.tree_flatten(inps)[0]]
...................................s...../opt/conda/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py:919: UserWarning: Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. Please wrap it with functorch.compile.make_boxed_func or handle the boxed arguments yourself. See https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670 for rationale.
  "Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. "
..../opt/conda/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py:919: UserWarning: Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. Please wrap it with functorch.compile.make_boxed_func or handle the boxed arguments yourself. See https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670 for rationale.
  "Your compiler for AOTAutograd is returning a a function that doesn't take boxed arguments. "
...
----------------------------------------------------------------------
Ran 54 tests in 3.871s

OK (skipped=3)
(base) jenkins@26d7adccbc26:/workspace/pytorch$ 

I'm still unsure about the previous GPU job, which failed with:

/var/lib/jenkins/multipy/multipy/runtime/../../multipy/runtime/interpreter/builtin_registry.h:31:10: fatal error: gtest/gtest_prod.h: No such file or directory

But I'll let the CI run one more time before spending more time on the GPU test.

@alanwaketan
Collaborator

That failure is very likely unrelated.

@alanwaketan
Collaborator

According to hud, the deploy failure shouldn't be related.

@@ -347,7 +347,6 @@ def emit_view_functionalization_body(
}}
);
auto compute_reference_meta =
{view_tensor_name}.key_set().has_backend(c10::BackendComponent::XLABit) ||
Collaborator

@bdhirsh Is this guard safe to remove? With it, I crashed on xla symbolic expand:

root@t1v-n-307ffe96-w-0:/workspaces/work/pytorch/xla# PJRT_DEVICE=CPU python test/test_dynamic_shapes.py -v TestDynamicShapes.test_simple_expand
test_simple_expand (__main__.TestDynamicShapes) ... ERROR

======================================================================
ERROR: test_simple_expand (__main__.TestDynamicShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_dynamic_shapes.py", line 18, in test_simple_expand
    t5.expand(t2.size(0))
RuntimeError: /workspaces/work/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:2109: SymIntArrayRef expected to contain only concrete integers

----------------------------------------------------------------------
Ran 1 test in 0.022s

FAILED (errors=1)

Collaborator

Okay, it regresses. It's not safe.

Contributor

@alanwaketan so, the idea motivating this bit of code is the following:

  • pytorch/XLA doesn't care about strides, so when comparing a PyTorch program run on CUDA vs. XLA, the user will see different strides on their tensors throughout the program
  • functionalization gives XLA the ability to fix that problem; XLA can keep ignoring stride values in all of its kernels, while functionalization runs the meta function for every ATen op to set the strides properly (see the short sketch right after this list)
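
As a rough illustration of the stride divergence described above (plain eager PyTorch; the XLA-side behavior is paraphrased in the comments, not reproduced):

import torch

x = torch.randn(2, 3)
y = x.t()                   # a view; eager CPU/CUDA reports strides (1, 3)
print(y.shape, y.stride())  # torch.Size([3, 2]) (1, 3)

# A backend that ignores strides would report contiguous strides (2, 1) for
# the same tensor, so the user sees different .stride() values than on CUDA.
# Running the ATen meta function during functionalization lets such a backend
# stamp the reference strides onto its outputs and close that gap.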

One question here is: do you think that's a benefit worth trying to capture for pytorch/XLA (stride correctness, from the user's perspective, for XLA tensors)? I'd be interested in @JackCaoG's opinion.

Our options are either to:
(1) kill that code, and not bother trying to get correct strides
(2) make it more robust so it works on this test

In this test, it looks like you're using dynamic shapes, and the meta function we're calling doesn't play well with dynamic shapes. The way the dynamic-shapes workstream in core has been handling this is that we have Python implementations / decompositions of a bunch of our ops that we want to run when dynamic shapes are enabled. And it looks like, for some reason, we aren't calling that Python impl and are instead calling the C++ one?

There's probably a better way to arrange for this to work with XLA, but one option is to enable the Python dispatcher in your test, which should override a bunch of C++ meta kernels with their Python equivalents:

from torch._dispatch.python import enable_python_dispatcher
with enable_python_dispatcher():
    test()
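
Applied to the failing test above, that might look roughly like the wrapper below; only enable_python_dispatcher itself comes from core, the helper is a sketch:

from torch._dispatch.python import enable_python_dispatcher

def run_with_python_dispatcher(test_fn, *args, **kwargs):
    # Swap eligible C++ meta kernels for their Python decompositions while
    # the test body runs; the normal dispatcher is restored on exit.
    with enable_python_dispatcher():
        return test_fn(*args, **kwargs)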

Collaborator

Here is the follow-up on the XLA side: pytorch/xla#4448.

Collaborator

It looks like enabling the Python dispatcher and implementing the missing sym size ops can work around this. But that raises a bigger question of whether we should enable the Python dispatcher for dynamic shapes or not.

@wonjoolee95
Collaborator Author

Rebased this and the XLA POC PR onto master. The aotdispatch tests here will fail, as the master branch had some changes as well. I'll follow up later to fix those.

@wonjoolee95
Collaborator Author

A bunch of dynamo-related tests are failing now with messages like:

 Error: trace_fork_wait_inline (jit.test_async.TestAsync) ... [2023-01-25 03:37:54,841] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT <graph break in test_trace_fork_wait_inline> /var/lib/jenkins/workspace/test/jit/test_async.py line 416 

The last commit does touch dynamo-related code, but the CI used to be green even with that commit. While I try to reproduce this locally, let me also rebase from master and re-run the CI.

@alanwaketan
Collaborator

A bunch of dynamo-related tests are failing now with messages like:

 Error: trace_fork_wait_inline (jit.test_async.TestAsync) ... [2023-01-25 03:37:54,841] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT <graph break in test_trace_fork_wait_inline> /var/lib/jenkins/workspace/test/jit/test_async.py line 416 

The last commit does touch dynamo-related code, but the CI used to be green even with that commit. While I try to reproduce this locally, let me also rebase from master and re-run the CI.

We can always use hud.pytorch.org to determine whether a test is broken at the tip of the tree.

@wonjoolee95
Collaborator Author

Thanks for the info, Jiewen. It looks like master was also seeing the same issue for a while:
[Screenshot of hud.pytorch.org, taken 2023-01-25 at 1:49 PM, showing the failing jobs on master]
But it seems like a recent commit fixed it, so this PR's CI should be good now. I'll let the CI verify.

@@ -14580,3 +14580,6 @@
  dispatch:
    CUDA: _fused_adamw_kernel_cuda_
  autogen: _fused_adamw, _fused_adamw.out

- func: _propagate_xla_data(Tensor input, Tensor output) -> ()
Collaborator

@bdhirsh I have added the op you suggested in pytorch/xla#4505 (comment). Please review. You can just check the commit called: [Functionalization] Adds _propagate_xla_data.
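
For context, a sketch of how the new op from the yaml diff above could be called once codegen registers it; whether the default non-XLA implementation is a no-op is an assumption here, not something verified in this thread:

import torch

inp = torch.randn(4)
out = inp * 2
# Gives a backend (XLA) a hook to copy backend-specific data, e.g. dynamic
# dimension info, from `inp` to `out` during functionalization; assumed to be
# a no-op for ordinary CPU tensors.
torch.ops.aten._propagate_xla_data(inp, out)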

@wonjoolee95
Collaborator Author

Moving to a new PR with a new branch -- #94537. Marking this one closed.

@wonjoolee95 closed this Feb 9, 2023