Fixes for PyTorch/XLA functionalization integration #88787
Conversation
Dr. CI: As of commit 1f72367, 1 failure. See artifacts and rendered test results at hud.pytorch.org/pr/88787. Note: links to docs will display an error until the docs builds have completed.
Force-pushed 612fc0a → df3bf42
Force-pushed cfc2922 → 86f8b11
@wonjoolee95 You probably want to hide your debug hints while running the CI here; they make the log too large and impossible to parse in the browser. Also, a rebase would be appreciated.
It would be better to log that information with proper log-level control, so that you can still use it locally for debugging without dramatically increasing the test log size.
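A minimal sketch of what level-controlled logging could look like on the Python side (the logger name and message are illustrative, not from this PR):

import logging

logger = logging.getLogger("functionalization.debug")

# Debug hints are emitted at DEBUG level, so CI (which typically runs at
# WARNING/INFO) stays quiet, while a local run can opt in with
#   logging.basicConfig(level=logging.DEBUG)
logger.debug("functionalized view op: %s", "expand")  # illustrative payload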
Force-pushed bd0a882 → 2197973
Hmm, it seems like some functorch tests are still failing even after applying the generated diff.
It looks like those machines have GPU devices. Do you have a GPU environment? I think you need to run those tests in a GPU environment; otherwise, they will be skipped.
Please correct me if I'm wrong, but it seems like only the first two failing tests are on GPU devices? I'll wait for the rest of the CI to complete and then fix all the non-GPU tests first.
Force-pushed f9804d3 → 048200a
With the latest commit, all the TestAOTAutograd tests should succeed. I'm still unsure about the previous GPU failure, but I'll let the CI run one more time before spending more time on the GPU test.
That failure is very likely unrelated.
According to HUD, the deploy failure shouldn't be related.
Force-pushed 18db0a9 → 14d131d
Force-pushed 14d131d → af74813
@@ -347,7 +347,6 @@ def emit_view_functionalization_body(
        }}
      );
      auto compute_reference_meta =
-       {view_tensor_name}.key_set().has_backend(c10::BackendComponent::XLABit) ||
@bdhirsh Is this guard safe to remove? With this change, I hit a crash on XLA symbolic expand:
root@t1v-n-307ffe96-w-0:/workspaces/work/pytorch/xla# PJRT_DEVICE=CPU python test/test_dynamic_shapes.py -v TestDynamicShapes.test_simple_expand
test_simple_expand (__main__.TestDynamicShapes) ... ERROR
======================================================================
ERROR: test_simple_expand (__main__.TestDynamicShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_dynamic_shapes.py", line 18, in test_simple_expand
t5.expand(t2.size(0))
RuntimeError: /workspaces/work/pytorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:2109: SymIntArrayRef expected to contain only concrete integers
----------------------------------------------------------------------
Ran 1 test in 0.022s
FAILED (errors=1)
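For context, a hedged reconstruction of the kind of test that triggers this; the exact setup in test_dynamic_shapes.py may differ, and nonzero is just one way torch/XLA can produce a dynamically-sized dimension (it may require experimental dynamic-shape support to actually yield a symbolic size):

import torch
import torch_xla.core.xla_model as xm  # assumption: torch_xla is installed

dev = xm.xla_device()
t1 = torch.tensor([1, 0, 1], device=dev)
t2 = torch.nonzero(t1)       # first dimension is data-dependent (dynamic)
t5 = torch.ones(1, device=dev)
# When t2.size(0) is symbolic, the C++ composite kernel rejects it with
# "SymIntArrayRef expected to contain only concrete integers".
t5.expand(t2.size(0))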
Okay, it regresses. It's not safe.
@alanwaketan so, the idea motivating this bit of code is the following:
- pytorch/XLA doesn't care about strides, so when comparing a pytorch program run on CUDA vs. XLA, the user will see different strides on the tensors throughout their program
- functionalization gives XLA the ability to fix that problem: XLA can choose not to care about the value of strides in any of its kernels, while functionalization runs the meta function for every ATen op to properly set the strides (see the stride sketch below)
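A small illustration of the stride mismatch, using plain eager PyTorch (the shapes are made up): a transposed view reports swapped strides on CPU/CUDA, whereas a strides-agnostic backend would not, unless functionalization recomputes the reference strides via the meta function.

import torch

x = torch.randn(2, 3)
y = x.t()                   # a view with swapped strides
print(y.shape, y.stride())  # torch.Size([3, 2]) (1, 3) in eager CPU/CUDA
# A strides-agnostic backend like XLA may report different (e.g. contiguous)
# strides here unless functionalization runs the meta function to compute
# the strides eager would have produced.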
One question here is: do you think that's a benefit worth trying to capture for pytorch/XLA (stride correctness, from the user's perspective, for XLA tensors)? I'd be interested in @JackCaoG's opinion.
Our options are either to:
(1) kill that code, and not bother trying to get correct strides
(2) make it more robust so it works on this test
In this test, it looks like you're using dynamic shapes, and the meta function we're calling doesn't play well with dynamic shapes. The way the dynamic-shapes workstream in core has been handling this is that we have python implementations / decompositions of a bunch of our ops that we want to run when dynamic shapes are enabled. And it looks like, for some reason, we aren't calling that python impl and are instead calling the C++ one?
There's probably a better way to arrange for this to work with XLA, but one option is to enable the python dispatcher in your test, which should override a bunch of C++ meta kernels with their python equivalents:
from torch._dispatch.python import enable_python_dispatcher
with enable_python_dispatcher():
    test()
Here is the follow-up on the xla side: pytorch/xla#4448.
Looks like enabling the python dispatcher and implementing the missing sym size ops can work around this. But it raises a bigger question of whether we should enable the python dispatcher for dynamic shapes or not.
Force-pushed ab6871a → 38fcb95
Force-pushed 38fcb95 → 6ebd38f
Rebased this and the XLA POC PR on master.
Force-pushed 6ebd38f → 016d9e6
Force-pushed 016d9e6 → c29cfe4
Force-pushed c29cfe4 → 55dac54
A bunch of failures showed up. The last commit does touch dynamo-related code, but the CI used to be green even with that commit. While I try to reproduce this locally, let me also try to rebase on master and re-run the CI.
We can always use hud.pytorch.org to determine whether the test is broken at the tip of the tree.
Force-pushed 55dac54 → c48991e
Force-pushed e5811f1 → 34fad84
Force-pushed 34fad84 → 4c2a5ca
This reverts commit 1e66139201e493235fd4dadbe454d36eaa569aa7.
Force-pushed 4c2a5ca → f09903a
@@ -14580,3 +14580,6 @@
   dispatch:
     CUDA: _fused_adamw_kernel_cuda_
   autogen: _fused_adamw, _fused_adamw.out
+
+- func: _propagate_xla_data(Tensor input, Tensor output) -> ()
@bdhirsh I have added the op you suggested in pytorch/xla#4505 (comment). Please review. You can just check the commit called: [Functionalization] Adds _propagate_xla_data.
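For reference, a hedged sketch of how the new op could be invoked directly from Python once it is registered in native_functions.yaml (the tensor setup is illustrative; this PR's intended caller is the functionalization pass itself, not user code):

import torch
import torch_xla.core.xla_model as xm  # assumption: torch_xla is installed

dev = xm.xla_device()
inp = torch.randn(3, device=dev)
out = torch.randn(3, device=dev)

# Functionalization can call this op so the XLA backend copies XLA-specific
# data (e.g. dynamic-shape metadata) from `inp` to `out`; it returns nothing.
# Behavior on non-XLA tensors is not guaranteed.
torch.ops.aten._propagate_xla_data(inp, out)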
Force-pushed 7896fe8 → 1f72367
Moving to a new PR with a new branch: #94537. Marking this one closed.
Picking up #88506