not lowered: aten::_linalg_eigh #6017
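For context: `aten::_linalg_eigh` is the ATen kernel behind `torch.linalg.eigh`, the eigendecomposition of a symmetric/Hermitian matrix. A minimal sketch of the same semantics, using NumPy as a stand-in so the snippet runs without PyTorch:

```python
import numpy as np

# Symmetric input: eigh returns ascending eigenvalues and orthonormal eigenvectors,
# the same contract as torch.linalg.eigh.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, v = np.linalg.eigh(A)  # w → [1., 3.]

# Reconstruction check: A == V diag(w) V^T for a symmetric input.
assert np.allclose(v @ np.diag(w) @ v.T, A)
```

On a backend where this op is not lowered, the call still works but falls back to CPU execution.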
@wonjoolee95 do you know if this is one of the core aten ops?
It is not part of the core aten ops, but we can find someone to work on this. It will probably be after the 2.2 release, though.
Hello @wonjoolee95, thank you for the answer. When will the 2.2 release be? Can the other problems (#6002, #6048) be related to this lowering issue? During execution the problems occur in the neighborhood of this op. Is there anything I can do to help?
Hello @wonjoolee95, I solved #6048 and it is not related to this lowering issue.
Thanks for the update. To answer your questions:
PyTorch's 2.2 release is set to be on January 11th (if I recall correctly), and PyTorch/XLA's release should follow shortly after.
I haven't looked at these two issues deeply yet, but an unlowered op most likely will not cause a crash since it would just fall back to CPU. Glad that you found the solution to one of the issues already.
If you feel comfortable with lowering the op, that would be much appreciated; feel free to submit a PR! We have some READMEs on op lowering, such as https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md.
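A quick way to confirm the CPU-fallback behavior mentioned above is to inspect torch_xla's metrics counters: ops that fall back show up as `aten::`-prefixed counters. A hedged sketch (the import guard is mine, so the snippet degrades gracefully where torch_xla is not installed):

```python
# Sketch: list ops that fell back to CPU in a PyTorch/XLA program.
try:
    import torch_xla.debug.metrics as met
    HAVE_XLA = True
except ImportError:  # running without torch_xla
    HAVE_XLA = False

def fallback_ops():
    """Return counter names for ops executed via CPU fallback ('aten::' prefix)."""
    if not HAVE_XLA:
        return []
    return [name for name in met.counter_names() if name.startswith("aten::")]
```

Calling `fallback_ops()` after a training step would show `aten::_linalg_eigh` while the op remains unlowered.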
Hello. Even though there exists a lowering problem with
Is this another error for
Hey @mfatih7, the
Hello @wonjoolee95. Here is the repo; just run the file for a single-core TPU run. If you want to run multi-core, change the selection on the lines. I observe that both single-core and multi-core runs are stuck (no lines are printed on the terminal regarding the accuracy). I did not observe the error above for the last couple of single-core runs. I am ready to modify the repo if the current version does not help with debugging.
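The single-core vs. multi-core toggle described above can be sketched as follows; the `USE_MULTI_CORE` flag and `_train_fn` are hypothetical names of mine, and the import guard lets the snippet load without torch_xla:

```python
# Sketch: toggling between a single-core and a multi-core TPU run.
try:
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.xla_multiprocessing as xmp
    HAVE_XLA = True
except ImportError:
    HAVE_XLA = False

USE_MULTI_CORE = False  # flip to True for a multi-core run

def _train_fn(index):
    # Placeholder for the real training loop in the repo.
    device = xm.xla_device()
    print(f"process {index} running on {device}")

def launch():
    if not HAVE_XLA:
        print("torch_xla not available; skipping")
        return
    if USE_MULTI_CORE:
        xmp.spawn(_train_fn, args=())  # one process per TPU core
    else:
        _train_fn(0)
```

If a run hangs in both modes, the multiprocessing setup is likely not the culprit, which matches the observation above.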
Hello all. Is there any update on the lowering of aten::_linalg_eigh now?
Thanks for bringing this up @mfatih7!
@tengyifei is working on this.
Hello. In order to check your update, I am trying to install the nightly packages into a Python 3.10 environment on a TPU v4 VM, but I get the error below. What should I do?
Maybe check the discussion in #7622 (comment)?
Hello @tengyifei, @JackCaoG. I could finally test. Using the same num_workers, the training speed with an environment (Python 3.10) having the current nightly versions of torch, torchvision, and torch_xla is 17% faster than my old environment (Python 3.8, 20240229 nightly). I will also test with increased num_workers at the maximized pumping rate.
Hello, I am getting the above error during training.
Best regards