Bring back PyTorch/XLA GPU tests/builds #8577
Chengji has a WIP branch for hermetic CUDA: https://github.com/yaochengji/xla/tree/chengji/clang-herm

cc @ysiraichi

cc @tengyifei to work with @ysiraichi. Ideally, we would like to use this bug to bring back the GPU wheel for the 2.6 release.
Update: I have tried Chengji's branch, but the build kept failing with:

Still investigating it.
Here's a more verbose update on transitioning PyTorch/XLA to use OpenXLA hermetic CUDA. In summary, these are the things I have added to the build system (branch diff):
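The branch-diff details are not reproduced in this capture. For context, OpenXLA's hermetic CUDA pins the CUDA/cuDNN toolchains through repository environment variables instead of reading a local `/usr/local/cuda` installation. A rough sketch of such an invocation follows; the variable names and version numbers are assumptions taken from OpenXLA's hermetic CUDA documentation (`docs/hermetic_cuda.md` in openxla/xla), so check them against the version of XLA you are pinning:

```shell
# Hermetic-CUDA build sketch: Bazel downloads pinned CUDA/cuDNN
# redistributables rather than using a system CUDA install.
# HERMETIC_CUDA_VERSION / HERMETIC_CUDNN_VERSION values are examples.
bazel build @xla//xla/pjrt/c:pjrt_c_api_gpu_plugin.so \
  --config=cuda \
  --repo_env=HERMETIC_CUDA_VERSION=12.3.2 \
  --repo_env=HERMETIC_CUDNN_VERSION=9.1.1
```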
Even after all these steps, I am still hitting the error above.

Reproducing the Error

```shell
$ bazel build @xla//xla/pjrt/c:pjrt_c_api_gpu_plugin.so --symlink_prefix=$(pwd)/bazel- --config=cuda
```

My Thoughts

Question

How to fix this error?
Thanks @ysiraichi - I've pinged OpenXLA partners to share their input on this issue.
Can you try setting CC and CXX to the full absolute path of clang?
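The suggestion above can be scripted. A minimal sketch, assuming `clang`/`clang++` are on `PATH` and GNU `readlink -f` is available (as on Linux); the helper name is hypothetical:

```shell
# resolve_compiler: print the absolute, symlink-free path of a compiler
# binary found on PATH. Hypothetical helper; assumes GNU readlink (-f).
resolve_compiler() {
  readlink -f "$(command -v "$1")"
}

# Typical usage before invoking bazel (assumes clang/clang++ are installed):
#   export CC="$(resolve_compiler clang)"
#   export CXX="$(resolve_compiler clang++)"
```

Resolving the symlink matters because Bazel's C++ toolchain autodetection can cache or mis-handle a symlinked compiler path.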
Addressed the bug pytorch/xla#8577. PiperOrigin-RevId: 721803568
Addressed the bug pytorch/xla#8577. PiperOrigin-RevId: 721838742
The fix openxla/xla#22165 is merged. Can you try building PyTorch again, please (without changing the Clang symlink to an absolute path)?
Thanks, @ybaturina. Using the absolute path did work! I still haven't tested your patch.
@tengyifei @will-cromar What's the recommended process for installing something (e.g.
@ysiraichi I believe you need to install it in xla/infra/ansible/development.Dockerfile (line 15 at commit 3578940).
That builds a dev docker image that will be accessible at https://console.cloud.google.com/artifacts/docker/tpu-pytorch-releases/us-central1/docker/development |
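As a sketch, one way to use that image locally; the registry path is inferred from the Artifact Registry console URL above and the `:latest` tag is an assumption, so pick a real tag from that page:

```shell
# Pull the development image and start a shell with the current checkout
# mounted. Image path mirrors the Artifact Registry location above;
# the ":latest" tag is an assumption.
docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:latest
docker run -it --rm -v "$PWD:/workspace" -w /workspace \
  us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:latest /bin/bash
```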
🐛 Bug
PyTorch/XLA GPU builds have been failing since Oct 21, 2024.
In order to bring back GPU builds and tests, the first challenge is to build PyTorch/XLA with clang and hermetic CUDA. After that, there may be more tests to fix.
This bug tracks work/discussions needed to bring back GPU builds.