Release 2.6 rocm torch.compile validations are failing #6082

Closed
atalman opened this issue Dec 17, 2024 · 0 comments

atalman (Contributor) commented Dec 17, 2024

Run: https://github.com/pytorch/test-infra/actions/runs/12377467982/job/34547066618?pr=6080

Error log:

python3 ./smoke_test/smoke_test.py
torch: 2.6.0.dev20241216+rocm6.2.4
ATen/Parallel:
	at::get_num_threads() : 4
	at::get_num_interop_threads() : 4
OpenMP 201511 (a.k.a. OpenMP 4.5)
	omp_get_max_threads() : 4
Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
	mkl_get_max_threads() : 4
Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
std::thread::hardware_concurrency() : 8
Environment variables:
	OMP_NUM_THREADS : [not set]
	MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

Nightly date check for torchvision version 0.22.0.dev20241217+rocm6.2.4
Nightly date check for torchaudio version 2.6.0.dev20241217+rocm6.2.4
Testing smoke_test_conv2d
Testing smoke_test_linalg on cpu
Path does not exist: /pytorch/pytorch/.ci/pytorch/vision
Output: 
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
torchvision: 0.22.0.dev20241217+rocm6.2.4
torch.cuda.is_available: False
torch.ops.image._jpeg_version() = 80
Is torchvision usable? True
German shepherd (cpu): 37.6%


Path does not exist: /pytorch/pytorch/.ci/pytorch/audio
Output: 
Skipping ffmpeg test.
Smoke test passed.


Testing smoke_test_compile for cpu and torch.float16
Traceback (most recent call last):
  File "/pytorch/pytorch/.ci/pytorch/./smoke_test/smoke_test.py", line 385, in <module>
    main()
  File "/pytorch/pytorch/.ci/pytorch/./smoke_test/smoke_test.py", line 379, in main
    smoke_test_cuda(
  File "/pytorch/pytorch/.ci/pytorch/./smoke_test/smoke_test.py", line 186, in smoke_test_cuda
    smoke_test_compile("cuda" if torch.cuda.is_available() else "cpu")
  File "/pytorch/pytorch/.ci/pytorch/./smoke_test/smoke_test.py", line 286, in smoke_test_compile
    x_pt2 = torch.compile(foo)(x)
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/__init__.py", line 2533, in compile
    return torch._dynamo.optimize(
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 837, in optimize
    return _optimize(rebuild_ctx, *args, **kwargs)
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 912, in _optimize
    backend.get_compiler_config()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/__init__.py", line 2317, in get_compiler_config
    from torch._inductor.compile_fx import get_patched_config_dict
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 97, in <module>
    from .fx_passes.joint_graph import joint_graph_passes
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/fx_passes/joint_graph.py", line 22, in <module>
    from ..pattern_matcher import (
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/pattern_matcher.py", line 95, in <module>
    from .lowering import fallback_node_due_to_unsupported_type
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/lowering.py", line 6555, in <module>
    from . import kernel
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/kernel/__init__.py", line 1, in <module>
    from . import mm, mm_common, mm_plus_mm, unpack_mixed_mm
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/kernel/mm.py", line 16, in <module>
    from torch._inductor.codegen.cpp_gemm_template import CppGemmTemplate
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/codegen/cpp_gemm_template.py", line 15, in <module>
    from ..kernel.mm_common import mm_args
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/kernel/mm_common.py", line 376, in <module>
    mm_platform_configs = build_rocm_gemm_configs(mm_platform_configs)
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/kernel/mm_common.py", line 31, in build_rocm_gemm_configs
    rocm_num_stages = get_backend_num_stages()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/utils.py", line 1632, in get_backend_num_stages
    options = get_backend_options()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/_inductor/runtime/triton_helpers.py", line 64, in get_backend_options
    target = driver.active.get_current_target()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/triton/backends/amd/driver.py", line 497, in get_current_target
    device = self.get_current_device()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/cuda/__init__.py", line 955, in current_device
    _lazy_init()
  File "/opt/conda/envs/conda-env-12377467982/lib/python3.9/site-packages/torch/cuda/__init__.py", line 320, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available
++ handle_error
++ echo 'Please note: We are currently migrating Linux Wheel builds to Manywheel 2.28'
++ echo 'If you see error like: ImportError: /lib64/libc.so.6: version GLIBC_2.28 not found'
++ echo 'Please migrate to: https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job_v2.yml'
++ echo 'Issue: https://github.com/pytorch/pytorch/issues/123649'
Please note: We are currently migrating Linux Wheel builds to Manywheel 2.28
If you see error like: ImportError: /lib64/libc.so.6: version GLIBC_2.28 not found
Please migrate to: https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job_v2.yml
Issue: https://github.com/pytorch/pytorch/issues/123649
Traceback (most recent call last):
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module>
    main()
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
    run_cmd_or_die(f"docker exec -t {container_name} /exec")
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
    raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t a3f5b2b3c6258980af8758f4d101a0ac470fe435d39964735148eec9fd94ad65 /exec failed with exit code 1
Error: Process completed with exit code 1.
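
For reference, the failing step is the smoke test's `torch.compile` check on CPU (`x_pt2 = torch.compile(foo)(x)` in the traceback above). A minimal sketch of that path is below, assuming a ROCm nightly wheel installed on a host with no visible AMD GPU; the body of `foo` is illustrative rather than a copy of smoke_test.py.

```python
# Minimal repro sketch (assumes a torch 2.6 ROCm nightly wheel on a CPU-only
# host, e.g. the validation runner in the linked workflow).
import torch

def foo(x: torch.Tensor) -> torch.Tensor:
    # Trivial function; the failure happens while importing the inductor
    # backend, before this code ever runs.
    return torch.sin(x) + torch.cos(x)

x = torch.rand(3, 3, device="cpu", dtype=torch.float16)

# On the affected nightlies this raised "RuntimeError: No HIP GPUs are
# available": importing torch._inductor.kernel.mm_common queried Triton's
# active backend, which initialized the HIP runtime even though the compile
# target was CPU.
y = torch.compile(foo)(x)
print(y.shape)
```
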
pytorchbot pushed a commit to pytorch/pytorch that referenced this issue Dec 31, 2024
[ROCm] Guard triton backend call around cuda.is_available (#143570)

To resolve: pytorch/test-infra#6082

Calling into Triton's get_backend_options will initialise CUDA and break CPU-only environments that may have hip installed.

Pull Request resolved: #143570
Approved by: https://github.com/atalman, https://github.com/jeffdaily

(cherry picked from commit 6617257)
kit1980 pushed a commit to pytorch/pytorch that referenced this issue Jan 6, 2025
[ROCm] Guard triton backend call around cuda.is_available (#143570)

To resolve: pytorch/test-infra#6082

Calling into Triton's get_backend_options will initialise CUDA and break CPU-only environments that may have hip installed.

Pull Request resolved: #143570
Approved by: https://github.com/atalman, https://github.com/jeffdaily

(cherry picked from commit 6617257)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
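
For context, the guard described in #143570 follows the pattern sketched below: only query Triton's ROCm backend when a HIP device is actually available. The fallback constant is hypothetical and the merged patch may be structured differently; this is an illustration of the pattern, not the upstream diff.

```python
# Sketch of the guard pattern from pytorch/pytorch#143570 (illustrative only).
import torch

DEFAULT_ROCM_NUM_STAGES = 2  # hypothetical fallback for CPU-only environments

def rocm_num_stages() -> int:
    # The traceback above shows mm_common.py calling get_backend_num_stages()
    # at import time, which reaches Triton's AMD driver and initializes HIP.
    # Guarding on cuda.is_available() keeps CPU-only ROCm installs importable.
    if torch.version.hip is not None and torch.cuda.is_available():
        # Helper name taken from the traceback above; assumed to exist in
        # torch._inductor.utils on the affected nightlies.
        from torch._inductor.utils import get_backend_num_stages
        return get_backend_num_stages()
    return DEFAULT_ROCM_NUM_STAGES
```
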