
Some CUDA tests are unconditionally relying on specific hardware #11075

Closed · ScottTodd opened this issue Nov 7, 2022 · 1 comment
ScottTodd (Member) commented:
Several of our tests assume too much about the machine they are running on, such as:

iree_generated_trace_runner_test(
  NAME
    e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore
  GENERATOR
    "generate_e2e_matmul_tests.py"
  GENERATOR_ARGS
    "--lhs_rhs_type=f32"
    "--shapes=gpu_large"
    "--compilation_info=LLVMGPUMatmulTensorCore"
  TRACE_RUNNER
    iree-e2e-matmul-test
  TARGET_BACKENDS
    "cuda"
  DRIVERS
    "cuda"
  COMPILER_FLAGS
    "--iree-hal-cuda-llvm-target-arch=sm_80"
  LABELS
    "noasan"
    "nomsan"
    "notsan"
    "noubsan"
    "requires-gpu-nvidia"
)

That test fails on my machine like this:

      Start 47: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda
47: Test command: D:\dev\projects\iree-build\tools\iree-e2e-matmul-test.exe "D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.yaml" "--device=cuda"
47: Working Directory: D:/dev/projects/iree-build/tests/e2e/matmul
47: Environment variables: 
47:  TEST_TMPDIR=D:/dev/projects/iree-build/test_tmpdir/iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda_test_tmpdir
47: Test timeout computed to be: 60
47: D:\dev\projects\iree\runtime\src\iree\hal\drivers\cuda\native_executable.c:93: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_PTX' (218): a PTX JIT compilation failed; while invoking native function hal.executable.create; while calling import; 
47: [ 1]   native hal.executable.create:0 -
47: [ 0] bytecode module.__init:250 D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.mlir:7:13
47:       at D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.mlir:6:1; replaying trace file 'D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.yaml'
15/20 Test #47: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...............***Failed    1.28 sec

The failure is unsurprising: the test hard-codes --iree-hal-cuda-llvm-target-arch=sm_80, so the driver's JIT rejects the generated PTX (CUDA_ERROR_INVALID_PTX) on any GPU that does not support that architecture. Tests like that should probably use some combination of:

  • Automatic feature detection (use the default/host architecture if it can be detected, instead of always using sm_80; a sketch follows this list)
  • Labels/tags that allow for filtering (e.g. cuda_uses_tensorcore, like the existing tag vulkan_uses_vk_khr_shader_float16_int8)
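
As a rough illustration of the first option, the detection could happen at CMake configure time. The following is only a sketch under assumptions, not IREE's actual build logic: IREE_CUDA_TARGET_ARCH is a hypothetical variable name, and --query-gpu=compute_cap is only available in reasonably recent nvidia-smi releases.

# Hypothetical sketch: detect the host GPU's compute capability at configure
# time, falling back to sm_80 when no GPU (or an old driver) is present.
execute_process(
  COMMAND nvidia-smi --query-gpu=compute_cap --format=csv,noheader
  RESULT_VARIABLE _CUDA_QUERY_RESULT
  OUTPUT_VARIABLE _CUDA_COMPUTE_CAP
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET
)
if(_CUDA_QUERY_RESULT EQUAL 0)
  # Keep only the first GPU's value, e.g. "8.6" -> "sm_86".
  string(REGEX MATCH "^[0-9]+\\.[0-9]+" _CUDA_COMPUTE_CAP "${_CUDA_COMPUTE_CAP}")
  string(REPLACE "." "" _CUDA_SM "${_CUDA_COMPUTE_CAP}")
  set(IREE_CUDA_TARGET_ARCH "sm_${_CUDA_SM}")
else()
  set(IREE_CUDA_TARGET_ARCH "sm_80")  # today's hard-coded default
endif()

The test rule above would then pass "--iree-hal-cuda-llvm-target-arch=${IREE_CUDA_TARGET_ARCH}" instead of the literal sm_80. For the second option, CTest already supports label filtering, so a machine without tensor-core hardware could exclude such tests with ctest -LE cuda_uses_tensorcore.
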
ScottTodd added the infrastructure (Relating to build systems, CI, or testing), awaiting-triage, and codegen/nvvm (NVVM code generation compiler backend) labels on Nov 7, 2022
pzread self-assigned this on Dec 13, 2022
ScottTodd (Member, Author) commented:

This might have been fixed by #14173. Tentatively closing.
