
Some CUDA tests are unconditionally relying on specific hardware #11075

Closed · ScottTodd opened this issue Nov 7, 2022 · 1 comment
ScottTodd (Member) commented:
Several of our tests assume too much about the machine they are running on, such as:

iree_generated_trace_runner_test(
  NAME
    e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore
  GENERATOR
    "generate_e2e_matmul_tests.py"
  GENERATOR_ARGS
    "--lhs_rhs_type=f32"
    "--shapes=gpu_large"
    "--compilation_info=LLVMGPUMatmulTensorCore"
  TRACE_RUNNER
    iree-e2e-matmul-test
  TARGET_BACKENDS
    "cuda"
  DRIVERS
    "cuda"
  COMPILER_FLAGS
    "--iree-hal-cuda-llvm-target-arch=sm_80"
  LABELS
    "noasan"
    "nomsan"
    "notsan"
    "noubsan"
    "requires-gpu-nvidia"
)

That test fails on my machine like this:

      Start 47: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda
47: Test command: D:\dev\projects\iree-build\tools\iree-e2e-matmul-test.exe "D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.yaml" "--device=cuda"
47: Working Directory: D:/dev/projects/iree-build/tests/e2e/matmul
47: Environment variables: 
47:  TEST_TMPDIR=D:/dev/projects/iree-build/test_tmpdir/iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda_test_tmpdir
47: Test timeout computed to be: 60
47: D:\dev\projects\iree\runtime\src\iree\hal\drivers\cuda\native_executable.c:93: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_PTX' (218): a PTX JIT compilation failed; while invoking native function hal.executable.create; while calling import; 
47: [ 1]   native hal.executable.create:0 -
47: [ 0] bytecode module.__init:250 D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.mlir:7:13
47:       at D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.mlir:6:1; replaying trace file 'D:/dev/projects/iree-build/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda.yaml'
15/20 Test #47: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...............***Failed    1.28 sec

The failure is unsurprising: the test hard-codes --iree-hal-cuda-llvm-target-arch=sm_80, so the driver's JIT rejects the generated PTX (CUDA_ERROR_INVALID_PTX) on any GPU that does not support that architecture. Tests like that should probably use some combination of:

  • Automatic feature detection (use the default/host architecture if it can be detected, instead of always using sm_80; a sketch follows this list)
  • Labels/tags that allow for filtering (e.g. cuda_uses_tensorcore, like the existing tag vulkan_uses_vk_khr_shader_float16_int8)
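
As a rough illustration of the first option, the detection could happen at CMake configure time. The following is only a sketch under assumptions, not IREE's actual build logic: IREE_CUDA_TARGET_ARCH is a hypothetical variable name, and --query-gpu=compute_cap is only available in reasonably recent nvidia-smi releases.

# Hypothetical sketch: detect the host GPU's compute capability at configure
# time, falling back to sm_80 when no GPU (or an old driver) is present.
execute_process(
  COMMAND nvidia-smi --query-gpu=compute_cap --format=csv,noheader
  RESULT_VARIABLE _CUDA_QUERY_RESULT
  OUTPUT_VARIABLE _CUDA_COMPUTE_CAP
  OUTPUT_STRIP_TRAILING_WHITESPACE
  ERROR_QUIET
)
if(_CUDA_QUERY_RESULT EQUAL 0)
  # Keep only the first GPU's value, e.g. "8.6" -> "sm_86".
  string(REGEX MATCH "^[0-9]+\\.[0-9]+" _CUDA_COMPUTE_CAP "${_CUDA_COMPUTE_CAP}")
  string(REPLACE "." "" _CUDA_SM "${_CUDA_COMPUTE_CAP}")
  set(IREE_CUDA_TARGET_ARCH "sm_${_CUDA_SM}")
else()
  set(IREE_CUDA_TARGET_ARCH "sm_80")  # today's hard-coded default
endif()

The test rule above would then pass "--iree-hal-cuda-llvm-target-arch=${IREE_CUDA_TARGET_ARCH}" instead of the literal sm_80. For the second option, CTest already supports label filtering, so a machine without tensor-core hardware could exclude such tests with ctest -LE cuda_uses_tensorcore.
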
ScottTodd added the infrastructure (Relating to build systems, CI, or testing), awaiting-triage, and codegen/nvvm (NVVM code generation compiler backend) labels on Nov 7, 2022
pzread self-assigned this on Dec 13, 2022
ScottTodd (Member, Author) commented:

This might have been fixed by #14173. Tentatively closing.
