
[ROCm][Codegen] llama 8b fp8 with attention segfault #19921

Closed
AmosLewis opened this issue Feb 5, 2025 · 12 comments · Fixed by #19943
Labels: bug 🐞 Something isn't working · codegen/rocm ROCm code generation compiler backend (HIP/HSA)

AmosLewis (Contributor) commented Feb 5, 2025

What happened?

iree-compile crashes; full gdb log: llama_8b_fp8_atten_iree-comiple_gdbbug.txt

iree-compile: iree/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From &) [To = mlir::detail::TypedValue<mlir::VectorType>, From = mlir::OpResult]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

Thread 9 "llvm-worker-4" received signal SIGABRT, Aborted.

gdb shows the bug here, so I guess it's a ROCm codegen issue:

#10 0x00007fffeb7a0402 in (anonymous namespace)::ExtFOnFloat8RewritePattern::rewrite (this=0x7fffa423dcf0, op=..., rewriter=...)
    at /home/chi/src/iree/third_party/llvm-project/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp:113
...
#26 0x00007fffeb2e1b61 in mlir::iree_compiler::ConvertToROCDLPass::runOnOperation (this=0x7fffa43835e0)
    at /home/chi/src/iree/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToROCDL.cpp:182

Steps to reproduce your issue

  1. Compile IREE:
cmake -G Ninja -B ../iree-build  -S . \
    -DCMAKE_BUILD_TYPE=Debug \
    -DIREE_ENABLE_ASSERTIONS=ON \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DIREE_ENABLE_RUNTIME_TRACING=ON \
    -DIREE_BUILD_TRACY=OFF \
    -DIREE_ENABLE_LLD=ON \
    -DIREE_BUILD_PYTHON_BINDINGS=ON \
    -DPython3_EXECUTABLE="$(which python3)" \
    -DIREE_TARGET_BACKEND_CUDA=OFF \
    -DIREE_HAL_DRIVER_HIP=ON \
    -DIREE_TARGET_BACKEND_ROCM=ON .
cmake --build ../iree-build
  2. Download the input MLIR: f8_attn_chi_castf32_roctorch.mlir.

Optional: export f8_attn_chi_castf32_roctorch.mlir manually with nod-ai/shark-ai#907.

  3. Run the following command:

 /home/chi/src/iree-build/tools/iree-compile f8_attn_chi_castf32_roctorch.mlir \
  --iree-hip-target=gfx942 \
  -o=f8_attn_chi_castf32_roctorch.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

What component(s) does this issue relate to?

Compiler

Version information

commit 3f713f5 (HEAD -> main, upstream/main)
Author: Jakub Kuderski jakub@nod-labs.com
Date: Wed Jan 29 12:49:42 2025 -0500

[ROCm] Add mi325x to known targets (#19846)

Additional context

No response

AmosLewis added the bug 🐞 Something isn't working label Feb 5, 2025
ScottTodd (Member) commented Feb 5, 2025

Can you share the full stacktrace? I only see #10 and #26, nothing before or between?

Edit: ah, it's in the first gist link. Thanks.

ScottTodd added the codegen/rocm ROCm code generation compiler backend (HIP/HSA) label Feb 5, 2025
pashu123 (Contributor) commented Feb 6, 2025

This is the exact dispatch that is failing: https://gist.github.com/pashu123/0b261b96af91e893e055c662d9e8079b. I'm looking into it.

pashu123 (Contributor) commented Feb 6, 2025

Reproducer: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --iree-convert-to-rocdl test.mlir — https://gist.github.com/pashu123/9848fa6cc6b2b8cdbdad4cfa98dfccfc

IanWood1 (Contributor) commented Feb 6, 2025

Possibly unrelated to the crash, but there shouldn't be any collapse/expand ops in the dispatch like that.

pashu123 (Contributor) commented Feb 6, 2025

> Possibly unrelated to the crash, but there shouldn't be any collapse/expand ops in the dispatch like that.

Oh! I just ran the compile command and dumped the dispatches. Could you take a look at that part?

The above dispatch is failing because of %225 = arith.extf %210 : vector<f8E4M3FNUZ> to vector<f32> inside ExtFOnFloat8RewritePattern. This may be because of 0-d vectors. I am taking a look.
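For context, a minimal sketch of the kind of rank-0 guard such a pattern needs before it treats its operand as a ranked vector. The struct name and structure here are assumptions for illustration, not the actual upstream pattern or fix:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical guard: vector<f8E4M3FNUZ> is a 0-d VectorType, so code that
// assumes at least one dimension (or casts results to a differently shaped
// TypedValue<VectorType>) will trip on it. Bail out instead of asserting.
struct ExtFOnFloat8Sketch : OpRewritePattern<arith::ExtFOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(arith::ExtFOp op,
                                PatternRewriter &rewriter) const override {
    auto inVecType = dyn_cast<VectorType>(op.getIn().getType());
    if (inVecType && inVecType.getRank() == 0)
      return rewriter.notifyMatchFailure(op, "0-d vectors not supported");
    // ... the real fp8 -> f32 expansion would go here ...
    return failure();
  }
};
```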

IanWood1 (Contributor) commented Feb 6, 2025

I just opened a PR that clears the tensor reshape ops, but I'm still hitting the same error with the attention dispatch.

pashu123 (Contributor) commented Feb 6, 2025

> > Possibly unrelated to the crash, but there shouldn't be any collapse/expand ops in the dispatch like that.
>
> Oh! I just ran the compile command and dumped the dispatches. Could you take a look at that part?
>
> The above dispatch is failing because of %225 = arith.extf %210 : vector<f8E4M3FNUZ> to vector<f32> inside ExtFOnFloat8RewritePattern. This may be because of 0-d vectors. I am taking a look.

llvm/llvm-project#126102

pashu123 (Contributor) commented Feb 6, 2025

> Reproducer: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --iree-convert-to-rocdl test.mlir — https://gist.github.com/pashu123/9848fa6cc6b2b8cdbdad4cfa98dfccfc

After applying llvm/llvm-project#126102, this dispatch still fails with:

within split at test.mlir:1 offset :160:11: error: 'llvm.fneg' op operand #0 must be floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type, but got 'vector<1xi8>'
    %93 = arith.negf %92 : vector<1x2x1x1x1x1xf8E4M3FNUZ>
          ^
within split at test.mlir:1 offset :160:11: note: see current operation: %692 = "llvm.fneg"(%691) <{fastmathFlags = #llvm.fastmath<none>}> : (vector<1xi8>) -> vector<1xi8>

@krzysz00 Do you have any suggestions for this?

pashu123 (Contributor) commented Feb 7, 2025

I had a chat with @krzysz00 offline; we need to create another pass similar to https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ArithToF32.cpp for operations like arith.negf with fp8 types.
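For illustration, a minimal sketch of such an emulation pattern, assuming the same widen-compute-truncate shape that ConvertBf16ArithToF32.cpp uses; this is not the actual #19943 implementation, and the struct name and type choices are assumptions:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/IR/TypeUtilities.h"

using namespace mlir;

// Sketch: emulate arith.negf on fp8 by widening to f32, negating, and
// truncating back, mirroring the bf16 emulation approach.
struct EmulateFp8NegF : OpRewritePattern<arith::NegFOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(arith::NegFOp op,
                                PatternRewriter &rewriter) const override {
    Type origType = op.getType();
    if (!isa<Float8E4M3FNUZType>(getElementTypeOrSelf(origType)))
      return failure();
    Location loc = op.getLoc();
    // Build the widened type, preserving any vector shape.
    Type wideType = rewriter.getF32Type();
    if (auto vecType = dyn_cast<VectorType>(origType))
      wideType = VectorType::get(vecType.getShape(), wideType);
    // fp8 -> f32, negate in f32, then f32 -> fp8.
    Value ext = rewriter.create<arith::ExtFOp>(loc, wideType, op.getOperand());
    Value neg = rewriter.create<arith::NegFOp>(loc, ext);
    rewriter.replaceOpWithNewOp<arith::TruncFOp>(op, origType, neg);
    return success();
  }
};
```

Doing the arithmetic in f32 sidesteps llvm.fneg entirely: the fp8 value then only reaches the LLVM dialect through extf/truncf, which the backend already knows how to lower.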

AmosLewis (Contributor, Author) commented

New codegen issue (log: llama_f8_attn_bug_log_0213.txt) after I rebased IREE to:

commit 0ff26a7bef803edf3e22588f3e69a51c9335a79b (HEAD -> main, upstream/main)
Author: Prashant Kumar <pk5561@gmail.com>
Date:   Thu Feb 13 23:26:59 2025 +0530
    [Codegen] Add support to emulate unsupported float type (#19943)

f8_attn_chi_castf32_roctorch.mlir:45778:10: error: 'func.func' op failed to distribute
    %1 = iree_linalg_ext.attention {indexing_maps = [#map, #map1, #map2, #map3, #map4, #map5]} ins(%collapsed, %collapsed_1, %collapsed_2, %extracted, %arg4 : tensor<32x?x128xf8E4M3FNUZ>, tensor<32x?x128xf8E4M3FNUZ>, tensor<32x?x128xf8E4M3FNUZ>, f32, tensor<?x?xf8E4M3FNUZ>) outs(%cast : tensor<32x?x128xf32>) {
         ^
f8_attn_chi_castf32_roctorch.mlir:2706:12: note: called from
    %914 = util.call @sharktank_masked_flash_attention_1_32_128_128_f8E4M3FNUZ_f32_f32(%909, %910, %911, %913, %912) : (tensor<1x32x?x128xf8E4M3FNUZ>, tensor<1x32x?x128xf8E4M3FNUZ>, tensor<1x32x?x128xf8E4M3FNUZ>, tensor<f32>, tensor<?x?xf8E4M3FNUZ>) -> tensor<1x32x?x128xf32>
           ^
f8_attn_chi_castf32_roctorch.mlir:45778:10: note: see current operation:

Should we file a new issue or continue in this one? @pashu123

AmosLewis reopened this Feb 14, 2025
pashu123 (Contributor) commented

> New codegen issue (log: llama_f8_attn_bug_log_0213.txt) after I rebased IREE to commit 0ff26a7 ("[Codegen] Add support to emulate unsupported float type (#19943)") [...]
>
> Should we file a new issue or continue in this one?

Please create a new issue! The error is from the vector distribute pipeline.

AmosLewis (Contributor, Author) commented

Filed a new issue for the vector distribute pipeline: #19991
