Llama-7b Crashes during IREE Compilation #16317

Closed
vivekkhandelwal1 opened this issue Feb 5, 2024 · 10 comments · Fixed by llvm/llvm-project#80848
Labels: bug 🐞 Something isn't working

@vivekkhandelwal1
Member

What happened?

During compilation, I'm getting the following crash:

iree-compile: /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:843: static void mlir::tensor::EmptyOp::build(mlir::OpBuilder &, mlir::OperationState &, ArrayRef<int64_t>, mlir::Type, mlir::Attribute): Assertion `all_of(staticShape, [](int64_t sz) { return !ShapedType::isDynamic(sz); }) && "expected only static sizes"' failed.
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
 #0 0x00007fd3a8bc5add llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/azureuser/work/iree-vivek/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:11
 #1 0x00007fd3a8bc5fcb PrintStackTraceSignalHandler(void*) /home/azureuser/work/iree-vivek/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:798:1
 #2 0x00007fd3a8bc3ff6 llvm::sys::RunSignalHandlers() /home/azureuser/work/iree-vivek/third_party/llvm-project/llvm/lib/Support/Signals.cpp:105:5
 #3 0x00007fd3a8bc67e5 SignalHandler(int) /home/azureuser/work/iree-vivek/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #4 0x00007fd39ca42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007fd39ca969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #6 0x00007fd39ca42476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #7 0x00007fd39ca287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #8 0x00007fd39ca2871b (/lib/x86_64-linux-gnu/libc.so.6+0x2871b)
 #9 0x00007fd39ca39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#10 0x00007fd3b079dcf8 mlir::tensor::EmptyOp::build(mlir::OpBuilder&, mlir::OperationState&, llvm::ArrayRef<long>, mlir::Type, mlir::Attribute) /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:844:9
#11 0x00007fd3a98d1731 mlir::tensor::EmptyOp mlir::OpBuilder::create<mlir::tensor::EmptyOp, llvm::SmallVector<long, 6u>&, mlir::Type&>(mlir::Location, llvm::SmallVector<long, 6u>&, mlir::Type&) /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/include/mlir/IR/Builders.h:500:5
#12 0x00007fd3ae30d36e mlir::linalg::GeneralizeOuterUnitDimsPackOpPattern::matchAndRewrite(mlir::tensor::PackOp, mlir::PatternRewriter&) const /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp:1214:26
#13 0x00007fd3ac06246b mlir::detail::OpOrInterfaceRewritePatternBase<mlir::tensor::PackOp>::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/include/mlir/IR/PatternMatch.h:330:12
#14 0x00007fd3afec5886 mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>)::$_1::operator()() const /home/azureuser/work/iree-vivek/third_party/llvm-project/mlir/lib/Rewrite/PatternApplicator.cpp:208:31
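For context on what the assertion checks: the static-shape builder of tensor.empty requires every extent to be a concrete size, with dynamic extents encoded as a sentinel value. A minimal Python sketch of that invariant (illustrative only — the sentinel mirrors MLIR's ShapedType::kDynamic, and the function names here are made up, not the MLIR API):

```python
# Sketch of the invariant that fires in TensorOps.cpp:843.
# MLIR encodes a dynamic extent ('?') as a sentinel; the static-shape
# builder of tensor.empty requires every size to be static.
K_DYNAMIC = -9223372036854775808  # int64 min; assumed to mirror ShapedType::kDynamic

def is_dynamic(sz: int) -> bool:
    return sz == K_DYNAMIC

def build_empty(static_shape):
    # Mirrors: assert all_of(staticShape, [](int64_t sz){ return !isDynamic(sz); })
    assert all(not is_dynamic(sz) for sz in static_shape), "expected only static sizes"
    return tuple(static_shape)

build_empty([1, 1, 1, 8, 1])        # fine: all extents static
try:
    build_empty([K_DYNAMIC, 5, 1])  # a dynamic extent reaches the static-only builder
except AssertionError as e:
    print(e)                        # prints "expected only static sizes"
```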

Steps to reproduce your issue

IR is available at: https://gist.github.com/vivekkhandelwal1/185ed5eb3f76c7d88fd37a489185d0fb

Command to reproduce the crash:

./build/tools/iree-compile --iree-hal-target-backends=llvm-cpu --iree-input-type=tm_tensor --iree-util-zero-fill-elided-attrs llama_7b_linalg_elided.mlir -o llama_7b_fp32.vmfb

What component(s) does this issue relate to?

MLIR, Compiler

Version information

No response

Additional context

No response

@vivekkhandelwal1 vivekkhandelwal1 added the bug 🐞 Something isn't working label Feb 5, 2024
@benvanik
Collaborator

benvanik commented Feb 5, 2024

guessing this is an upstream bug in linalg::GeneralizeOuterUnitDimsPackOpPattern

@hanhanW hanhanW self-assigned this Feb 5, 2024
@hanhanW
Contributor

hanhanW commented Feb 5, 2024

I cannot reproduce the issue when I specify the CPU as cascadelake. Here is the command I used:

iree-compile --output-format=vm-bytecode \
  --iree-hal-target-backends=llvm-cpu \
  --iree-input-type=tm_tensor \
  --iree-util-zero-fill-elided-attrs \
  --iree-llvmcpu-target-cpu=cascadelake \
  --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu \
  ~/llama_7b_linalg_elided.mlir -o /tmp/z.vmfb

What is the target CPU?

@vivekkhandelwal1
Member Author

> I cannot reproduce the issue when I specify the CPU as cascadelake. Here is the command I used:
>
> iree-compile --output-format=vm-bytecode \
>   --iree-hal-target-backends=llvm-cpu \
>   --iree-input-type=tm_tensor \
>   --iree-util-zero-fill-elided-attrs \
>   --iree-llvmcpu-target-cpu=cascadelake \
>   --iree-llvmcpu-target-triple=x86_64-unknown-linux-gnu \
>   ~/llama_7b_linalg_elided.mlir -o /tmp/z.vmfb
>
> What is the target CPU?

This is the CPU that I'm running this model on: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (Vendor ID: GenuineIntel).

@hanhanW
Contributor

hanhanW commented Feb 5, 2024

Can you share the output of lscpu with me?

@vivekkhandelwal1
Member Author

> Can you share the output of lscpu with me?

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0-63
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           2
    Stepping:            1
    BogoMIPS:            4589.37
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arch_capabilities
Virtualization features: 
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   1 MiB (32 instances)
  L1i:                   1 MiB (32 instances)
  L2:                    8 MiB (32 instances)
  L3:                    100 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-31
  NUMA node1 CPU(s):     32-63
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

@vivekkhandelwal1
Member Author

@hanhanW, using this flag --iree-llvmcpu-target-cpu=cascadelake worked.

@hanhanW
Contributor

hanhanW commented Feb 5, 2024

Okay, I can reproduce the issue if I don't specify a target CPU. There is definitely a bug in the upstream pattern.

Your CPU is Broadwell, and I can also reproduce it with --iree-llvmcpu-target-cpu=broadwell. I will take a look at this. Thanks for all the info!

@vivekkhandelwal1
Member Author

> broadwell

Thank you @hanhanW!

@hanhanW
Contributor

hanhanW commented Feb 5, 2024

The quick workaround (for functionality) is to delete these lines: https://github.com/openxla/iree/blob/af387d39d2dd553d03943c6a698cc15b6a8fc483/compiler/src/iree/compiler/Codegen/Common/DecomposePackUnPackOps.cpp#L145-L155 (I already shared this with Kumar).

I have a smaller repro now: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-decompose-pack-unpack-ops))" ~/repro.mlir. I will take a deeper look tomorrow.

func.func @main_graph_dispatch_17_pack_f32() {
  %c0 = arith.constant 0 : index
  %c5 = arith.constant 5 : index
  %c1 = arith.constant 1 : index
  %cst = arith.constant 0.000000e+00 : f32
  %c3200 = arith.constant 3200 : index
  %c88320 = arith.constant 88320 : index
  %c32 = arith.constant 32 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c3200) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x5x5xf32>>{%c32}
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c88320) : !flow.dispatch.tensor<writeonly:tensor<?x1x5x8x1xf32>>{%c32}
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %workgroup_id_y = hal.interface.workgroup.id[1] : index
  %workgroup_count_y = hal.interface.workgroup.count[1] : index
  %workgroup_id_z = hal.interface.workgroup.id[2] : index
  %workgroup_count_z = hal.interface.workgroup.count[2] : index
  %2 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_z]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_count_z]
  scf.for %arg0 = %2 to %c32 step %3 {
    scf.for %arg1 = %workgroup_id_y to %c1 step %workgroup_count_y {
      %4 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_id_x]
      %5 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_count_x]
      scf.for %arg2 = %4 to %c5 step %5 {
        %6 = affine.min affine_map<(d0) -> (-d0 + 5, 2)>(%arg2)
        %7 = flow.dispatch.tensor.load %1, offsets = [%arg0, %arg1, %arg2, 0, 0], sizes = [32, 1, %6, 8, 1], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<?x1x5x8x1xf32>>{%c32} -> tensor<32x1x?x8x1xf32>
        %8 = affine.min affine_map<(d0) -> (-d0 + 32, 64)>(%arg0)
        %9 = affine.apply affine_map<(d0) -> (d0 * 8)>(%arg1)
        %10 = flow.dispatch.tensor.load %0, offsets = [%arg0, %9, %arg2], sizes = [%8, 5, %6], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x5x5xf32>>{%c32} -> tensor<?x5x?xf32>
        %11 = scf.for %arg3 = %c0 to %c32 step %c1 iter_args(%arg4 = %7) -> (tensor<32x1x?x8x1xf32>) {
          %12 = scf.for %arg5 = %c0 to %6 step %c1 iter_args(%arg6 = %arg4) -> (tensor<32x1x?x8x1xf32>) {
            %13 = affine.min affine_map<(d0, d1) -> (1, d0 - d1)>(%8, %arg3)
            %extracted_slice = tensor.extract_slice %10[%arg3, 0, %arg5] [%13, 5, 1] [1, 1, 1] : tensor<?x5x?xf32> to tensor<?x5x1xf32>
            %extracted_slice_0 = tensor.extract_slice %arg6[%arg3, 0, %arg5, 0, 0] [1, 1, 1, 8, 1] [1, 1, 1, 1, 1] : tensor<32x1x?x8x1xf32> to tensor<1x1x1x8x1xf32>
            %pack = tensor.pack %extracted_slice padding_value(%cst : f32) outer_dims_perm = [0, 1, 2] inner_dims_pos = [1, 2] inner_tiles = [8, 1] into %extracted_slice_0 {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[64, 1, 2], [1, 1, 1]]>} : tensor<?x5x1xf32> -> tensor<1x1x1x8x1xf32>
            %inserted_slice = tensor.insert_slice %pack into %arg6[%arg3, 0, %arg5, 0, 0] [1, 1, 1, 8, 1] [1, 1, 1, 1, 1] : tensor<1x1x1x8x1xf32> into tensor<32x1x?x8x1xf32>
            scf.yield %inserted_slice : tensor<32x1x?x8x1xf32>
          }
          scf.yield %12 : tensor<32x1x?x8x1xf32>
        }
        flow.dispatch.tensor.store %11, %1, offsets = [%arg0, %arg1, %arg2, 0, 0], sizes = [32, 1, %6, 8, 1], strides = [1, 1, 1, 1, 1] : tensor<32x1x?x8x1xf32> -> !flow.dispatch.tensor<writeonly:tensor<?x1x5x8x1xf32>>{%c32}
      }
    }
  }
  return
}
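The shape detail that connects this repro to the crash (a hedged reading; the actual pattern logic lives in the upstream Transforms.cpp): the tensor.pack source is tensor<?x5x1xf32>, whose leading extent comes from an affine.min and is therefore dynamic, so any rewrite that forwards the source dims into tensor.empty's static-shape builder would trip the "expected only static sizes" assertion. Sketched in Python, with DYNAMIC standing in for '?':

```python
# Shapes from the repro above, reduced to plain lists.
DYNAMIC = -9223372036854775808  # assumption: mirrors MLIR's ShapedType::kDynamic sentinel

src_shape = [DYNAMIC, 5, 1]     # tensor<?x5x1xf32>; dim 0 is the affine.min result %13
dest_shape = [1, 1, 1, 8, 1]    # tensor<1x1x1x8x1xf32>, fully static

# Forwarding the source dims into a static-only builder hands it the sentinel,
# which is exactly the condition the tensor.empty builder rejects:
has_dynamic = any(d == DYNAMIC for d in src_shape)
assert has_dynamic
```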

@hanhanW
Contributor

hanhanW commented Feb 6, 2024

To summarize the discussion from Discord:

We hit a runtime assertion because the frontend generates bad code; an example is below. All the inputs are constants (generated by the frontend), and then the assertion is hit. I will fix the compilation issue in the upstream repo.

    %cst_6 = arith.constant dense<-1> : tensor<i64>
    %cst_7 = arith.constant dense<1> : tensor<4xi64>
    %cst_8 = arith.constant dense<[1, 1, 5, 5]> : tensor<4xi64>
    %6 = linalg.generic {indexing_maps = [#map2, #map3, #map2], iterator_types = ["parallel"]} ins(%cst_7, %cst_6 : tensor<4xi64>, tensor<i64>) outs(%5 : tensor<4xi64>) {
    ^bb0(%in: i64, %in_413: i64, %out: i64):
      %1654 = arith.muli %in, %in_413 : i64
      linalg.yield %1654 : i64
    } -> tensor<4xi64>
    %7 = tensor.empty() : tensor<4xi1>
    %8 = linalg.generic {indexing_maps = [#map2, #map2, #map2], iterator_types = ["parallel"]} ins(%cst_8, %6 : tensor<4xi64>, tensor<4xi64>) outs(%7 : tensor<4xi1>) {
    ^bb0(%in: i64, %in_413: i64, %out: i1):
      %1654 = arith.cmpi eq, %in, %in_413 : i64
      linalg.yield %1654 : i1
    } -> tensor<4xi1>
    %9 = linalg.generic {indexing_maps = [#map2, #map2, #map2, #map2], iterator_types = ["parallel"]} ins(%8, %cst_7, %cst_8 : tensor<4xi1>, tensor<4xi64>, tensor<4xi64>)
    ^bb0(%in: i1, %in_413: i64, %in_414: i64, %out: i64):
      %1654 = arith.select %in, %in_413, %in_414 : i64
      linalg.yield %1654 : i64
    } -> tensor<4xi64>
    %extracted_slice = tensor.extract_slice %9[0] [1] [1] : tensor<4xi64> to tensor<1xi64>
    %extracted = tensor.extract %extracted_slice[%c0] : tensor<1xi64>
    %extracted_slice_23 = tensor.extract_slice %9[1] [1] [1] : tensor<4xi64> to tensor<1xi64>
    %extracted_24 = tensor.extract %extracted_slice_23[%c0] : tensor<1xi64>
    %13 = arith.cmpi slt, %extracted_24, %c0_i64 : i64
    %14 = arith.index_cast %extracted_24 : i64 to index
    %15 = arith.select %13, %c1, %14 : index
    %71 = arith.cmpi eq, %15, %c32 : index
    cf.assert %71, "mismatched size for broadcast"
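Constant-folding the IR above by hand shows why the runtime assert necessarily fires; a quick sketch in plain Python, mirroring each SSA value in the snippet:

```python
# Constant-fold the snippet above, one SSA value at a time.
cst_6 = -1                    # arith.constant dense<-1> : tensor<i64>
cst_7 = [1, 1, 1, 1]          # arith.constant dense<1> : tensor<4xi64>
cst_8 = [1, 1, 5, 5]          # arith.constant dense<[1, 1, 5, 5]> : tensor<4xi64>

v6 = [x * cst_6 for x in cst_7]                            # %6:  [-1, -1, -1, -1]
v8 = [a == b for a, b in zip(cst_8, v6)]                   # %8:  all False
v9 = [a if c else b for c, a, b in zip(v8, cst_7, cst_8)]  # %9:  [1, 1, 5, 5]

extracted_24 = v9[1]                                   # %extracted_24: 1
size = 1 if extracted_24 < 0 else extracted_24         # %15: 1
# %71 = arith.cmpi eq, %15, %c32  ->  (1 == 32) is false, so
# `cf.assert %71, "mismatched size for broadcast"` must fail.
assert size != 32
```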
