`@fastmath maximum` segfaults for `Float16` on master #49907

matthias314 · 2023-05-21T02:02:54Z

On master I get

julia> @fastmath maximum(Float16[1,2,3]; init = Float16(0))

LLVM ERROR: Cannot select: 0xe31218: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0xe4a4b0, 0xe1ddc8, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
  0xe4a4b0: v16f16,ch = CopyFromReg 0x91a8d8, Register:v16f16 %10, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0xe28c10: v16f16 = Register %10
  0xe1ddc8: v16f16,ch = CopyFromReg 0x91a8d8, Register:v16f16 %11, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0xe1e0a0: v16f16 = Register %11
In function: julia_maximum_fast_16

Complete output

julia> @fastmath maximum(Float16[1,2,3]; init = Float16(0))
LLVM ERROR: Cannot select: 0x258e0d8: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x25a7370, 0x2572848, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
  0x25a7370: v16f16,ch = CopyFromReg 0x205e688, Register:v16f16 %10, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0x2587730: v16f16 = Register %10
  0x2572848: v16f16,ch = CopyFromReg 0x205e688, Register:v16f16 %11, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0x2572b20: v16f16 = Register %11
In function: julia_maximum_fast_19

[92681] signal (6.-6): Aborted
in expression starting at REPL[1]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.950 at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.68 at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc14SimpleCompilerclERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
operator() at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1272
_ZN4llvm3orc14IRCompileLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16IRTransformLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
emit at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:690
_ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc19MaterializationTask3runEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJSt10unique_ptrINS_3orc4TaskESt14default_deleteIS4_EEEE8CallImplIPFvS7_EEEvPvRS7_ at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISF_vEEEENSG_ISD_vEENS_6detail12DenseMapPairISD_SI_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_vEENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_vEENSN_IS6_SV_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISI_vEEEENSJ_IS5_vEENS_6detail12DenseMapPairIS5_SL_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
addModule at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1491
jl_add_to_ee at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1896
_jl_compile_codeinst at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:243
jl_generate_fptr_impl at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:493
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2475 [inlined]
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2364
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2880 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:125
eval_value at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:222
eval_stmt_value at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:173 [inlined]
eval_body at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:602
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:760
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:911
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:854
ijl_toplevel_eval_in at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:970
eval at ./boot.jl:383 [inlined]
eval_user_input at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
#run_repl#59 at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:376
run_repl at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:362
jfptr_run_repl_89532.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
#997 at ./client.jl:421
jfptr_YY.997_82669.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-2/julialang/julia-master/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:863 [inlined]
invokelatest at ./essentials.jl:860 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:541
jfptr__start_82698.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
true_main at /cache/build/default-amdci4-2/julialang/julia-master/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/default-amdci4-2/julialang/julia-master/src/jlapi.c:734
main at /cache/build/default-amdci4-2/julialang/julia-master/cli/loader_exe.c:58
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 2832 (Pool: 2823; Big: 9); GC: 0
Aborted (core dumped)

It works fine with Julia 1.9.0. Float32 and Float64 don't seem to be affected, and without @fastmath it also works for Float16.

Julia Version 1.10.0-DEV.1347
Commit d2f5bbd7cfb (2023-05-20 10:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

The text was updated successfully, but these errors were encountered:

giordano · 2023-05-21T07:53:33Z

Likely due to #48153 (CC @mcabbott)

mcabbott · 2023-05-21T10:52:34Z

Can reproduce. Note that it can be triggered by @fastmath reduce(max, x; init), but not without the init, nor by @fastmath max(x, y):

julia> versioninfo()
Julia Version 1.10.0-DEV.1351
Commit a6ad9ea099f (2023-05-21 08:01 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
  Threads: 5 on 12 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/cuda/lib64
  JULIA_NUM_THREADS = 4

julia> @fastmath max(Float16(1), Float16(2))
Float16(2.0)

julia> @fastmath reduce(max, Float16[1,2,3])
Float16(3.0)

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
LLVM ERROR: Cannot select: 0x204fef8: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x2031f30, 0x1eec5d0, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]
  0x2031f30: v16f16,ch = CopyFromReg 0x1d83098, Register:v16f16 %9, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]

Also triggered by some foldl calls:

julia> foldl(Base.FastMath.max_fast, Float16[1, 2, 3])
LLVM ERROR: Cannot select: 0x248ba98: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x24abec0, 0x24819c8, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]

On the same machine, a version from before #48153 does not have the problem:

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
Float16(3.0)

julia> versioninfo()
Julia Version 1.10.0-DEV.220
Commit 9ded051e9f8 (2022-12-29 10:05 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)

On an M1 mac, the problem does not seem to occur:

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
Float16(3.0)

julia> versioninfo()
Julia Version 1.10.0-DEV.1351
Commit a6ad9ea099 (2023-05-21 08:01 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 5 on 4 virtual cores

gbaraldi · 2023-05-21T11:04:29Z

This is probably an issue with the demote float16 pass. It would be cool to see the LLVM IR generated on the function that crashes.

giordano · 2023-05-21T12:06:03Z

julia> @code_llvm Base.FastMath.maximum_fast(Float16[1, 2, 3]; init = Float16(0))

;  @ fastmath.jl:380 within `maximum_fast`
define half @julia_maximum_fast_289([1 x half]* nocapture noundef nonnull readonly align 2 dereferenceable(2) %0, {}* noundef nonnull align 16 dereferenceable(40) %1) #0 {
top:
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #9
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %ptls_field16 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2
  %2 = bitcast {}*** %ptls_field16 to i64***
  %ptls_load1718 = load i64**, i64*** %2, align 8
  %3 = getelementptr inbounds i64*, i64** %ptls_load1718, i64 2
  %safepoint = load i64*, i64** %3, align 8
  fence syncscope("singlethread") seq_cst
  %4 = load volatile i64, i64* %safepoint, align 8
  fence syncscope("singlethread") seq_cst
; ┌ @ fastmath.jl:380 within `#maximum_fast#1`
; │┌ @ reducedim.jl:406 within `reduce`
; ││┌ @ reducedim.jl:406 within `#reduce#811`
; │││┌ @ reducedim.jl:357 within `mapreduce`
      %5 = getelementptr inbounds [1 x half], [1 x half]* %0, i64 0, i64 0
; ││││┌ @ reducedim.jl:357 within `#mapreduce#809`
; │││││┌ @ reducedim.jl:362 within `_mapreduce_dim`
; ││││││┌ @ reduce.jl:44 within `mapfoldl_impl`
; │││││││┌ @ reduce.jl:48 within `foldl_impl`
; ││││││││┌ @ reduce.jl:56 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate` @ array.jl:938
; ││││││││││┌ @ essentials.jl:10 within `length`
             %6 = bitcast {}* %1 to { i8*, i64, i16, i16, i32 }*
             %7 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %6, i64 0, i32 1
             %8 = load i64, i64* %7, align 8
; ││││││││││└
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %.not = icmp eq i64 %8, 0
; ││││││││││└
            br i1 %.not, label %L19, label %L20

L19:                                              ; preds = %top
            %9 = load half, half* %5, align 2
            br label %L55

L20:                                              ; preds = %top
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %10 = bitcast {}* %1 to half**
             %11 = load half*, half** %10, align 8
             %12 = load half, half* %11, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:58 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ fastmath.jl:191 within `gt_fast`
; ││││││││││││┌ @ fastmath.jl:189 within `lt_fast`
               %13 = load half, half* %5, align 2
; │││││││││││└└
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %.inv = fcmp fast olt half %13, %12
              %14 = select fast i1 %.inv, half %12, half %13
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %.not1926.not = icmp eq i64 %8, 1
; ││││││││││└
            br i1 %.not1926.not, label %L55, label %iter.check

iter.check:                                       ; preds = %L20
            %15 = add nsw i64 %8, -1
            %min.iters.check = icmp ult i64 %15, 8
            br i1 %min.iters.check, label %vec.epilog.scalar.ph, label %vector.main.loop.iter.check

vector.main.loop.iter.check:                      ; preds = %iter.check
            %min.iters.check29 = icmp ult i64 %15, 32
            br i1 %min.iters.check29, label %vec.epilog.ph, label %vector.ph

vector.ph:                                        ; preds = %vector.main.loop.iter.check
            %n.vec = and i64 %15, -32
            %minmax.ident.splatinsert = insertelement <16 x half> poison, half %14, i64 0
            %minmax.ident.splat = shufflevector <16 x half> %minmax.ident.splatinsert, <16 x half> poison, <16 x i32> zeroinitializer
            br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
            %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
            %vec.phi = phi <16 x half> [ %minmax.ident.splat, %vector.ph ], [ %22, %vector.body ]
            %vec.phi30 = phi <16 x half> [ %minmax.ident.splat, %vector.ph ], [ %23, %vector.body ]
            %offset.idx = or i64 %index, 1
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %16 = getelementptr inbounds half, half* %11, i64 %offset.idx
             %17 = bitcast half* %16 to <16 x half>*
             %wide.load = load <16 x half>, <16 x half>* %17, align 2
             %18 = getelementptr inbounds half, half* %16, i64 16
             %19 = bitcast half* %18 to <16 x half>*
             %wide.load31 = load <16 x half>, <16 x half>* %19, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %20 = fcmp fast olt <16 x half> %vec.phi, %wide.load
              %21 = fcmp fast olt <16 x half> %vec.phi30, %wide.load31
              %22 = select <16 x i1> %20, <16 x half> %wide.load, <16 x half> %vec.phi
              %23 = select <16 x i1> %21, <16 x half> %wide.load31, <16 x half> %vec.phi30
              %index.next = add nuw i64 %index, 32
              %24 = icmp eq i64 %index.next, %n.vec
              br i1 %24, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
            %25 = call fast <16 x half> @llvm.maxnum.v16f16(<16 x half> %22, <16 x half> %23)
            %26 = call fast half @llvm.vector.reduce.fmax.v16f16(<16 x half> %25)
            %cmp.n = icmp eq i64 %15, %n.vec
            br i1 %cmp.n, label %L55, label %vec.epilog.iter.check

vec.epilog.iter.check:                            ; preds = %middle.block
            %ind.end36 = or i64 %n.vec, 2
            %ind.end34 = or i64 %n.vec, 1
            %n.vec.remaining = and i64 %15, 24
            %min.epilog.iters.check = icmp eq i64 %n.vec.remaining, 0
            br i1 %min.epilog.iters.check, label %vec.epilog.scalar.ph, label %vec.epilog.ph

vec.epilog.ph:                                    ; preds = %vec.epilog.iter.check, %vector.main.loop.iter.check
            %bc.merge.rdx = phi half [ %14, %vector.main.loop.iter.check ], [ %26, %vec.epilog.iter.check ]
            %vec.epilog.resume.val = phi i64 [ 0, %vector.main.loop.iter.check ], [ %n.vec, %vec.epilog.iter.check ]
            %n.vec33 = and i64 %15, -8
            %ind.end = or i64 %n.vec33, 1
            %ind.end35 = or i64 %n.vec33, 2
            %minmax.ident.splatinsert41 = insertelement <8 x half> poison, half %bc.merge.rdx, i64 0
            %minmax.ident.splat42 = shufflevector <8 x half> %minmax.ident.splatinsert41, <8 x half> poison, <8 x i32> zeroinitializer
            br label %vec.epilog.vector.body

vec.epilog.vector.body:                           ; preds = %vec.epilog.vector.body, %vec.epilog.ph
            %index39 = phi i64 [ %vec.epilog.resume.val, %vec.epilog.ph ], [ %index.next45, %vec.epilog.vector.body ]
            %vec.phi40 = phi <8 x half> [ %minmax.ident.splat42, %vec.epilog.ph ], [ %30, %vec.epilog.vector.body ]
            %offset.idx43 = or i64 %index39, 1
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %27 = getelementptr inbounds half, half* %11, i64 %offset.idx43
             %28 = bitcast half* %27 to <8 x half>*
             %wide.load44 = load <8 x half>, <8 x half>* %28, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %29 = fcmp fast olt <8 x half> %vec.phi40, %wide.load44
              %30 = select <8 x i1> %29, <8 x half> %wide.load44, <8 x half> %vec.phi40
              %index.next45 = add nuw i64 %index39, 8
              %31 = icmp eq i64 %index.next45, %n.vec33
              br i1 %31, label %vec.epilog.middle.block, label %vec.epilog.vector.body

vec.epilog.middle.block:                          ; preds = %vec.epilog.vector.body
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
            %32 = call fast half @llvm.vector.reduce.fmax.v8f16(<8 x half> %30)
            %cmp.n38 = icmp eq i64 %15, %n.vec33
            br i1 %cmp.n38, label %L55, label %vec.epilog.scalar.ph

vec.epilog.scalar.ph:                             ; preds = %vec.epilog.middle.block, %vec.epilog.iter.check, %iter.check
            %bc.resume.val = phi i64 [ %ind.end, %vec.epilog.middle.block ], [ %ind.end34, %vec.epilog.iter.check ], [ 1, %iter.check ]
            %bc.resume.val37 = phi i64 [ %ind.end35, %vec.epilog.middle.block ], [ %ind.end36, %vec.epilog.iter.check ], [ 2, %iter.check ]
            %bc.merge.rdx46 = phi half [ %32, %vec.epilog.middle.block ], [ %26, %vec.epilog.iter.check ], [ %14, %iter.check ]
            br label %L42

L42:                                              ; preds = %L42, %vec.epilog.scalar.ph
            %33 = phi i64 [ %value_phi628, %L42 ], [ %bc.resume.val, %vec.epilog.scalar.ph ]
            %value_phi628 = phi i64 [ %36, %L42 ], [ %bc.resume.val37, %vec.epilog.scalar.ph ]
            %value_phi527 = phi half [ %37, %L42 ], [ %bc.merge.rdx46, %vec.epilog.scalar.ph ]
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %34 = getelementptr inbounds half, half* %11, i64 %33
             %35 = load half, half* %34, align 2
; ││││││││││└
; ││││││││││┌ @ int.jl:87 within `+`
             %36 = add nuw nsw i64 %value_phi628, 1
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %.inv20 = fcmp fast olt half %value_phi527, %35
              %37 = select fast i1 %.inv20, half %35, half %value_phi527
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %exitcond.not = icmp eq i64 %value_phi628, %8
; ││││││││││└
            br i1 %exitcond.not, label %L55, label %L42

L55:                                              ; preds = %L42, %vec.epilog.middle.block, %middle.block, %L20, %L19
            %value_phi4 = phi half [ %9, %L19 ], [ %14, %L20 ], [ %26, %middle.block ], [ %32, %vec.epilog.middle.block ], [ %37, %L42 ]
; └└└└└└└└└└
  ret half %value_phi4
}

So the problem is that those halfs are not demoted to floats?

gbaraldi · 2023-05-21T14:53:44Z

I love that llvm creates an intrinsic that it doesn't know how to lower.
What confuses me is that putting the same unoptimized IR into opt-15 doesn't do this.
This is a LLVM-15 regression, @mcabbott's PR just exposed it.
@pchintalapudi

gbaraldi · 2023-07-20T19:33:45Z

This is llvm/llvm-project#59258, which got fixed in https://reviews.llvm.org/D139078. I saw that @maleadt was adding some patches so could we get this on as well?

maleadt · 2023-07-20T20:42:02Z

I just finished rebuilding all of LLVM 😅

gbaraldi · 2023-07-20T20:45:09Z

I'm so sorry

- llvm/llvm-project@af39acd closing #50448 - https://reviews.llvm.org/D139078 closing #49907

- llvm/llvm-project@af39acd closing #50448 - https://reviews.llvm.org/D139078 closing #49907 (cherry picked from commit 092231c)

giordano added regression Regression in behavior compared to a previous version fold sum, maximum, reduce, foldl, etc. float16 labels May 21, 2023

giordano added this to the 1.10 milestone May 21, 2023

matthias314 mentioned this issue May 21, 2023

@fastmath support for sum, prod, extrema and extrema! #49910

Open

gbaraldi mentioned this issue May 31, 2023

Julia 1.10 LLVM Error #50020

Closed

giordano added upstream The issue is with an upstream dependency, e.g. LLVM compiler:llvm For issues that relate to LLVM labels Jul 20, 2023

maleadt mentioned this issue Jul 22, 2023

Backport LLVM patches to fix various issues. #50639

Merged

maleadt linked a pull request Jul 22, 2023 that will close this issue

Backport LLVM patches to fix various issues. #50639

Merged

maleadt closed this as completed in #50639 Jul 23, 2023

maleadt added a commit that referenced this issue Jul 23, 2023

Backport LLVM patches to fix various issues. (#50639)

092231c

- llvm/llvm-project@af39acd closing #50448 - https://reviews.llvm.org/D139078 closing #49907

KristofferC pushed a commit that referenced this issue Jul 24, 2023

Backport LLVM patches to fix various issues. (#50639)

b21a343

- llvm/llvm-project@af39acd closing #50448 - https://reviews.llvm.org/D139078 closing #49907 (cherry picked from commit 092231c)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`@fastmath maximum` segfaults for `Float16` on master #49907

`@fastmath maximum` segfaults for `Float16` on master #49907

matthias314 commented May 21, 2023 •

edited

Loading

giordano commented May 21, 2023

mcabbott commented May 21, 2023 •

edited

Loading

gbaraldi commented May 21, 2023

giordano commented May 21, 2023

gbaraldi commented May 21, 2023 •

edited

Loading

gbaraldi commented Jul 20, 2023

maleadt commented Jul 20, 2023

gbaraldi commented Jul 20, 2023

@fastmath maximum segfaults for Float16 on master #49907

@fastmath maximum segfaults for Float16 on master #49907

Comments

matthias314 commented May 21, 2023 • edited Loading

giordano commented May 21, 2023

mcabbott commented May 21, 2023 • edited Loading

gbaraldi commented May 21, 2023

giordano commented May 21, 2023

gbaraldi commented May 21, 2023 • edited Loading

gbaraldi commented Jul 20, 2023

maleadt commented Jul 20, 2023

gbaraldi commented Jul 20, 2023

`@fastmath maximum` segfaults for `Float16` on master #49907

`@fastmath maximum` segfaults for `Float16` on master #49907

matthias314 commented May 21, 2023 •

edited

Loading

mcabbott commented May 21, 2023 •

edited

Loading

gbaraldi commented May 21, 2023 •

edited

Loading