Invalid IR for reductions along trivial dimension #542
If you can point me in the right direction, I would not mind fixing this.
The … You can also try to follow the dispatch chain with …
I just realised I had forgotten to include the error, so here it goes:

```julia
julia> # I guess the error stems from Base.mapreduce, so...
       Base.mapreducedim!(identity, +, dst_1, src)
ERROR: InvalidIRError: compiling mapreducedim_kernel_serial(typeof(identity), typeof(+), CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{Nothing,Nothing}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] getindex at tuple.jl:24
 [2] map at tuple.jl:162 (repeats 2 times)
 [3] mapreducedim_kernel_serial at /home/vicentinif/.julia/packages/CuArrays/ZYCpV/src/mapreduce.jl:5
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/validation.jl:114
 [2] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:188 [inlined]
 [3] macro expansion at /opt/julia/global_depot/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [4] #codegen#156(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:186
 [5] #codegen at ./none:0 [inlined]
 [6] #compile#155(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:47
 [7] #compile at ./none:0 [inlined]
 [8] #compile#154 at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/compiler/driver.jl:28 [inlined]
 [9] #compile at ./none:0 [inlined] (repeats 2 times)
 [10] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:403 [inlined]
 [11] #cufunction#202(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_serial), ::Type{Tuple{typeof(identity),typeof(+),CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Nothing,Nothing}}}) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:368
 [12] cufunction(::Function, ::Type) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:368
 [13] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/vicentinif/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:176
 [14] mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at ./reducedim.jl:274
 [15] top-level scope at REPL[4]:2
```
So according to the above, the culprit is

```julia
function mapreducedim_kernel_serial(f, op, R, A, range)
    I = @cuindex R
    newrange = map((r, i) -> r === nothing ? i : r, range, I)
    for I′ in CartesianIndices(newrange)
        @inbounds R[I...] = op(R[I...], f(A[I′]))
    end
    return
end
```

and in particular this line:

```julia
newrange = map((r, i) -> r === nothing ? i : r, range, I)
```
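The failure is reproducible on the CPU without any GPU involvement: `range` has one slot per dimension of the 2-D source, while the index tuple of the 1-D destination has only one entry, and `map` over tuples of unequal length indexes past the shorter one. A minimal plain-Julia sketch (variable values chosen to match the failing case above):

```julia
# `range` has one entry per dimension of the 2-D source A,
# but the index tuple of the 1-D destination R has a single entry.
range = (nothing, nothing)
I = (1,)

# Base's tuple `map` recurses elementwise and indexes both tuples,
# so the shorter tuple is accessed out of bounds:
map((r, i) -> r === nothing ? i : r, range, I)
# throws BoundsError: attempt to access () at index [1]
```

This is exactly the `getindex at tuple.jl:24` / `map at tuple.jl:162` frames in the stack trace.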
Ok, wild guess here, because I have no idea how to properly debug device codegen... So:

```julia
julia> I = CuArrays.ind2sub_(rand(10,10), 1)
(1, 1)

julia> I = CuArrays.ind2sub_(rand(10), 1)
(1,)

# and therefore...
julia> newrange = map((r, i) -> r === nothing ? i : r, (nothing, nothing), CuArrays.ind2sub_(rand(10,10), 1))
(1, 1)

julia> newrange = map((r, i) -> r === nothing ? i : r, (nothing, nothing), CuArrays.ind2sub_(rand(10), 1))
ERROR: BoundsError: attempt to access ()
  at index [1]
Stacktrace:
 [1] getindex(::Tuple, ::Int64) at ./tuple.jl:24
 [2] map at ./tuple.jl:162 [inlined] (repeats 2 times)
 [3] top-level scope at REPL[24]:1
```

If you wonder how I got the …

@maleadt Do you have any idea on how to proceed?
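For completeness, the mismatched `range` itself comes from the `ifelse.(length.(axes(R)) .== 1, axes(A), nothing)` construction in the serial fallback: with a 1-D destination, the length-1 tuple from `axes(R)` broadcasts against the 2-tuple `axes(A)`, so `range` ends up as a 2-tuple even though `R` only has one dimension. A CPU-only sketch of that step:

```julia
# How the Tuple{Nothing,Nothing} in the error type signature arises.
A = rand(Float32, 5, 10)   # 2-D source
R = zeros(Float32, 5)      # 1-D destination

# The length-1 tuple length.(axes(R)) broadcasts against the 2-tuple axes(A):
range = ifelse.(length.(axes(R)) .== 1, axes(A), nothing)
# range == (nothing, nothing) -- a 2-tuple, while the destination's index
# tuple has a single element, hence the BoundsError inside map.
```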
I didn't write this implementation, so I'm not sure how it works. But recent profiling has shown that even the parallel implementation is pretty damn slow, so we should just reimplement mapreduce from scratch. In the meantime, I suggest disabling the serial version and always using the parallel one, which seems to work here. Could you verify that works for your use case, and open a PR?
Indeed, it works after commenting out the serial fallback:

```julia
#if x_thr >= 8
    blk, thr = (Rlength - 1) ÷ y_thr + 1, (x_thr, y_thr, 1)
    parallel_kernel(parallel_kargs...; threads=thr, blocks=blk)
#else
#    # not enough work, fall back to serial reduction
#    range = ifelse.(length.(axes(R)) .== 1, axes(A), nothing)
#    blk, thr = cudims(R)
#    @cuda(blocks=blk, threads=thr, mapreducedim_kernel_serial(f, op, R, A, range))
#end
end
```

in …
I can open a PR, but are you sure this won't lead to slowdowns elsewhere?
When performance matters, the user will be using large arrays and not using the serial fallback, so I think this is a safe thing to do.
This is still not solved:

```julia
julia> a = CuArrays.rand(5,10)
5×10 CuArray{Float32,2,Nothing}:
 0.0421446  0.267358   0.630052   0.990972  …  0.152812   0.687188  0.15777
 0.73055    0.0208062  0.529826   0.879257     0.0119244  0.302241  0.567767
 0.939997   0.833147   0.812721   0.999592     0.172571   0.807677  0.950438
 0.843176   0.721778   0.0592983  0.206773     0.478356   0.407127  0.449898
 0.61159    0.608743   0.479727   0.43722      0.805054   0.751322  0.982895

julia> b = CuArrays.rand(5)
5-element CuArray{Float32,1,Nothing}:
 0.9893592
 0.069014914
 0.7996987
 0.6109471
 0.31840062

julia> mean!(b, a)
ERROR: InvalidIRError: compiling mapreducedim_kernel_serial(typeof(identity), typeof(Base.add_sum), CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{Nothing,Nothing}) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] getindex at tuple.jl:24
 [2] map at tuple.jl:162 (repeats 2 times)
 [3] mapreducedim_kernel_serial at /home/vicentinif/.julia/packages/CuArrays/A6GUx/src/mapreduce.jl:7
Stacktrace:
 [1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/validation.jl:116
 [2] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:193 [inlined]
 [3] macro expansion at /home/vicentinif/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [4] #codegen#156(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:191
 [5] #codegen at ./none:0 [inlined]
 [6] #compile#155(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:52
 [7] #compile at ./none:0 [inlined]
 [8] #compile#154 at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/compiler/driver.jl:33 [inlined]
 [9] #compile at ./none:0 [inlined] (repeats 2 times)
 [10] macro expansion at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:393 [inlined]
 [11] #cufunction#200(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_serial), ::Type{Tuple{typeof(identity),typeof(Base.add_sum),CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Nothing,Nothing}}}) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:360
 [12] cufunction(::Function, ::Type) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:360
 [13] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/vicentinif/.julia/packages/CUDAnative/hfulr/src/execution.jl:179
 [14] mapreducedim! at ./reducedim.jl:274 [inlined]
 [15] #sum!#599 at ./reducedim.jl:674 [inlined]
 [16] #sum! at ./none:0 [inlined]
 [17] #sum!#600 at ./reducedim.jl:676 [inlined]
 [18] #sum! at ./none:0 [inlined]
 [19] mean!(::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Statistics/src/Statistics.jl:126
 [20] top-level scope at REPL[26]:1
```

with

```julia
(neural_dev) pkg> st
Status `~/neural_dev/Project.toml`
  [c5f51814] CUDAdrv v6.0.0
  [3a865a2d] CuArrays v1.7.3
  [587475ba] Flux v0.10.3
  [872c559c] NNlib v0.6.6
  [eb923273] NeuralQuantum v0.2.0 #master (https://github.com/PhilipVinc/NeuralQuantum.jl/)
  [e88e6eb3] Zygote v0.4.9
```
It has. Not part of a release yet.
ah! sorry |
Describe the bug
Performing reduction operations along some dimensions, where the destination array has fewer dimensions than the source, produces invalid IR. This works with Base arrays.
example:
Both those methods work with Base arrays.
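As a sanity check (the exact MWE from the report is elided above), the CPU equivalent of the failing call succeeds with Base arrays, reducing a 5×10 matrix along its second dimension into a 5-element vector:

```julia
using Statistics

# CPU equivalent of the failing GPU call: the destination has fewer
# dimensions than the source, and Base handles this fine.
a = rand(Float32, 5, 10)
b = zeros(Float32, 5)

sum!(b, a)     # row sums into the 1-D destination
@assert b ≈ vec(sum(a, dims=2))

mean!(b, a)    # row means into the 1-D destination
@assert b ≈ vec(mean(a, dims=2))
```

The same two calls on `CuArray`s hit the InvalidIRError shown in the comments above.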
To Reproduce
The Minimal Working Example (MWE) for this bug:
Environment details (please complete this section)
Details on Julia: