StackOverflowError in broadcasting #62

Open

ranocha opened this issue Dec 2, 2022 · 14 comments

@ranocha (Member) commented Dec 2, 2022

Reported by @efaulhaber in #60 (comment)

using StrideArrays: PtrArray
using OrdinaryDiffEq

tspan = (0.0, 0.1)
u0_ode = [0.0]
ode = ODEProblem((du_ode, u_ode, semi, t) -> nothing, u0_ode, tspan)

sol = solve(ode, RDPK3SpFSAL49(thread=OrdinaryDiffEq.True()));
ERROR: StackOverflowError:
Stacktrace:
     [1] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
       @ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
     [2] macro expansion
       @ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
     [3] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
     [4] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
     [5] _materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:178
--- the last 5 lines are repeated 19994 more times ---
 [99976] materialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}})
       @ StrideArrays ~/.julia/packages/StrideArrays/zzjCK/src/broadcast.jl:185
 [99977] macro expansion
       @ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
 [99978] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), (true,), (true,), (), (), (), ())})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
 [99979] vmaterialize!(dest::StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(DiffEqBase.calculate_residuals), Tuple{StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, StrideArraysCore.PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{Static.StaticInt{8}}, Tuple{Static.StaticInt{1}}}, Float64, Float64, Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)}, Float64}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
@ranocha (Member, Author) commented Dec 2, 2022

Reduced:

julia> using StrideArrays

julia> foo(x, f) = f(x)
foo (generic function with 1 method)

julia> src1 = rand(10); dst = zero(src1);

julia> src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);

julia> @. dst = foo(src1, abs)
10-element Vector{Float64}:
 0.4497752365375052
 0.234212713779973
 0.8718344166425321
 0.1169076748948169
 0.12774646887625019
 0.41850986610044205
 0.017042548453313433
 0.9246865917682306
 0.4249229606273417
 0.7560184865926094

julia> @. dst_ptr = foo(src1_ptr, abs)
ERROR: StackOverflowError:
Stacktrace:
     [1] macro expansion
       @ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
     [2] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
     [3] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
     [4] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178
     [5] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
       @ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:185
--- the last 5 lines are repeated 19994 more times ---
 [99976] macro expansion
       @ ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530 [inlined]
 [99977] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)}, #unused#::Val{((true,), ())})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:530
 [99978] vmaterialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{:StrideArrays}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ LoopVectorization ~/.julia/packages/LoopVectorization/DDH6Z/src/broadcast.jl:664
 [99979] _materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}}, #unused#::Val{(true, 0, 0, 0, true, 4, 32, 15, 64, 0x0000000000000001, 0, true)})
       @ StrideArrays ~/.julia/dev/StrideArrays/src/broadcast.jl:178

@ranocha (Member, Author) commented Dec 2, 2022

As far as I understand, StrideArrays.jl prepares everything and hands it over to LoopVectorization.jl. In
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/943e35ddb6bb30c4777efc225b589d33213c95ac/src/broadcast.jl#L528-L567
LV emits a Broadcast.materialize! call again, which is exactly the method overloaded in StrideArrays.jl that we started from. I don't understand the complete code well enough to see where best to interrupt this cycle.
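To illustrate the shape of the cycle, here is a toy, self-contained model (every name is a hypothetical stand-in, not the packages' real API; the real methods live in StrideArrays' src/broadcast.jl and LoopVectorization's src/broadcast.jl):

# Toy model of the dispatch cycle between the two packages.
struct ToyPtrArray end      # stands in for StrideArraysCore.PtrArray
struct ToyBroadcasted end   # stands in for Base.Broadcast.Broadcasted

can_vectorize(bc) = false   # whatever condition sends LV down its fallback path

# "StrideArrays" side: route broadcasts into the vectorizer.
materialize!(dest::ToyPtrArray, bc::ToyBroadcasted) = vmaterialize!(dest, bc)

# "LoopVectorization" side: on the fallback path, call the generic
# materialize! -- which is the "StrideArrays" overload again.
vmaterialize!(dest, bc) = can_vectorize(bc) ? nothing : materialize!(dest, bc)

materialize!(ToyPtrArray(), ToyBroadcasted())  # ERROR: StackOverflowError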

@efaulhaber commented

Is there a temporary workaround for this?

@ranocha (Member, Author) commented Dec 16, 2022

I'm not sure, but I think this used to work with some earlier versions of StrideArrays/StrideArraysCore/LoopVectorization. Do you have the bandwidth to bisect the versions of these packages to find the problematic change, @efaulhaber?

@efaulhaber commented

I can try and see how far I get before I run out of bandwidth.

@efaulhaber commented Dec 28, 2022

Okay, here is what I found:
Your reduced example works fine with StrideArrays v0.1.19, StrideArraysCore v0.3.17, LoopVectorization v0.12.128.
When I leave the other packages at these versions and update LoopVectorization to v0.12.129, I get this error:

ERROR: BoundsError: attempt to access Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64} at index [8]
Stacktrace:
 [1] indexed_iterate(t::Tuple{Bool, Int8, Int8, Int8, Bool, UInt64, Int64}, i::Int64, state::Int64)
   @ Base .\tuple.jl:88
 [2] #s191#70
   @ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:561 [inlined]
 [3] var"#s191#70"(CNFARG::Any, W::Any, RS::Any, AR::Any, CLS::Any, NT::Any, ::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
   @ LoopVectorization .\none:0
 [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
   @ Core .\boot.jl:582
 [5] avx_config_val
   @ C:\Users\Erik\.julia\packages\LoopVectorization\gyra6\src\condense_loopset.jl:568 [inlined]
 [6] materialize!(dest::PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, bc::Base.Broadcast.Broadcasted{StrideArrays.CartesianStyle{Tuple{Int64}, 1}, Nothing, typeof(foo), Tuple{PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}, Base.RefValue{typeof(abs)}}})
   @ StrideArrays C:\Users\Erik\.julia\packages\StrideArrays\v8RT3\src\broadcast.jl:183
 [7] top-level scope
   @ c:\Users\Erik\Documents\Test\test.jl:11

This doesn't change when I update LoopVectorization to the latest version while keeping StrideArrays at v0.1.19.
With StrideArrays v0.1.20, I then get the StackOverflowError as before.

@ranocha (Member, Author) commented Dec 29, 2022

So it was the change JuliaSIMD/LoopVectorization.jl@v0.12.128...v0.12.129.
This introduced the additional check LoopVectorization.can_turbo before taking the @turbo path. This check fails for foo in the minimal example, so the fallback Base.Broadcast.materialize!(dest, bc) is called, resulting in the StackOverflowError. The old version of LoopVectorization.jl didn't check can_turbo and thus never reached the Base fallback.
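For reference, the heuristic can be checked directly in the REPL (assuming it is reachable as StrideArrays.LoopVectorization.can_turbo, as in the workaround below):

julia> using StrideArrays

julia> foo(x, f) = f(x)
foo (generic function with 1 method)

julia> StrideArrays.LoopVectorization.can_turbo(foo, Val(2))
false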

The MWE above works again if we set LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true, e.g.,

julia> begin
       using StrideArrays
       foo(x, f) = f(x)
       StrideArrays.LoopVectorization.can_turbo(::typeof(foo), ::Val{2}) = true
       src1 = rand(10); dst = zero(src1);
       src1_ptr = PtrArray(src1); dst_ptr = PtrArray(dst);
       @. dst = foo(src1, abs)
       @. dst_ptr = foo(src1_ptr, abs)
       end
10-element PtrArray{Tuple{Int64}, (true,), Float64, 1, 1, 0, (1,), Tuple{StaticInt{8}}, Tuple{StaticInt{1}}}:
 0.21757498850961488
 0.08772365981125829
 0.9338368308540324
 0.47094465360632565
 0.6707104502974711
 0.20925363260701946
 0.628131138456404
 0.013161734404078751
 0.5928102226448883
 0.6575575585083866

@ranocha (Member, Author) commented Dec 29, 2022

@chriselrod What is the recommended way to get LoopVectorization.can_turbo(f, ::Val{N}) = true for something defined in a package? In this case, it is basically
https://github.com/SciML/DiffEqBase.jl/blob/c5154b8be87531a345591db90d3a76b0e8c4d738/src/calculate_residuals.jl#L94-L99
and its related functions. Note that DiffEqBase.jl does not depend on StrideArrays.jl or LoopVectorization.jl directly but uses @.. from FastBroadcast.jl with multi-threading from Polyester.jl.
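For illustration, one possible opt-in without a hard dependency is the Requires.jl glue-code pattern. This is only a sketch: MyPackage and my_residuals are hypothetical, and the UUID should be verified against the General registry.

module MyPackage

using Requires

my_residuals(x, f) = f(x)   # hypothetical stand-in for DiffEqBase.calculate_residuals

function __init__()
    # This block only runs if the user also loads LoopVectorization:
    @require LoopVectorization="bdcacae8-1622-11e9-2a5c-532679323890" begin
        LoopVectorization.can_turbo(::typeof(my_residuals), ::Val{2}) = true
    end
end

end # module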

@efaulhaber commented

Does that mean that we will always run into a stack overflow whenever we do something like this without setting can_turbo to true first? That seems unintended.

@ranocha (Member, Author) commented Dec 29, 2022

Does that mean that we will always run into a stack overflow whenever we do something like this without setting can_turbo to true first? That seems unintended.

Yes

@chriselrod (Member) commented Dec 30, 2022

Sorry for dragging my feet on this. I'll take a look in a couple hours.

chriselrod added a commit that referenced this issue Dec 30, 2022
@chriselrod (Member) commented Dec 30, 2022

I've implemented a hotfix in #62 by disabling the change (can_turbo) that caused the regression in the first place.

I do think can_turbo is useful with the way LV currently works, and it would be nice to get it working again, since I want to try to get more use out of StrideArrays -- which I do, because Julia 1.9 brings us both optional dependencies and substantial reductions in compile times for LV-dependent packages.

If someone wants to re-enable or improve can_turbo for StrideArrays broadcasts, I don't think it'd be that difficult; it would mostly involve a fair bit of plumbing.

See the relevant code here:
https://github.com/JuliaSIMD/LoopVectorization.jl/blob/35f83103c12992ddd887cd709bf65e345db5ec9e/src/condense_loopset.jl#L924-L965
can_turbo tries to guess whether a function supports @turbo by checking whether Base.promote_op on Vec arguments returns something other than Union{}:

julia> using LoopVectorization, VectorizationBase

julia> Base.promote_op(+, Vec{2,Int}, Vec{2,Int}) !== Union{}
true

julia> foo(x, f) = f(x)
foo (generic function with 1 method)

julia> Base.promote_op(foo, Vec{2,Int}, Vec{2,Int}) !== Union{}
false

julia> Base.promote_op(foo, Vec{2,Int}, typeof(abs)) !== Union{}
true

Obviously, calling foo(::Vec{2,Int}, ::Vec{2,Int}) isn't valid!

But that's a pretty naive guess, to put it mildly.

I think there are at least two approaches that would work.

  1. Preprocess code, defining anonymous functions that capture constants to essentially remove them (a toy sketch of this idea follows the REPL snippets below). The reason this approach is difficult is that @turbo currently works just fine inside @generated functions, so we'd need something like RuntimeGeneratedFunctions to maintain that property. And we'd need to make sure LV doesn't lose information; it knows what a + b is, so if b happens to be a constant, we need to not forget that. I think this approach will get complicated, so I'd suggest instead:
  2. Define a can_turbo that takes types as arguments, and feed it actually plausible estimates of what the types are. I think this would be relatively simple to do; you have a LoopSet object whose operations you can inspect:
julia> ls.operations # ls is a LoopSet object
4-element Vector{LoopVectorization.Operation}:
 var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
 var"######arg###11######13###" = 0
 destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")
 dest[var"###n###1###"] = destination

julia> ls.operations[3]
destination = var"###func###6###"(var"###temp###9###", var"######arg###11######13###")

julia> ls.operations[3]. # tab to look at completion candidates
children          dependencies      elementbytes      identifier        instruction
mangledvariable   node_type         parents           reduced_children  reduced_deps
ref               rejectcurly       rejectinterleave  u₁unrolled        u₂unrolled
variable          vectorized
julia> ls.operations[3].parents
2-element Vector{LoopVectorization.Operation}:
 var"###temp###9###" = var"######arg###8######10###"[var"###n###1###"]
 var"######arg###11######13###" = 0

julia> ls.operations[3].parents[2]
var"######arg###11######13###" = 0

The three easiest ways to get a LoopSet object are

  1. Using LoopVectorization.@turbo_debug instead of @turbo. This returns the LoopSet object, updated with the types of all objects involved (see the usage sketch after this list).
  2. Calling LoopVectorization.loopset(q) on a loop expression q. This will be an untyped LoopSet; internally, this is used for building a _turbo_! call, which then assembles the LoopSet you get in 1.
  3. Defining _a = Ref{Any}() in a REPL, dev-ing LoopVectorization, and editing the code somewhere to add Main._a[] = ls, storing the LoopSet from that point in time. Beware that Revise probably won't retrigger compilation automatically, because _turbo_! and vmaterialize! are @generated functions that aren't going to be invalidated. You'll need to invalidate them manually, which you can do by adding/removing nonsense lines like 1 + 2 (you can see these are already in those functions in the codebase for convenience).
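
A short usage sketch of option 1 (the loop body here is arbitrary; any loop @turbo accepts should work):

julia> using LoopVectorization

julia> x = rand(8); y = similar(x);

julia> ls = LoopVectorization.@turbo_debug for i in eachindex(y)
           y[i] = abs(x[i])
       end;

julia> ls isa LoopVectorization.LoopSet
true

From here, ls.operations can be walked as in the snippets above.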

Anyway, you should be able to just iterate over the arguments to a function call, and then decide what to do. A simple approach would be:

  1. If the argument is a load, call vectype(loaded_from_array), where vectype's definition is something like vectype(::AbstractArray{T}) where {T} = Vec{2,T}.
  2. If the argument is an operation, just go with Vec{2,Int} as is currently done. Or define this get-type function recursively and do promotion; but you can save getting fancy for when the basics work.
  3. If it is a constant, use the type of the constant, i.e. pass typeof(sym).

"3." will be what fixes the problem here, as you can see above where foo(Vec{2,Int}, typeof(abs)) returns true.

chriselrod added a commit that referenced this issue Dec 30, 2022
@ranocha (Member, Author) commented Dec 30, 2022

Thanks a lot for the hotfix, @chriselrod! I can confirm that it fixes the original issue reported in #60 (comment)

@ranocha (Member, Author) commented Dec 30, 2022

Feel free to close this - or leave it open as a reference to the better fix described above.
