
Same function, same arguments, different generated code #20913

Closed
KristofferC opened this issue Mar 6, 2017 · 14 comments
Labels
compiler:codegen (Generation of LLVM IR and native code)
performance (Must go faster)
potential benchmark (Could make a good benchmark in BaseBenchmarks)

Comments

@KristofferC
Member

KristofferC commented Mar 6, 2017

Consider the following small fixed-size matrix implementation, together with a reduce function and two implementations of sum:

# R×C matrix with element type T, stored as a flat NTuple of length RC (= R*C)
struct FixedMatrix{R,C,T,RC} <: AbstractMatrix{T}
    data::NTuple{RC, T}
end

Base.size(::FixedMatrix{R,C}) where {R, C} = (R, C)
Base.length(::FixedMatrix{R,C,T,RC}) where {R,C,T,RC} = RC
Base.IndexStyle(::Type{<:FixedMatrix}) = IndexLinear()
Base.getindex(fm::FixedMatrix, i::Int) = fm.data[i]

@inline function myreduce(op, v0, a::FixedMatrix)
    if length(a) == 0
        return v0
    else
        s = v0
        @inbounds @simd for j = 1:length(a)
            s = op(s, a[j])
        end
        return s
    end
end

sum2(a::FixedMatrix) = myreduce(+, zero(eltype(a)), a)
sum3(a::FixedMatrix) = myreduce(+, 0.0, a)

Now, sum2 and sum3 clearly call (or at least should call) the same method with the same arguments when the matrix element type is Float64 (which I confirmed by putting a @which in there).

However:

julia> using BenchmarkTools

julia> m = FixedMatrix{8, 8, Float64, 64}((rand(64)...,))

julia> @btime sum2($m)
  67.113 ns (0 allocations: 0 bytes)
36.441929014270464

julia> @btime sum3($m)
  8.379 ns (0 allocations: 0 bytes)
36.441929014270464

In fact, the generated code is vastly different (one uses SIMD, the other does not).

julia> @code_llvm sum2(m)

julia> @code_llvm sum3(m)

How can so different code be generated when the exact same function is called with the same arguments?
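For what it's worth, the starting values really are bit-identical Float64 zeros, so the divergence can't come from the Julia-level arguments. A self-contained re-check (re-declaring the issue's type so the snippet runs on its own):

```julia
# Minimal re-declaration of the issue's FixedMatrix so this snippet is standalone.
struct FixedMatrix{R,C,T,RC} <: AbstractMatrix{T}
    data::NTuple{RC,T}
end
Base.size(::FixedMatrix{R,C}) where {R,C} = (R, C)
Base.getindex(fm::FixedMatrix, i::Int) = fm.data[i]
Base.IndexStyle(::Type{<:FixedMatrix}) = IndexLinear()

m = FixedMatrix{2,2,Float64,4}((1.0, 2.0, 3.0, 4.0))

# sum2's starting value, zero(eltype(a)), is the exact same Float64 as sum3's 0.0 ...
@assert zero(eltype(m)) === 0.0
# ... so both entry points hand myreduce bitwise-identical arguments.
@assert typeof(zero(eltype(m))) === Float64
```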

@KristofferC
Member Author

KristofferC commented Mar 6, 2017

Also note that the built-in sum is slower than the one I wrote up there (although probably more robust against accumulating floating-point errors):

julia> @btime sum($m)
  17.394 ns (0 allocations: 0 bytes)
36.441929014270464
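Base's `sum` uses pairwise summation for arrays, which trades a little speed for better rounding behavior than a strict left-to-right loop like `myreduce`. A quick sketch of the accuracy difference (illustrative values chosen so the naive error is visible; not code from this issue):

```julia
# Float32 so rounding error accumulates quickly over a long naive sum.
xs = fill(0.1f0, 10_000)

naive = foldl(+, xs)   # left-to-right accumulation, like myreduce
pairwise = sum(xs)     # Base's pairwise reduction

# The naive sum drifts further from the exact value than the pairwise one.
exact = 10_000 * Float64(0.1f0)
@assert abs(Float64(naive) - exact) > abs(Float64(pairwise) - exact)
```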

@yuyichao
Contributor

yuyichao commented Mar 6, 2017

What's m?

@KristofferC
Member Author

KristofferC commented Mar 6, 2017

Sorry! m = FixedMatrix{8, 8, Float64, 64}((rand(64)...,))

@fredrikekre
Member

fredrikekre commented Mar 6, 2017

julia> @btime sum2($m);
  9.603 ns (0 allocations: 0 bytes)

julia> @btime sum3($m);
  9.341 ns (0 allocations: 0 bytes)

EDIT: The timings above were for a 5x5 matrix. For 8x8:

julia> @btime sum2($m);
  111.510 ns (0 allocations: 0 bytes)

julia> @btime sum3($m);
  9.231 ns (0 allocations: 0 bytes)

@KristofferC
Member Author

What version? Is the generated code the same for you?

@KristofferC
Member Author

My master was 11 days old... maybe it has been fixed already. Will look.

@fredrikekre
Member

I got the same generated code, but that was for the 5x5 case. See my edit in the comment above.

@KristofferC
Member Author

Yes, for 5x5 I think everything gets unrolled.

@KristofferC
Member Author

Same result on 0.5.

@ararslan added the compiler:codegen label on Mar 6, 2017
@JeffBezanson
Member

The only difference in IR seems to be a constant 0.0 vs. sitofp(0). Maybe the sitofp is enough to throw off the optimizer? I guess we should add some early constant folding to sitofp.
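At the Julia level, `sitofp(0)` (LLVM's signed-integer-to-float conversion applied to the constant 0) is just `Float64(0)`, and folding it must yield exactly the literal the other entry point already passes. A small illustration of why the fold is value-preserving:

```julia
# Folding an integer-to-float conversion of a constant yields the literal itself.
@assert Float64(0) === 0.0
# Same bit pattern, so IR with the conversion folded is identical for both paths.
@assert reinterpret(UInt64, Float64(0)) === reinterpret(UInt64, 0.0)
```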

@Keno
Member

Keno commented Mar 6, 2017

That would be either a serious bug in LLVM or a pass order problem. Either way we should find out.

@JeffBezanson added the performance label on Mar 6, 2017
@KristofferC
Member Author

KristofferC commented Apr 18, 2017

Any ideas here? AFAIU this inhibits SIMD for the reductions in StaticArrays.

@KristofferC
Member Author

Still happens.

@KristofferC added the potential benchmark label on Aug 2, 2017
@KristofferC
Member Author

This seems to be fixed on 0.7 🎉.

julia> @btime sum2($m)
  7.670 ns (0 allocations: 0 bytes)
35.36299421828035

julia> @btime sum3($m)
  7.603 ns (0 allocations: 0 bytes)
35.36299421828035
