Uninline trig functions. #24117

pkofod · 2017-10-13T08:07:23Z

The resulting code for functions that calls sin or cos is increased substantially when inlined (think f(x)=sin(x)+cos(x)). I didn't do it for tan, atan, atan2 either, so might just have been the remains of some trial-and-error learning about performance sensitive code on my part.

Would appreciate if someone ran nanosoldier here.

fredrikekre · 2017-10-13T08:11:06Z

@nanosoldier runbenchmarks(ALL, vs=":master")

KristofferC · 2017-10-13T08:14:08Z

Even if this is a slight performance decrease in microbenchmarks, I think we should do this.

pkofod · 2017-10-13T09:05:46Z

Even if this is a slight performance decrease in microbenchmarks, I think we should do this.

Don't disagree, just wanted to see the actual performance impact here. I'm not entirely sure how

b=@benchmarkable sin($x)
tune!(b)
run(b)

is supposed to be / not be affected by inlining at the sin level, but... we'll see!

nanosoldier · 2017-10-13T12:46:57Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

pkofod · 2017-10-13T17:17:52Z

Okay, so some of this is certainly noise, but I do notice that there are 10 sincos regressions in there... I may run these locally as well to see if I can replicate.

JeffBezanson · 2017-10-13T18:55:28Z

Is nanosoldier back? What about JuliaIO/JLD.jl#196 ?

ararslan · 2017-10-13T18:56:27Z

I have absolutely no idea why this worked and other PRs didn't (and continue not to).

PallHaraldsson · 2017-10-13T20:08:34Z

@pkofod since you mentioned: f(x)=sin(x)+cos(x)

In general I think your right about not inlining (called outlining?), but for that case I see in my 0.5 (not everyone will know about the alternative sincos function or how to call it..):

[I don't know too much about this, but since this is an important function can't we do better for f?]

julia> @code_llvm f(1.0)

define double @julia_f_72800(double) #0 {
top:
  %1 = call double @julia_sin_72801(double %0) #0
  %2 = call double @julia_cos_72803(double %0) #0
  %3 = fadd double %1, %2
  ret double %3

I see you dropped inline from sincos and then I also see (could the above be compiled using this or similar?):

@inline function sincos_fast(v::Float64)
     s = Ref{Cdouble}()
     c = Ref{Cdouble}()
     ccall((:sincos, libm), Void, (Cdouble, Ptr{Cdouble}, Ptr{Cdouble}), v, s, c)
     return (s[], c[])
end

I'm not sure how I can call these, but can Julia be clever (without macros) and recognize that function (and similar ones, for e.g. power of two) and call such one sincos function?

function sincos(x::T) where T<:Union{Float32, Float64}
..
    # ... and use the same selection scheme as above: (sin, cos, -sin, -cos) for
    # for sin and (cos, -sin, -cos, sin) for cos

[redundant "for there in comment?]

@inline sincos_kernel(y::Union{Float32, Float64, DoubleFloat32, DoubleFloat64}) = (sin_kernel(y), cos_kernel(y))

pkofod · 2017-10-14T18:21:47Z

In general I think your right about not inlining (called outlining?), but for that case I see in my 0.5 (not everyone will know about the alternative sincos function or how to call it..):

sincos doesn't return sin(x)+cos(x), it returns sin(x), cos(x), sorry if it was a bad example. What I meant was that f(x)=sin(x) (or sin(x)+cos(x)) will the contain the full sin(x) body, instead of just a function call, and that makes the f definition take up as much space as the original sin which can be a problem.

KristofferC · 2017-10-14T18:31:27Z

The sincosthing must be slower now though since they are just two separate function calls (e.g. no optimization from inlining available). You could define:

@inline _sin(x) = ...
@noinline sin(x) = _sin(x)
@inline _cos(x) = ...
@noinline cos(x) = _cos(x)

sincos(x) = _sin(x), _cos(x)

pkofod · 2017-10-14T18:41:37Z

I don't think so, sincos's kernel function is still inlined in sincos? The regressions may or may not be real. It's 10 regressions out of 66 benchmarks. I'm working on something else right now, but I'll try to run the benchmarks locally.

KristofferC · 2017-10-14T18:44:40Z

Sorry, I mistook how sincoswas being computed.

pkofod · 2017-10-14T18:46:07Z

Sorry, I mistook how sincoswas being computed.

Yeah, that's why I'm surprised at the regressions, but I think there's a chance they might just be noise, though it is 10/66 benchmarks.

pkofod · 2017-10-15T14:58:04Z

While we're waiting for @nanosoldier I tried to run the sincos benchmarks locally. Seems like there's around 10% regression here on average. Some exceptions: x=zero(T) case is much slower (> 2x slowdown). I guess this is because the benchmarks somehow do something clever here? The other case is in the large x case (payne hanek). Here we see a 40% increase in run time by not inlining.

Below is a figure. On the left, it shows histograms of ratios of runtimes where the numerator is the noinline commit (this one) and the denominator is the commit just before it. Blue and red are just two different load, tune! and run instances. On the right you see the ratios where the first axis is simply the benchmarks ID'd by the order in which they were run. You can easily identify the 0 cases (Float32 and Float64 respectively) and the Payne Hanek branches.

simonbyrne · 2017-10-23T17:47:07Z

While we should probably try to be fast for x == 0, I don't think we need to worry too much about performance if it falls back to Payne-Hanek.

pkofod · 2017-10-23T18:46:03Z

While we should probably try to be fast for x == 0, I don't think we need to worry too much about performance if it falls back to Payne-Hanek.

Do you know why the x==0 case might be affected by inlining? Is it because it's a value that can be recognized in the @benchmarkable sin($T(0)) case which is then constant folded?

simonbyrne · 2017-10-23T18:53:45Z

Possibly? In that case there might not be much we can do about it.

pkofod · 2017-10-24T12:46:15Z

If someone (cc: @jrevels @ararslan ) who knows more about the benchmarking code can verify (or teach me how to verify) if nanosoldier constant folds the 0 case, we should probably just merge this. Maybe even without knowing if it's the constant folding or not, see #24286 for an issue where this is very relevant!

KristofferC · 2017-10-24T12:57:42Z

I don't think this is constant folding. If the function returns immediately after checking if the input argument is zero, then the cost of having to do the call will be large compared to if the function body is inlined. So it is not surprising that preventing inlining of this function will give regression for the special cases.

I think this is a non-issue though, and that this should just be merged.

pkofod · 2017-10-24T13:00:22Z

I don't think this is constant folding. If the function returns immediately after checking if the input argument is zero, then the cost of having to do the call will be large compared to if the function body is inlined. So it is not surprising that preventing inlining of this function will give regression for the special cases.

I think this is a non-issue though, and that this should just be merged.

Yes of course, the work to call time ratio will be very very low in this case (<0.5 apparently).

Uninline trig functions.

225ba14

kshyatt added the maths Mathematical functions label Oct 16, 2017

pkofod mentioned this pull request Oct 24, 2017

Exceedingly long compilation time of a function involving StaticArrays #24286

Closed

vtjnash approved these changes Oct 24, 2017

View reviewed changes

KristofferC merged commit 9f5ec71 into JuliaLang:master Oct 24, 2017

pkofod deleted the noinline branch October 24, 2017 13:03

pkofod mentioned this pull request Sep 15, 2020

A faster version of exp, exp2, exp10 for Floating point numbers #36761

Merged

musm mentioned this pull request Dec 6, 2020

inline exp functions #38726

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uninline trig functions. #24117

Uninline trig functions. #24117

pkofod commented Oct 13, 2017 •

edited

Loading

fredrikekre commented Oct 13, 2017

KristofferC commented Oct 13, 2017

pkofod commented Oct 13, 2017 •

edited

Loading

nanosoldier commented Oct 13, 2017

pkofod commented Oct 13, 2017

JeffBezanson commented Oct 13, 2017

ararslan commented Oct 13, 2017 •

edited

Loading

PallHaraldsson commented Oct 13, 2017 •

edited

Loading

pkofod commented Oct 14, 2017

KristofferC commented Oct 14, 2017 •

edited

Loading

pkofod commented Oct 14, 2017

KristofferC commented Oct 14, 2017

pkofod commented Oct 14, 2017

pkofod commented Oct 15, 2017

simonbyrne commented Oct 23, 2017

pkofod commented Oct 23, 2017

simonbyrne commented Oct 23, 2017 •

edited

Loading

pkofod commented Oct 24, 2017

KristofferC commented Oct 24, 2017

pkofod commented Oct 24, 2017

Uninline trig functions. #24117

Uninline trig functions. #24117

Conversation

pkofod commented Oct 13, 2017 • edited Loading

fredrikekre commented Oct 13, 2017

KristofferC commented Oct 13, 2017

pkofod commented Oct 13, 2017 • edited Loading

nanosoldier commented Oct 13, 2017

pkofod commented Oct 13, 2017

JeffBezanson commented Oct 13, 2017

ararslan commented Oct 13, 2017 • edited Loading

PallHaraldsson commented Oct 13, 2017 • edited Loading

pkofod commented Oct 14, 2017

KristofferC commented Oct 14, 2017 • edited Loading

pkofod commented Oct 14, 2017

KristofferC commented Oct 14, 2017

pkofod commented Oct 14, 2017

pkofod commented Oct 15, 2017

simonbyrne commented Oct 23, 2017

pkofod commented Oct 23, 2017

simonbyrne commented Oct 23, 2017 • edited Loading

pkofod commented Oct 24, 2017

KristofferC commented Oct 24, 2017

pkofod commented Oct 24, 2017

pkofod commented Oct 13, 2017 •

edited

Loading

pkofod commented Oct 13, 2017 •

edited

Loading

ararslan commented Oct 13, 2017 •

edited

Loading

PallHaraldsson commented Oct 13, 2017 •

edited

Loading

KristofferC commented Oct 14, 2017 •

edited

Loading

simonbyrne commented Oct 23, 2017 •

edited

Loading