Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uninline trig functions. #24117

Merged
merged 1 commit into from
Oct 24, 2017
Merged

Uninline trig functions. #24117

merged 1 commit into from
Oct 24, 2017

Conversation

pkofod
Copy link
Contributor

@pkofod pkofod commented Oct 13, 2017

The resulting code for functions that calls sin or cos is increased substantially when inlined (think f(x)=sin(x)+cos(x)). I didn't do it for tan, atan, atan2 either, so might just have been the remains of some trial-and-error learning about performance sensitive code on my part.

Would appreciate if someone ran nanosoldier here.

@fredrikekre
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@KristofferC
Copy link
Member

Even if this is a slight performance decrease in microbenchmarks, I think we should do this.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 13, 2017

Even if this is a slight performance decrease in microbenchmarks, I think we should do this.

Don't disagree, just wanted to see the actual performance impact here. I'm not entirely sure how

b=@benchmarkable sin($x)
tune!(b)
run(b)

is supposed to be / not be affected by inlining at the sin level, but... we'll see!

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@pkofod
Copy link
Contributor Author

pkofod commented Oct 13, 2017

Okay, so some of this is certainly noise, but I do notice that there are 10 sincos regressions in there... I may run these locally as well to see if I can replicate.

@JeffBezanson
Copy link
Member

Is nanosoldier back? What about JuliaIO/JLD.jl#196 ?

@ararslan
Copy link
Member

ararslan commented Oct 13, 2017

I have absolutely no idea why this worked and other PRs didn't (and continue not to).

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Oct 13, 2017

@pkofod since you mentioned: f(x)=sin(x)+cos(x)

In general I think your right about not inlining (called outlining?), but for that case I see in my 0.5 (not everyone will know about the alternative sincos function or how to call it..):

[I don't know too much about this, but since this is an important function can't we do better for f?]

julia> @code_llvm f(1.0)

define double @julia_f_72800(double) #0 {
top:
  %1 = call double @julia_sin_72801(double %0) #0
  %2 = call double @julia_cos_72803(double %0) #0
  %3 = fadd double %1, %2
  ret double %3

I see you dropped inline from sincos and then I also see (could the above be compiled using this or similar?):

@inline function sincos_fast(v::Float64)
     s = Ref{Cdouble}()
     c = Ref{Cdouble}()
     ccall((:sincos, libm), Void, (Cdouble, Ptr{Cdouble}, Ptr{Cdouble}), v, s, c)
     return (s[], c[])
end

I'm not sure how I can call these, but can Julia be clever (without macros) and recognize that function (and similar ones, for e.g. power of two) and call such one sincos function?

function sincos(x::T) where T<:Union{Float32, Float64}
..
    # ... and use the same selection scheme as above: (sin, cos, -sin, -cos) for
    # for sin and (cos, -sin, -cos, sin) for cos

[redundant "for there in comment?]

@inline sincos_kernel(y::Union{Float32, Float64, DoubleFloat32, DoubleFloat64}) = (sin_kernel(y), cos_kernel(y))

@pkofod
Copy link
Contributor Author

pkofod commented Oct 14, 2017

In general I think your right about not inlining (called outlining?), but for that case I see in my 0.5 (not everyone will know about the alternative sincos function or how to call it..):

sincos doesn't return sin(x)+cos(x), it returns sin(x), cos(x), sorry if it was a bad example. What I meant was that f(x)=sin(x) (or sin(x)+cos(x)) will the contain the full sin(x) body, instead of just a function call, and that makes the f definition take up as much space as the original sin which can be a problem.

@KristofferC
Copy link
Member

KristofferC commented Oct 14, 2017

The sincosthing must be slower now though since they are just two separate function calls (e.g. no optimization from inlining available). You could define:

@inline _sin(x) = ...
@noinline sin(x) = _sin(x)
@inline _cos(x) = ...
@noinline cos(x) = _cos(x)

sincos(x) = _sin(x), _cos(x)

@pkofod
Copy link
Contributor Author

pkofod commented Oct 14, 2017

I don't think so, sincos's kernel function is still inlined in sincos? The regressions may or may not be real. It's 10 regressions out of 66 benchmarks. I'm working on something else right now, but I'll try to run the benchmarks locally.

@KristofferC
Copy link
Member

Sorry, I mistook how sincoswas being computed.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 14, 2017

Sorry, I mistook how sincoswas being computed.

Yeah, that's why I'm surprised at the regressions, but I think there's a chance they might just be noise, though it is 10/66 benchmarks.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 15, 2017

While we're waiting for @nanosoldier I tried to run the sincos benchmarks locally. Seems like there's around 10% regression here on average. Some exceptions: x=zero(T) case is much slower (> 2x slowdown). I guess this is because the benchmarks somehow do something clever here? The other case is in the large x case (payne hanek). Here we see a 40% increase in run time by not inlining.

Below is a figure. On the left, it shows histograms of ratios of runtimes where the numerator is the noinline commit (this one) and the denominator is the commit just before it. Blue and red are just two different load, tune! and run instances. On the right you see the ratios where the first axis is simply the benchmarks ID'd by the order in which they were run. You can easily identify the 0 cases (Float32 and Float64 respectively) and the Payne Hanek branches.

slightly

@kshyatt kshyatt added the maths Mathematical functions label Oct 16, 2017
@simonbyrne
Copy link
Contributor

While we should probably try to be fast for x == 0, I don't think we need to worry too much about performance if it falls back to Payne-Hanek.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 23, 2017

While we should probably try to be fast for x == 0, I don't think we need to worry too much about performance if it falls back to Payne-Hanek.

Do you know why the x==0 case might be affected by inlining? Is it because it's a value that can be recognized in the @benchmarkable sin($T(0)) case which is then constant folded?

@simonbyrne
Copy link
Contributor

simonbyrne commented Oct 23, 2017

Possibly? In that case there might not be much we can do about it.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 24, 2017

If someone (cc: @jrevels @ararslan ) who knows more about the benchmarking code can verify (or teach me how to verify) if nanosoldier constant folds the 0 case, we should probably just merge this. Maybe even without knowing if it's the constant folding or not, see #24286 for an issue where this is very relevant!

@KristofferC
Copy link
Member

I don't think this is constant folding. If the function returns immediately after checking if the input argument is zero, then the cost of having to do the call will be large compared to if the function body is inlined. So it is not surprising that preventing inlining of this function will give regression for the special cases.

I think this is a non-issue though, and that this should just be merged.

@pkofod
Copy link
Contributor Author

pkofod commented Oct 24, 2017

I don't think this is constant folding. If the function returns immediately after checking if the input argument is zero, then the cost of having to do the call will be large compared to if the function body is inlined. So it is not surprising that preventing inlining of this function will give regression for the special cases.

I think this is a non-issue though, and that this should just be merged.

Yes of course, the work to call time ratio will be very very low in this case (<0.5 apparently).

@KristofferC KristofferC merged commit 9f5ec71 into JuliaLang:master Oct 24, 2017
@pkofod pkofod deleted the noinline branch October 24, 2017 13:03
@musm musm mentioned this pull request Dec 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants