Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer: supports callsite annotations of inlining, fixes #18773 #41328

Merged
merged 11 commits into from
Sep 1, 2021

Conversation

aviatesk
Copy link
Sponsor Member

@aviatesk aviatesk commented Jun 23, 2021

Enable @inline/@noinline annotations on function callsites.
From #40754.

Now @inline and @noinline can be applied to a code block and then
the compiler will try to (or try not to) inline calls within the block:

@inline f(...) # The compiler will try to inline `f`

@inline f(...) + g(...) # The compiler will try to inline `f`, `g` and `+`

@inline f(args...) = ... # Of course annotations on a definition is still allowed

Here are couple of notes on how those callsite annotations will work:

  • callsite annotation always has the precedence over the annotation
    applied to the definition of the called method, whichever we use
    @inline/@noinline:
    @inline function explicit_inline(args...)
        # body
    end
    
    let
        @noinline explicit_inline(args...) # this call will not be inlined
    end
  • when callsite annotations are nested, the innermost annotations has
    the precedence
    @noinline let a0, b0 = ...
        a = @inline f(a0)  # the compiler will try to inline this call
        b = notinlined(b0) # the compiler will NOT try to inline this call
        return a, b
    end

They're both tested and included in documentations.

Co-authored-by: Joseph Tan jdtan638@gmail.com


Comment on lines 39 to 47
# when the source isn't available at this moment, try to re-infer and inline it
# NOTE we can make inference try to keep the source if the call is going to be inlined,
# but then inlining will depend on local state of inference and so the first entry
# and the succeeding ones may generate different code; rather we always re-infer
# the source to avoid the problem while it's obviously not most efficient
# HACK disable inlining for the re-inference to avoid cycles by making sure the following inference never comes here again
interp = NativeInterpreter(get_world_counter(interp); opt_params = OptimizationParams(; inlining = false))
src, rt = typeinf_code(interp, match.method, match.spec_types, match.sparams, true)
return src
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the approach to use uncached source when the callsite is forced to be inlined (a1b1435), but it turns out that then inlining will depend o local state of inference and so the first entry and the succeeding ones may generate different code, e.g. from a test case I added:

function multilimited(a)
    if @noinline(isType(a))
        return @inline(multilimited(a.parameters[1]))
    else
        return rand(Bool) ? rand(a) : @inline(multilimited(a))
    end
end

# output might be different if we depend on inference local state
code_typed(m.multilimited, (Any,))
code_typed(m.multilimited, (Any,))

We might be able to keep source so that it doesn't depend on the local state, but it could be tricky.
Rather I'd like to always just re-infer when a source is unavailable, at least in this first iteration.

/cc @vtjnash

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to happen during inference. It seems quite wrong to make a new NativeInterpreter here, and can probably have many other problems. We used to do some amount of recursion here, and we have gotten rid of it.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about fea41f2 ? We still need fresh optimization w/o inlining hack because those uncached sources are not optimized.

Well, now I'm wondering if it's really useful. I benchmarked some functions that hit this pass, and I couldn't find any cases where this "single-level-inlining" can help the performance.. Honestly I don't think it's worth the complexity that comes with this support.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tisztamo
Copy link
Contributor

Does callsite @inline force inlining completely, ignoring the cost model? (Also close #40292?) (re-commenting after accidental delete)

@aviatesk
Copy link
Sponsor Member Author

aviatesk commented Jun 28, 2021

Does callsite @inline force inlining completely, ignoring the cost model?

Callsite inlining will happen in regardless of the cost model, but not "force inlining completely", since there are some cases when a method body itself is unavailable (think when there are recursive calls but annotated as @inline).

I would appreciate if you could test this PR for your target, though.

@tisztamo
Copy link
Contributor

Just tested and it works, so this also closes #40292. Thank you very much!

src/method.c Show resolved Hide resolved
src/method.c Outdated Show resolved Hide resolved
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Sep 1, 2021
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Sep 1, 2021
aviatesk added a commit to JuliaDebug/Cthulhu.jl that referenced this pull request Sep 1, 2021
@aviatesk

This comment has been minimized.

@KristofferC

This comment has been minimized.

@aviatesk

This comment has been minimized.

aviatesk added a commit that referenced this pull request Sep 1, 2021
After #41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
aviatesk added a commit that referenced this pull request Sep 2, 2021
After #41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
aviatesk added a commit that referenced this pull request Sep 3, 2021
After #41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
aviatesk added a commit that referenced this pull request Sep 4, 2021
After #41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
@aviatesk
Copy link
Sponsor Member Author

aviatesk commented Sep 4, 2021

#42082 should fix the "serious" flaw with this PR, and now callsite inlining should work more reliably.

@DilumAluthge DilumAluthge removed the status:merge me PR is reviewed. Merge when all tests are passing label Sep 6, 2021
aviatesk added a commit to JuliaLang/Compat.jl that referenced this pull request Sep 11, 2021
aviatesk added a commit to JuliaLang/Compat.jl that referenced this pull request Sep 11, 2021
aviatesk added a commit to JuliaLang/Compat.jl that referenced this pull request Sep 11, 2021
@chriselrod
Copy link
Contributor

@oscardssmith If you haven't tried this out yet:

julia> function vexp!(y,x)
           @inbounds for i  eachindex(y,x)
               y[i] = exp(x[i])
           end
       end
vexp! (generic function with 1 method)

julia> function vexp_inline!(y,x)
           @inbounds for i  eachindex(y,x)
               y[i] = @inline exp(x[i])
           end
       end
vexp_inline! (generic function with 1 method)

julia> x = randn(256); y = similar(x);

julia> @benchmark vexp!($y, $x)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.234 μs   2.336 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.329 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.334 μs ± 49.344 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       █▄                                    ▁
  ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅██▁▇▃▁▆▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▆▃▄▄▇▆▅▆ █
  1.23 μs      Histogram: log(frequency) by time     1.49 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark vexp_inline!($y, $x)
BenchmarkTools.Trial: 10000 samples with 540 evaluations.
 Range (min  max):  212.893 ns  316.361 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     213.315 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   214.200 ns ±   6.480 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂▇█▇▃                   ▁▂▃▂▂▂▂                              ▂
  ▅█████▃▁▃▃▁▁▁▁▁▁▄▁▁▁▁▁▃▄▆████████▇██▇▇▇▆▅▄▃▅▄▅▃▁▄▄▃▄▄▄▄▄▅▃▄▄▅ █
  213 ns        Histogram: log(frequency) by time        220 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…#18773 (JuliaLang#41328)

* optimizer: supports callsite annotations of inlining, fixes JuliaLang#18773

Enable `@inline`/`@noinline` annotations on function callsites.
From JuliaLang#40754.

Now `@inline` and `@noinline` can be applied to a code block and then
the compiler will try to (not) inline calls within the block:
```julia
@inline f(...) # The compiler will try to inline `f`

@inline f(...) + g(...) # The compiler will try to inline `f`, `g` and `+`

@inline f(args...) = ... # Of course annotations on a definition is still allowed
```

Here are couple of notes on how those callsite annotations will work:
- callsite annotation always has the precedence over the annotation
  applied to the definition of the called function, whichever we use
  `@inline`/`@noinline`:
  ```julia
  @inline function explicit_inline(args...)
      # body
  end

  let
      @noinline explicit_inline(args...) # this call will not be inlined
  end
  ```
- when callsite annotations are nested, the innermost annotations has
  the precedence
  ```julia
  @noinline let a0, b0 = ...
      a = @inline f(a0)  # the compiler will try to inline this call
      b = notinlined(b0) # the compiler will NOT try to inline this call
      return a, b
  end
  ```
They're both tested and included in documentations.

* set ssaflags on `CodeInfo` construction

* try to keep source if it will be force-inlined

* give inlining when source isn't available

* style nits

* Update base/compiler/ssair/inlining.jl

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* Update src/method.c

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* fixup

- remove preprocessed flags from `jl_code_info_set_ir`
- fix duplicated definition warning
- add and fix comments

* more clean up

* add caveat about the recursive call limitation

* update NEWS.md

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…liaLang#42082)

After JuliaLang#41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…#18773 (JuliaLang#41328)

* optimizer: supports callsite annotations of inlining, fixes JuliaLang#18773

Enable `@inline`/`@noinline` annotations on function callsites.
From JuliaLang#40754.

Now `@inline` and `@noinline` can be applied to a code block and then
the compiler will try to (not) inline calls within the block:
```julia
@inline f(...) # The compiler will try to inline `f`

@inline f(...) + g(...) # The compiler will try to inline `f`, `g` and `+`

@inline f(args...) = ... # Of course annotations on a definition is still allowed
```

Here are couple of notes on how those callsite annotations will work:
- callsite annotation always has the precedence over the annotation
  applied to the definition of the called function, whichever we use
  `@inline`/`@noinline`:
  ```julia
  @inline function explicit_inline(args...)
      # body
  end

  let
      @noinline explicit_inline(args...) # this call will not be inlined
  end
  ```
- when callsite annotations are nested, the innermost annotations has
  the precedence
  ```julia
  @noinline let a0, b0 = ...
      a = @inline f(a0)  # the compiler will try to inline this call
      b = notinlined(b0) # the compiler will NOT try to inline this call
      return a, b
  end
  ```
They're both tested and included in documentations.

* set ssaflags on `CodeInfo` construction

* try to keep source if it will be force-inlined

* give inlining when source isn't available

* style nits

* Update base/compiler/ssair/inlining.jl

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* Update src/method.c

Co-authored-by: Jameson Nash <vtjnash@gmail.com>

* fixup

- remove preprocessed flags from `jl_code_info_set_ir`
- fix duplicated definition warning
- add and fix comments

* more clean up

* add caveat about the recursive call limitation

* update NEWS.md

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…liaLang#42082)

After JuliaLang#41328, inference can observe statement flags and try to re-infer
a discarded source if it's going to be inlined.
The re-inferred source will only be cached into the inference-local
cache, and won't be cached globally.
@JeffBezanson JeffBezanson removed the needs news A NEWS entry is required for this change label Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

callsite inlining
8 participants