
perform inference using optimizer-derived type information #56687

Open

wants to merge 1 commit into base: master
Conversation

aviatesk
Member

In certain cases, the optimizer can introduce new type information. This is particularly evident in SROA, where load forwarding can reveal type information that was not visible during abstract interpretation. In such cases, re-running abstract interpretation with this new type information can be highly valuable; currently, however, this only occurs when semi-concrete interpretation happens to be triggered.

This commit introduces a new "post-optimization inference" phase at the end of the optimizer pipeline. When the optimizer derives new type information, this phase performs IR abstract interpretation to further optimize the IR.

Such cases are especially common in patterns involving captured
variables, as discussed in #15276. By combining the
"post-optimization inference" implemented in this commit with EA-based
load forwarding, it is anticipated that the issue described in
#15276 can largely be resolved.
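For context, the captured-variable pattern from #15276 that this phase targets can be sketched with the classic `abmult` example from the performance tips (illustrative only; not code from this PR):

```julia
# `r` is reassigned after being captured, so lowering stores it in a
# `Core.Box` whose `contents` field is typed `Any`. Abstract
# interpretation therefore infers `Any` for `x * r`, even though
# load forwarding can later prove the load yields an `Int`.
function abmult(r::Int)
    if r < 0
        r = -r          # reassignment of a captured variable forces the Box
    end
    f = x -> x * r      # the closure reads the boxed `r` as `Any`
    return f(2)
end
```

With the post-optimization inference phase combined with EA-based load forwarding, the boxed load of `r` could be re-typed as `Int` and the call re-inferred accordingly.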


@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your job failed.

@oscardssmith
Member

does this ensure that we don't end up inferring non-IPO-safe things in an IPO context?

Base automatically changed from avi/fix-cfg_simplify! to master November 27, 2024 05:45
@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your job failed.

@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your job failed.

@aviatesk
Member Author

@nanosoldier runbenchmarks(["inference", "allinference", "Base.init_stdio(::Ptr{Cvoid})"], vs=":master")

@aviatesk
Member Author

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@aviatesk
Member Author

does this ensure that we don't end up inferring non-IPO-safe things in an IPO context?

In the current implementation of the optimizer, non-IPO-safe optimizations are not performed, so my understanding is that re-running inference on such IPO-safe IRCode does not lead to IPO-validity issues.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

aviatesk commented Jan 1, 2025

@nanosoldier runbenchmarks("collection" || "simd" || "sort", vs=":master")

aviatesk added a commit that referenced this pull request Jan 1, 2025
Performance issues related to code containing closures are frequently
discussed (e.g., #15276, #56561). Several
approaches can be considered to address this problem, one of which
involves re-inferring code containing closures (#56687).
To implement this idea, it is necessary to determine whether a given
piece of code includes a closure. However, there is currently no
explicit mechanism for making this determination (there is some code
that checks whether the function name contains `"#"` for this purpose,
but that is an ad hoc solution).

To address this, this commit lays the foundation for future
optimizations targeting closures by making closure types subtypes of
the new type `Core.Closure <: Function`. This change allows the
optimizer to apply targeted optimizations to code containing calls to
functions whose types are subtypes of `Core.Closure`.
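As a rough illustration of the problem this commit addresses (`is_closure_like` below is a hypothetical stand-in for the existing ad hoc check, not an actual function in Base):

```julia
# Today's ad hoc closure detection: lowered closure types get
# mangled names like `var"#inner#8"`, so existing code checks for
# a '#' in the type name. With `Core.Closure`, the check would
# instead become `f isa Core.Closure`.
is_closure_like(f) = occursin('#', String(nameof(typeof(f))))

function make_counter()
    n = 0
    return () -> (n += 1)   # lowered to an anonymous closure type
end
```

A named callable struct would correctly fail this name-based check, while any anonymous closure passes it.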
@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk
Member Author

aviatesk commented Jan 2, 2025

I have confirmed that there are no test failures or performance regressions. This PR is ready to be merged.

ccall(:jl_fill_codeinst, Cvoid, (
        Any, Any, Any, Any, Int32, UInt, UInt,
        UInt32, Any, Any, Any),
    ci, rettype, exctype, rettype_const, const_flags, min_world, max_world,
    encode_effects(result.ipo_effects), result.analysis_results, di, edges)
if is_cached(me)
    cached_result = cache_result!(me.interp, result, ci)
Member
I think the main question for review is whether this PR makes codegen unsound. Right here, we publish the ABI that will be used by this CodeInstance, for use by the optimizer in creating invoke edges. Any future changes to that info can result in codegen becoming unsound (as well as thread-unsafe).

ccall(:jl_update_codeinst, Cvoid, (
        Any, Any, Any, Any, Any, Int32, UInt, UInt,
        UInt32, Any, UInt8, Any, Any),
    ci, inferred_result, rettype, exctype, rettype_const, const_flags, min_world, max_world,
@vtjnash
Member

vtjnash commented Jan 2, 2025

Changing any non-atomic fields here (after cache_result) causes codegen to become unsound. I am uncertain how we would preserve the expectation that the optimizer only uses CodeInstances found in the cache, while also delaying publishing them into the cache until after this point in the code. I don't believe it to be impossible to change that (since we are holding the "engine lock" on this until after the optimizer finishes, so we can just predict which CodeInstance will get added to the cache and which are only for local use), but just tricky to figure out how to best correctly adapt the cache to this PR.

@aviatesk
Member Author

aviatesk commented Jan 3, 2025

@vtjnash Thank you for the review.

If I understand correctly, this PR has two main issues:

  1. Updating rettype or other (non-atomic) information defining the CodeInstance's ABI after cache_result!.
  2. Performing re-inference during optimization breaks the assumption that the optimizer operates against a stabilized global cache.

Regarding (1), it seems that updating only the `inferred` field may be the viable solution? (Honestly, even if the ABI changes between abstract interpretation and the optimizer, I don’t think it would cause significant issues in most cases, but I understand your concern that introducing a fundamental error could create future risks.)

As for (2), I don’t understand why the optimizer must rely on a stabilized cache. Is the concern that optimization itself might no longer guarantee termination? Even so, it might be possible to resolve the issue by preventing re-inference from performing new inferences and restricting it to use only the results available in the global cache.

@vtjnash
Member

vtjnash commented Jan 3, 2025

For (2), the main issue is just that we want the edges to normally come from the global cache, so that we don't accidentally make multiple copies of it and greatly slow down the whole system by accident. The main way we avoid this is with locks (e.g. the engine_lock) so that it can wait for in-progress work to finish at the necessary time, and a local cache to store any in-progress work.

So for both issues, we mainly just need to move the `cache_result!` call until after optimization finishes, and then provide an additional intermediate "future global cache" that the optimizer can examine.
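The suggested restructuring could be sketched roughly like this (all names and types here are hypothetical illustrations, not the actual `Core.Compiler` API):

```julia
# Hypothetical sketch of an intermediate "future global cache":
# finished but not-yet-published CodeInstances are staged locally
# while the optimizer runs, and published to the global cache in one
# step after optimization, so invoke edges still resolve without
# publishing ABI information prematurely.
struct PendingCache
    pending::IdDict{Any,Any}    # MethodInstance => staged CodeInstance
end
PendingCache() = PendingCache(IdDict{Any,Any}())

stage!(c::PendingCache, mi, ci) = (c.pending[mi] = ci; ci)

# Lookup consults staged entries first, then the global cache.
lookup(c::PendingCache, global_cache::IdDict, mi) =
    get(c.pending, mi) do
        get(global_cache, mi, nothing)
    end

function publish!(global_cache::IdDict, c::PendingCache)
    merge!(global_cache, c.pending)   # publish only after the optimizer finishes
    empty!(c.pending)
    return global_cache
end
```

Because the engine lock is held until the optimizer finishes, staging and a single post-optimization publish step would keep edge lookups consistent without duplicating in-progress work.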

aviatesk added a commit that referenced this pull request Jan 6, 2025
Performance issues related to code containing closures are frequently
discussed (e.g., #15276, #56561). Several
approaches can be considered to address this problem, one of which
involves re-inferring code containing closures (#56687).
To implement this idea, it is necessary to determine whether a given
piece of code includes a closure. However, there is currently no
explicit mechanism for making this determination (there is some code
that checks whether the function name contains `"#"` for this purpose,
but that is an ad hoc solution).

To address this, this commit lays the foundation for future
optimizations targeting closures by making closure types subtypes of
the new type `Core.Closure <: Function`. This change allows the
optimizer to apply targeted optimizations to code containing calls to
functions whose types are subtypes of `Core.Closure`.
In certain cases, the optimizer can introduce new type information.
This is particularly evident in SROA, where load forwarding can reveal
type information that was not visible during abstract interpretation.
In such cases, re-running abstract interpretation with this new type
information can be highly valuable; currently, however, this only occurs
when semi-concrete interpretation happens to be triggered.

This commit introduces a new "post-optimization inference" phase at the
end of the optimizer pipeline. When the optimizer derives new type
information, this phase performs IR abstract interpretation to further
optimize the IR.
@MasonProtter
Contributor

Interestingly, this doesn't seem to be able to catch cases where the boxed variable is the function itself (i.e. #53295)

julia> function outer()
           function inner()
               false && inner()
           end
           inner()
       end;

julia> @btime outer()
  14.758 ns (2 allocations: 32 bytes)
false

julia> @code_warntype outer()
MethodInstance for outer()
  from outer() @ Main REPL[13]:1
Arguments
  #self#::Core.Const(Main.outer)
Locals
  inner@_2::Core.Box
  inner@_3::Union{}
  inner@_4::Union{}
Body::Any
1 ─       (inner@_2 = Core.Box())
│   %2  = Main.:(var"#inner#8")::Core.Const(var"#inner#8")
│   %3  = inner@_2::Core.Box
│   %4  = %new(%2, %3)::var"#inner#8"
│   %5  = inner@_2::Core.Box
│         Core.setfield!(%5, :contents, %4)
│   %7  = inner@_2::Core.Box
│   %8  = Core.isdefined(%7, :contents)::Bool
└──       goto #3 if not %8
2 ─       goto #4
3 ─       Core.NewvarNode(:(inner@_4))
└──       inner@_4
4 ┄ %13 = inner@_2::Core.Box
│   %14 = Core.getfield(%13, :contents)::Any
│   %15 = (%14)()::Any
└──       return %15

even though this seems like it should be closely related to the equivalent cases where non-callable variables are boxed.
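For what it's worth, one workaround sketch for this self-referential case (illustrative only, not part of this PR): giving the inner function a named callable type avoids the Box entirely, since the type is known before the instance exists.

```julia
# A named callable struct has a concrete type, so the self-reference
# `i()` needs no Core.Box and the call infers concretely.
struct Inner end
(i::Inner)() = false && i()   # self-call through the concrete type

function outer2()
    inner = Inner()
    return inner()
end
```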
