perform inference using optimizer-derived type information #56687
base: master
Conversation
Your job failed.
Does this ensure that we don't end up inferring non-IPO-safe things in an IPO context?
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
In the current implementation of the optimizer, non-IPO-safe optimizations are not performed, so I understand that performing re-inference on such IPO-safe
Performance issues related to code containing closures are frequently discussed (e.g., #15276, #56561). Several approaches can be considered to address this problem, one of which involves re-inferring code containing closures (#56687). To implement this idea, it is necessary to determine whether a given piece of code includes a closure. However, there is currently no explicit mechanism for making this determination (some code checks whether the function name contains `"#"` for this purpose, but that is an ad hoc solution). To address this, this commit lays the foundation for future optimizations targeting closures by defining closure functions as subtypes of the new type `Core.Closure <: Function`. This change allows the optimizer to apply targeted optimizations to code containing calls to functions that are subtypes of `Core.Closure`.
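To make the proposal concrete, here is a hedged sketch (the `Core.Closure` type does not exist in released Julia; it is what this commit introduces, and the function names below are illustrative, not actual compiler APIs):

```julia
# Hypothetical sketch. With the proposed `Core.Closure <: Function`
# supertype, lowering would emit closure types as subtypes of
# `Core.Closure`, and the optimizer could detect closure calls with a
# plain subtype test on the call's function type:
function is_closure(@nospecialize ft)
    # ft is the (widened) type of the callee at a call site
    return ft <: Core.Closure          # proposed explicit mechanism
end

# versus the current ad hoc heuristic, which roughly inspects the
# compiler-generated type name for a '#' character:
# is_closure_adhoc(ft) = occursin('#', String(nameof(ft)))   # fragile
```

The subtype test is robust to naming conventions, whereas the name-based heuristic can misfire on any compiler-generated name.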
I have confirmed that there are no test failures or performance regressions. This PR is ready to be merged.
```julia
ccall(:jl_fill_codeinst, Cvoid, (
        Any, Any, Any, Any, Int32, UInt, UInt,
        UInt32, Any, Any, Any),
    ci, rettype, exctype, rettype_const, const_flags, min_world, max_world,
    encode_effects(result.ipo_effects), result.analysis_results, di, edges)
if is_cached(me)
    cached_result = cache_result!(me.interp, result, ci)
```
I think the main question for review is whether this PR makes codegen unsound. Right here, we publish the ABI that will be used by this CodeInstance, for use by the optimizer in creating invoke edges. Any future changes to that info can result in codegen becoming unsound (as well as thread-unsafe).
```julia
ccall(:jl_update_codeinst, Cvoid, (
        Any, Any, Any, Any, Any, Int32, UInt, UInt,
        UInt32, Any, UInt8, Any, Any),
    ci, inferred_result, rettype, exctype, rettype_const, const_flags, min_world, max_world,
```
Changing any non-atomic fields here (after `cache_result`) causes codegen to become unsound. I am uncertain how we would preserve the expectation that the optimizer only uses CodeInstances found in the cache while also delaying publishing them into the cache until after this point in the code. I don't believe it is impossible to change that (since we hold the "engine lock" on this until after the optimizer finishes, we can just predict which CodeInstance will get added to the cache and which are only for local use), but it is tricky to figure out how best to adapt the cache to this PR.
@vtjnash Thank you for the review. If I understand correctly, this PR has two main issues:

Regarding (1), it seems that only updating the

As for (2), I don't understand why the optimizer must rely on a stabilized cache. Is the concern that optimization itself might no longer guarantee termination? Even so, it might be possible to resolve the issue by preventing re-inference from performing new inferences and restricting it to use only the results available in the global cache.
For (2), the main issue is just that we want the edges to normally come from the global cache, so that we don't accidentally make multiple copies of it and greatly slow down the whole system. The main way we avoid this is with locks (e.g. the `engine_lock`), so that it can wait for in-progress work to finish at the necessary time, plus a local cache to store any in-progress work. So for both issues, we mainly just need to move the
Interestingly, this doesn't seem to be able to catch cases where the boxed variable is the function itself (i.e. #53295):

```julia
julia> function outer()
           function inner()
               false && inner()
           end
           inner()
       end;

julia> @btime outer()
  14.758 ns (2 allocations: 32 bytes)
false

julia> @code_warntype outer()
MethodInstance for outer()
  from outer() @ Main REPL[13]:1
Arguments
  #self#::Core.Const(Main.outer)
Locals
  inner@_2::Core.Box
  inner@_3::Union{}
  inner@_4::Union{}
Body::Any
1 ─       (inner@_2 = Core.Box())
│   %2  = Main.:(var"#inner#8")::Core.Const(var"#inner#8")
│   %3  = inner@_2::Core.Box
│   %4  = %new(%2, %3)::var"#inner#8"
│   %5  = inner@_2::Core.Box
│         Core.setfield!(%5, :contents, %4)
│   %7  = inner@_2::Core.Box
│   %8  = Core.isdefined(%7, :contents)::Bool
└──       goto #3 if not %8
2 ─       goto #4
3 ─       Core.NewvarNode(:(inner@_4))
└──       inner@_4
4 ┄ %13 = inner@_2::Core.Box
│   %14 = Core.getfield(%13, :contents)::Any
│   %15 = (%14)()::Any
└──       return %15
```

even though this seems like it should be closely related to equivalent things with non-callables being boxed.
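For comparison, the closely related non-callable case (a hypothetical minimal reproducer in the spirit of #15276, not taken from this thread) boxes a variable because it is reassigned after being captured:

```julia
function outer_data()
    x = 1           # `x` is captured by `inner` and reassigned below,
    inner() = x     # so lowering boxes it: `@code_warntype` shows
    x = 2           # `x::Core.Box` in the locals
    return inner()  # reads the Box contents, so inference derives `Any`
end
```

Here `@code_warntype outer_data()` reports `Body::Any` for the same underlying reason, which is why it seems like the two cases should be handled by related machinery.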
In certain cases, the optimizer can introduce new type information. This is particularly evident in SROA, where load forwarding can reveal type information that was not visible during abstract interpretation. In such cases, re-running abstract interpretation using this new type information can be highly valuable; currently, however, this only happens when semi-concrete interpretation happens to be triggered.

This commit introduces a new "post-optimization inference" phase at the end of the optimizer pipeline. When the optimizer derives new type information, this phase performs IR abstract interpretation to further optimize the IR.

Such cases are especially common in patterns involving captured variables, as discussed in #15276. By combining the "post-optimization inference" implemented in this commit with EA-based load forwarding, it is anticipated that the issue described in #15276 can largely be resolved.
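As an illustration of optimizer-derived type information (a hypothetical minimal example, not taken from the PR), SROA's load forwarding can replace a field load with the stored value, revealing a concrete type that abstract interpretation did not see:

```julia
mutable struct Wrap
    x::Any            # field typed `Any`, so a load of `w.x` infers as `Any`
end

function demo()
    w = Wrap(1)
    return w.x + 1    # inference sees `Any + Any`; once SROA forwards the
                      # stored `1` to this load, the operand is known to be
                      # an `Int`, so a post-optimization inference phase
                      # could narrow the call to `Int + Int`
end
```

Without re-running inference after SROA, the `+` call remains a dynamic dispatch even though the optimized IR contains enough information to devirtualize it.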
@nanosoldier `runbenchmarks("inference", vs=":master")`