More explicit TLS access and GC allocation optimization #17116
Merged
The main goal of this PR is a small step toward more explicit TLS access in the C code, plus an experiment in passing the TLS pointer around explicitly in the GC. Most of the explicit calls to `jl_get_ptls_states()` added in this PR (except the GC ones and maybe some signal handling ones) do not cross function boundaries.

As for passing the TLS pointer explicitly as a function argument, it is faster or the same in most cases. On Linux, I measured a few percent improvement in GC time (both when running a function with the same allocation pattern and when calling the GC directly). This improvement might be larger on OSX since, as we recently learned, it doesn't have a static TLS model =(.
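To illustrate the difference between looking up the TLS in every helper and passing the pointer down explicitly, here is a minimal C11 sketch. It is not code from this PR: `demo_tls_t`, `demo_get_tls()` and the `demo_*` helpers are hypothetical names, and `demo_get_tls()` only plays the role that `jl_get_ptls_states()` plays in the real code. The `lookups` counter exists purely to make the number of TLS accesses visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state; stands in for the real TLS struct. */
typedef struct {
    size_t allocd_bytes; /* bytes accounted to this thread */
    int lookups;         /* number of calls to the TLS accessor */
} demo_tls_t;

static _Thread_local demo_tls_t demo_tls;

/* Hypothetical accessor, analogous in role to jl_get_ptls_states(). */
static demo_tls_t *demo_get_tls(void)
{
    demo_tls.lookups++;
    return &demo_tls;
}

/* Implicit style: the helper fetches the TLS pointer itself on every call. */
static void demo_count_implicit(size_t sz)
{
    demo_tls_t *tls = demo_get_tls();
    tls->allocd_bytes += sz;
}

/* Explicit style: the caller passes the TLS pointer as an argument. */
static void demo_count_explicit(demo_tls_t *tls, size_t sz)
{
    assert(tls != NULL);
    tls->allocd_bytes += sz;
}

/* One TLS lookup per iteration, plus one for the final read. */
static size_t demo_alloc_many_implicit(int n, size_t sz)
{
    for (int i = 0; i < n; i++)
        demo_count_implicit(sz);
    return demo_get_tls()->allocd_bytes;
}

/* A single TLS lookup, reused for the whole loop. */
static size_t demo_alloc_many_explicit(int n, size_t sz)
{
    demo_tls_t *tls = demo_get_tls();
    for (int i = 0; i < n; i++)
        demo_count_explicit(tls, sz);
    return tls->allocd_bytes;
}
```

In this toy version the accessor is cheap, so the point is only the call pattern: the explicit variant performs one lookup no matter how many helpers run, which matters more when the lookup itself is expensive (for example, without a static TLS model).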
However, there is one case where passing the TLS pointer generates slower code (it can increase the pool allocation time by ~15%). My current explanation is that the user (`jl_gc_pool_alloc`, previously `__pool_alloc`) only calls this function in a slow branch in the middle of the fast path, and keeping an otherwise unused value alive increases the register pressure in the fast path...

This also simplifies and further optimizes GC allocation functions, especially in C.
`newobj`, `newstruct`, `jl_gc_allocobj`, `jl_gc_alloc_*w`, and `allocb` are all merged into `jl_gc_alloc`, and the compile-time pool address lookup optimization is generalized to all sizes. This also removes the double initialization of the tag when allocating memory.

This is based on #16893 (which should be ready now) to reduce conflicts. See here for actual diff.
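As a rough illustration of the merged allocation entry point described above, the following C sketch shows one inline allocator that selects a size class and writes the tag exactly once. It is not the implementation in this PR: the `demo_*` names, the four size classes, and the use of `malloc` in place of a real pool allocator are all assumptions made for the example. When `sz` is a compile-time constant and the functions are inlined, an optimizing compiler can typically fold `demo_szclass` down to a constant, which is the idea behind a compile-time pool lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_N_POOLS 4

/* Hypothetical size classes (in bytes); a real table would be larger. */
static const size_t demo_pool_sizes[DEMO_N_POOLS] = {8, 16, 32, 64};

/* Hypothetical heap bookkeeping: allocation counts per pool and for big objects. */
typedef struct {
    size_t nalloc[DEMO_N_POOLS];
    size_t nbig;
} demo_heap_t;

/* Header placed in front of every object; holds the type tag. */
typedef struct {
    void *tag;
} demo_header_t;

/* Returns the pool index for a total size, or -1 for "too big for a pool".
 * With a constant argument and inlining, this usually folds to a constant. */
static inline int demo_szclass(size_t sz)
{
    if (sz <= 8)  return 0;
    if (sz <= 16) return 1;
    if (sz <= 32) return 2;
    if (sz <= 64) return 3;
    return -1;
}

/* Single entry point: picks the pool (or the big-object path) and sets the
 * tag once. malloc stands in for the pool and big-object allocators. */
static inline void *demo_alloc(demo_heap_t *heap, size_t sz, void *tag)
{
    assert(heap != NULL);
    size_t allocsz = sz + sizeof(demo_header_t);
    int klass = demo_szclass(allocsz);
    if (klass >= 0) {
        heap->nalloc[klass]++;
        allocsz = demo_pool_sizes[klass]; /* round up to the class size */
    } else {
        heap->nbig++;
    }
    demo_header_t *h = (demo_header_t *)malloc(allocsz);
    if (h == NULL)
        return NULL;
    h->tag = tag; /* the only write to the tag */
    return (void *)(h + 1);
}

/* Reads the tag back from the header in front of the object. */
static inline void *demo_tag_of(void *obj)
{
    return ((demo_header_t *)obj - 1)->tag;
}
```

Callers with different needs (fixed-size structs, byte buffers, word-counted objects) can all go through the one function and differ only in the `sz` and `tag` they pass, which is the kind of consolidation the description refers to.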