RFC: Optimize TLS access in generated code on Linux #17178
asm("movl %%gs:0, %0" : "=r"(tp));
#elif defined(_CPU_AARCH64_)
asm("mrs %0, tpidr_el0" : "=r"(tp));
#endif
Should this have an #else #error?
Nvm, I see now.
Cool!!
asm_str = "movl %gs:0, $0";
# elif defined(_CPU_AARCH64_)
asm_str = "mrs $0, tpidr_el0";
# endif
a comment about what happens on other arches?
Added to the assertion.
I think what we want is a description of what it does on other architectures. Something like "for the 3 supported architectures above, load the TLS pointer from its known offset; on all others, fall back to calling jl_get_ptls_states" (I think that's how it works?)
Yeah. (Although we resolve jl_get_ptls_states to the real one in codegen.) I didn't think about this because this is documented in
Line 5136 in bf44142
// For threading, we emit a call to the getter function.
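The behaviour discussed in this thread (read the thread pointer directly on the supported architectures, otherwise fall back to a getter call) could look roughly like the C sketch below. It is an illustration only, not the code in this PR: all `demo_*` names are hypothetical, the getter stands in for jl_get_ptls_states, the `__linux__` guard is an added assumption, and the fixed offset is only meaningful when the variable uses a static TLS model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state, standing in for the runtime's real one. */
static __thread int demo_tls_state;

/* Slow path: an ordinary function call that returns the TLS address. */
void *demo_get_tls_slow(void)
{
    return &demo_tls_state;
}

/* Returns the thread pointer, or NULL where no direct sequence is known. */
void *demo_read_thread_pointer(void)
{
    void *tp = NULL;
#if defined(__linux__) && defined(__x86_64__)
    __asm__("movq %%fs:0, %0" : "=r"(tp));
#elif defined(__linux__) && defined(__i386__)
    __asm__("movl %%gs:0, %0" : "=r"(tp));
#elif defined(__linux__) && defined(__aarch64__)
    __asm__("mrs %0, tpidr_el0" : "=r"(tp));
#endif
    return tp;
}

/* Fast path: thread pointer plus a fixed offset; otherwise fall back. */
void *demo_get_tls(intptr_t offset)
{
    void *tp = demo_read_thread_pointer();
    if (tp == NULL)
        return demo_get_tls_slow();
    return (void *)((intptr_t)tp + offset);
}

/* Derive the offset once on this thread, then check both paths agree. */
int demo_self_check(void)
{
    void *tp = demo_read_thread_pointer();
    if (tp == NULL)
        return 1; /* no fast path here; the fallback is the only path */
    intptr_t offset = (intptr_t)demo_get_tls_slow() - (intptr_t)tp;
    return demo_get_tls(offset) == demo_get_tls_slow();
}
```

On an unsupported target `demo_read_thread_pointer` returns NULL, so every access goes through the getter, which matches the fallback described in the comment above.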
9bc2714 to 133831b
Ready to merge?
Should be. The RFC is mainly about how people feel about emitting assembly directly (in the case where LLVM can't do it), so it should be ready to go as long as people don't have a problem with that. There's (yet) another optimization I kind of got working last night to optimize the TLS getter call in C too, but I'll probably need some more experiments with that, and it can go in another PR.
* Detect if we are using a static TLS model
* Emit the assembly directly in codegen to access static TLS variables
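As a rough illustration of the detection step in the commit message above, the sketch below uses dl_iterate_phdr (named in the PR description) to look for a PT_TLS segment in the main program. It is not the PR's implementation: the `demo_*` names are hypothetical, it assumes Linux with glibc or musl, and it relies on the documented glibc behaviour that the first object passed to the callback is the main program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only declares dl_iterate_phdr with this set */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* A TLS variable so that the main program has a PT_TLS segment to find. */
__thread int demo_tls_var = 1;

/* Hypothetical record describing the main program's TLS segment. */
struct demo_tls_info {
    int found;    /* 1 if a PT_TLS segment was seen */
    size_t memsz; /* size of the TLS block in memory */
    size_t align; /* required alignment of the TLS block */
};

/* Callback: scan the program headers of one loaded object for PT_TLS. */
static int demo_phdr_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct demo_tls_info *out = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            out->found = 1;
            out->memsz = ph->p_memsz;
            out->align = ph->p_align;
            break;
        }
    }
    return 1; /* nonzero stops the walk after the first object */
}

/* Fills *out for the main program; returns 1 if it has a TLS segment. */
int demo_main_tls_segment(struct demo_tls_info *out)
{
    out->found = 0;
    out->memsz = 0;
    out->align = 0;
    dl_iterate_phdr(demo_phdr_cb, out);
    return out->found;
}
```

TLS that lives in the main program is laid out at a fixed offset from the thread pointer, which is the property that makes a direct load possible. Turning the segment size and alignment into that offset is ABI-specific and is left out of this sketch.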
133831b to e2bd129
So assuming people are fine with inline assembly (and with my next one close to being ready), I'll merge this later today.
By emitting the assembly sequence directly when static TLS is detected.
With this, I can't measure any performance difference in JIT code with threading on or off anymore =)
Currently supported and tested on LLVM 3.7+ on Linux x86/x64/aarch64. ARM should be possible too, but the thread pointer seems more complicated to emit (we might be able to use compiler intrinsics). FreeBSD or other platforms that use the ELF format probably work too, assuming they have dl_iterate_phdr, but I can't really test. This uses a glibc extension, dl_iterate_phdr, but AFAIK it is provided by musl too (and is required by libunwind).
This could make runtime JIT code harder to share but