Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Inline TLS access for linux/arm64 #97910

Merged
merged 24 commits into from
Feb 15, 2024

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Feb 3, 2024

We now generate the inlined TLS access during compilation

:	90000000 	adrp	x0, 0 <System_Console_System_ConsoleKeyInfo____GetFieldHelper>
			5a2f0: R_AARCH64_TLSDESC_ADR_PAGE21	tls_InlinedThreadStatics
   5a2f4:	91000000 	add	x0, x0, #0x0
			5a2f4: R_AARCH64_TLSDESC_ADD_LO12	tls_InlinedThreadStatics
   5a2f8:	d53bd041 	mrs	x1, tpidr_el0
   5a2fc:	f9400002 	ldr	x2, [x0]
			5a2fc: R_AARCH64_TLSDESC_LD64_LO12	tls_InlinedThreadStatics
   5a300:	d63f0040 	blr	x2
			5a300: R_AARCH64_TLSDESC_CALL	tls_InlinedThreadStatics
   5a304:	8b000020 	add	x0, x1, x0
   5a308:	f9400013 	ldr	x19, [x0]

which gets converted by the linker in following:

image

TODO

  • Sharedlibrary scenario fails for some reason. The objdump on the object file looks identical between before and after, however when I do objdump on the shared library i.e. .so file, I see a difference in bits for adrp which tells me that I am missing some relocation update, but cannot find out what. @MichalStrehovsky - any idea?
image
  • superpmi changes
  • Add IMAGE_REL_ARM64_TLSDESC* in ntimage.h?

@ghost ghost assigned kunalspathak Feb 3, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 3, 2024
@ghost
Copy link

ghost commented Feb 3, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We now generate the inlined TLS access during compilation

:	90000000 	adrp	x0, 0 <System_Console_System_ConsoleKeyInfo____GetFieldHelper>
			5a2f0: R_AARCH64_TLSDESC_ADR_PAGE21	tls_InlinedThreadStatics
   5a2f4:	91000000 	add	x0, x0, #0x0
			5a2f4: R_AARCH64_TLSDESC_ADD_LO12	tls_InlinedThreadStatics
   5a2f8:	d53bd041 	mrs	x1, tpidr_el0
   5a2fc:	f9400002 	ldr	x2, [x0]
			5a2fc: R_AARCH64_TLSDESC_LD64_LO12	tls_InlinedThreadStatics
   5a300:	d63f0040 	blr	x2
			5a300: R_AARCH64_TLSDESC_CALL	tls_InlinedThreadStatics
   5a304:	8b000020 	add	x0, x1, x0
   5a308:	f9400013 	ldr	x19, [x0]

which gets converted by the linker in following:

image

TODO

Sharedlibrary scenario fails for some reason. The objdump on the object file looks identical between before and after, however when I do objdump on the shared library i.e. .so file, I see a difference in bits for adrp which tells me that I am missing some relocation update, but cannot find out what.

image
Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member Author

@jkotas

@filipnavara
Copy link
Member

  • The objdump on the object file looks identical between before and after, however when I do objdump on the shared library i.e. .so file, I see a difference in bits for adrp which tells me that I am missing some relocation update, but cannot find out what.

Can you dump the object file relocations for a given offset with readelf and share it?

@kunalspathak
Copy link
Member Author

Can you dump the object file relocations for a given offset with readelf and share it?

Here are the files. Look for ReturnsPrimitiveInt that is my test code.

@kunalspathak
Copy link
Member Author

@filipnavara - did you get a chance to check this?
CC: @MichalStrehovsky and @VSadov - in case they see what is missing.

@filipnavara
Copy link
Member

did you get a chance to check this?

Sorry, I was out for the weekend and now I have an emergency to attend to... will hopefully get to look at it in the evening.

@filipnavara
Copy link
Member

filipnavara commented Feb 5, 2024

I tried to reproduced it locally but all the tests passed.

Nevermind, I accidentally built a Debug build. Trying again.

@MichalStrehovsky
Copy link
Member

I see a difference in bits for adrp which tells me that I am missing some relocation update, but cannot find out what. @MichalStrehovsky - any idea?

Are we generating the exact instruction sequence clang/gcc would generate? Various breadcrumbs online seem to suggest linkers want to see exact instruction sequences: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150302/263562.html

@kunalspathak
Copy link
Member Author

Are we generating the exact instruction sequence clang/gcc would generate?

It works for executable, so yes, we generate exact sequence. It is failing only from sharedlibrary. When @VSadov added initial support of hand assembly, he didn't add anything special for sharedlibrary, so not sure what is wrong.

@filipnavara
Copy link
Member

I can reproduce it locally. It helped to check out the correct branch 😅

Notably, the sequence is different from what clang generates.

clang:

 adrp	x0, 0 <foo>
    R_AARCH64_TLSDESC_ADR_PAGE21 foo
 ldr	x1, [x0]
    R_AARCH64_TLSDESC_LD64_LO12 foo
 add	x0, x0, #0x0
    R_AARCH64_TLSDESC_ADD_LO12 foo
 blr	x1
    R_AARCH64_TLSDESC_CALL foo
 mrs	x8, tpidr_el0
 ldr	w0, [x8, x0]

The ldr follows the page address before applying the add.

Shared libraries get a different rewrite than executables due to the way they are relocated at runtime.

@kunalspathak
Copy link
Member Author

The ldr follows the page address before applying the add.

Ah, that could be it. I thought adrp/add is a pair and shouldn't matter. Let me fix it and see.

@kunalspathak
Copy link
Member Author

The ldr follows the page address before applying the add.

Ah, that could be it. I thought adrp/add is a pair and shouldn't matter. Let me fix it and see.

This fixed the problem for sharedlibrary.

image

executable:

  G_M48060_IG03:  ;; offset=0x0020
              mrs     x1, tpidr_el0
              adrp    x0, [HIGH RELOC #0x4201B8]
              ldr     x2, [x0]
              add     x0, x0, [LOW RELOC #0x4201B8]
              blr     x0
              add     x0, x1, x0
              ldr     x19, [x0]
              cbz     x19, G_M48060_IG07
                                                ;; size=32 bbWeight=1 PerfScore 10.50
image

@kunalspathak
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Member Author

superpmi changes:

I don't think we need any superpmi changes just for the relocs. I locally tried to collect and replay and it works fine

image

@kunalspathak kunalspathak marked this pull request as ready for review February 8, 2024 17:14
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

// that we will attach to this node to guarantee that they are available
// during generating this node.
assert(call->gtFlags & GTF_TLS_GET_ADDR);
newRefPosition(REG_R0, currentLoc, RefTypeFixedReg, nullptr, genRegMask(REG_R0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these have to be R0/R1/R2 for the linker, or is this just a choice you've made to make things simpler?

Or is it the case that since the ultimate expansion has a call these choices don't add new constraints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used by the linker to match the pattern as seen here and thats how we did that in hand-assembly code here.

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
@kunalspathak
Copy link
Member Author

@VSadov PTAL

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!!

@kunalspathak kunalspathak merged commit dc7f703 into dotnet:main Feb 15, 2024
113 of 115 checks passed
@kunalspathak kunalspathak deleted the nativeaot-tls-linux-arm64 branch February 15, 2024 21:03
@github-actions github-actions bot locked and limited conversation to collaborators Mar 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants