Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NativeAOT ThunksPool thunk data block size handling. #110732

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Dec 16, 2024

#88710 made a change in ThunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored.

This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms.

In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV.

This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks.

This commit calculates the size of code/data block based on code/data thunk size * number of thunks per block and since this is guaranteed to fit into one block and that the block size needs to be a power of 2, the correct full block size used in arch specific implementation when laying out the stub and data blocks can be calculated directly in managed code.

dotnet#88710 made a change in TunkPool.cs
moving away from using a page size define to calling ThunkBlockSize
to get to end of thunk data block where a common stub address get stored.

This change is not equivalent on platforms where the thunk blocks are laid out
in pair where a stub thunk blocks are followed by a data thunk block and all
gets mapped from file. This is the schema used on Windows platforms.

In that layout schema the ThunkBlockSize is 2 * page size meaning that the
calculation getting to the end of the thunk data block will move to the end
of next thunk stub that is RX memory and storing the common stub address
at that location will trigger an AV.

This works on iOS since it reports its ThunkBlockSize as one page
but that is not totally correct since it uses 2 pages, just that
they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs
blocks followed by all thunk data blocks. The reason why this works
is because it only maps the thunk stubs from file, reporting a
ThunkBlockSize that is inline with what gets map:ed from file,
but then there is a special handling in PalAllocateThunksFromTemplate on
iOS that virutal alloc template size * 2, mapping the first
template size bytes from the file and the rest are kept as its
thunk data blocks.

This commit adds a new function returning the size of the thunk data
block and use that when calculating the end of the data block instead
of using the ThunkBlockSize since its reflects the size of each block getting
mapped from file, on Windows platforms that is stub+data, 2 * page size.
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Dec 16, 2024

This works on iOS since it reports its ThunkBlockSize

It seems that ThunkBlockSize means code size in some situations and code + data size in other situations. It would be nice to clean it up.

@lateralusX
Copy link
Member Author

lateralusX commented Dec 17, 2024

This works on iOS since it reports its ThunkBlockSize

It seems that ThunkBlockSize means code size in some situations and code + data size in other situations. It would be nice to clean it up.

Yes, I was tempted rewrite it, the introduction of iOS changes the patterns a little, since it uses the per arch implementation but changed how it allocates the data blocks, only the stub blocks are laid out in the asm implementation, and the rest is handled in the platform specific implementation in native code, so the original logic in the asm implementation that describes layout of what gets re-mapped was altered, meaning that ThunkBlockSize for implementations used on iOS ended up as one page (just the stub), while it will be 2 pages on Windows platforms (stub+data page). So, Windows platforms are inline with how these functions were originally designed while iOS ended up using a hybrid only using the arch specific implementation to layout the thunk stubs. Cleaning it up would mean to expand the concept that the arch implementation might or might not handle the layout of data and reshape what the different sizes return from functions means and how they are used when calculating what needs to be virtual allocated, how much that should be re-mapped from file etc. In the end, since the arch asm implementation handles most of these implementation details, I believe it make sense to get all that data from the arch asm implementation, in the end the native code needs to know how much memory to virtual alloc for the complete mapping and how much of that memory it should map from the file, on Windows this will be the same, but on platforms like iOS it would be only half of the virtual memory would be re-mapped (just the stubs) while the rest is kept as is (data pages).

In this PR, I decided to just implement what was needed to mainly restore the previous behavior that used page size directly in ThunkPool.cs but read it directly from the arch implementation that has the correct details of the size of individual blocks (in this case the thunk data block size). It was a simpler change solving the problem that dynamic ThunkPool is broken on Windows platforms but also getting the size from the arch specific asm implementation that have the correct size of the data block.

I can certainly try to clean this up if we think that it's feasible. The underlying issue is the definition and usage of a "block", depending on implementation it is viewed as the stub thunk block or a combination of stub+data block. The block size is used to figure out the total size of virtual allocated memory as well as the stride used to go from one stub block to the next etc. Cleaning this up probably mean to change the current perception that a block original was a stub+data pair into something more aligned to the different implementations currently in use under the various variations of thunk pool implementations.

@lateralusX lateralusX force-pushed the lateralusX/fix-thunkpool-av branch from 958b9e7 to 0595470 Compare December 18, 2024 10:03
@lateralusX lateralusX force-pushed the lateralusX/fix-thunkpool-av branch from 0595470 to 258c248 Compare December 18, 2024 10:05
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jkotas jkotas merged commit b91087f into dotnet:main Dec 19, 2024
88 checks passed
jkotas added a commit that referenced this pull request Dec 29, 2024
lateralusX added a commit to lateralusX/runtime that referenced this pull request Jan 7, 2025
Updated version of dotnet#110732 fixing
issues on ARM32 and other platforms where size diffrence between code
and data thunk caused thunk data block size calculations to be to small.

dotnet#88710 made a change in TunkPool.cs
moving away from using a page size define to calling ThunkBlockSize
to get to end of thunk data block where a common stub address get stored.

This change is not equivalent on platforms where the thunk blocks are laid out
in pair where a stub thunk blocks are followed by a data thunk block and all
gets mapped from file. This is the schema used on Windows platforms.

In that layout schema the ThunkBlockSize is 2 * page size meaning that the
calculation getting to the end of the thunk data block will move to the end
of next thunk stub that is RX memory and storing the common stub address
at that location will trigger an AV.

This works on iOS since it reports its ThunkBlockSize as one page
but that is not totally correct since it uses 2 pages, just that
they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs
blocks followed by all thunk data blocks. The reason why this works
is because it only maps the thunk stubs from file, reporting a
ThunkBlockSize that is inline with what gets map:ed from file,
but then there is a special handling in PalAllocateThunksFromTemplate on
iOS that virutal alloc template size * 2, mapping the first
template size bytes from the file and the rest are kept as its
thunk data blocks.

This commit calculates the page size of code/data block based on
max of code/data thunk size * number of thunks per block and since
this is guaranteed to fit into one block and that the block size
needs to be a power of 2, the correct full block size used in
arch specific implementation when laying out the stub and data blocks
can be calculated directly in managed code.

Review feedback.

Switch to ThunkDataBlockSizeMask in one place.

Fix build error.

Include pointer size slot in data block size calculation.

Calculate page size using NumThunksPerBlock.
@lateralusX
Copy link
Member Author

Slightly modified version of this PR fixing ARM32, #111149.

lateralusX added a commit to lateralusX/runtime that referenced this pull request Jan 9, 2025
Updated version of dotnet#110732 fixing
issues on ARM32 and other platforms where size diffrence between code
and data thunk caused thunk data block size calculations to be to small.

dotnet#88710 made a change in TunkPool.cs
moving away from using a page size define to calling ThunkBlockSize
to get to end of thunk data block where a common stub address get stored.

This change is not equivalent on platforms where the thunk blocks are laid out
in pair where a stub thunk blocks are followed by a data thunk block and all
gets mapped from file. This is the schema used on Windows platforms.

In that layout schema the ThunkBlockSize is 2 * page size meaning that the
calculation getting to the end of the thunk data block will move to the end
of next thunk stub that is RX memory and storing the common stub address
at that location will trigger an AV.

This works on iOS since it reports its ThunkBlockSize as one page
but that is not totally correct since it uses 2 pages, just that
they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs
blocks followed by all thunk data blocks. The reason why this works
is because it only maps the thunk stubs from file, reporting a
ThunkBlockSize that is inline with what gets map:ed from file,
but then there is a special handling in PalAllocateThunksFromTemplate on
iOS that virutal alloc template size * 2, mapping the first
template size bytes from the file and the rest are kept as its
thunk data blocks.

This commit calculates the page size of code/data block based on
max of code/data thunk size * number of thunks per block and since
this is guaranteed to fit into one block and that the block size
needs to be a power of 2, the correct full block size used in
arch specific implementation when laying out the stub and data blocks
can be calculated directly in managed code.

Review feedback.

Switch to ThunkDataBlockSizeMask in one place.

Fix build error.

Include pointer size slot in data block size calculation.

Calculate page size using NumThunksPerBlock.
jkotas pushed a commit that referenced this pull request Jan 13, 2025
Updated version of #110732 fixing
issues on ARM32 and other platforms where size diffrence between code
and data thunk caused thunk data block size calculations to be to small.

#88710 made a change in TunkPool.cs
moving away from using a page size define to calling ThunkBlockSize
to get to end of thunk data block where a common stub address get stored.

This change is not equivalent on platforms where the thunk blocks are laid out
in pair where a stub thunk blocks are followed by a data thunk block and all
gets mapped from file. This is the schema used on Windows platforms.

In that layout schema the ThunkBlockSize is 2 * page size meaning that the
calculation getting to the end of the thunk data block will move to the end
of next thunk stub that is RX memory and storing the common stub address
at that location will trigger an AV.

This works on iOS since it reports its ThunkBlockSize as one page
but that is not totally correct since it uses 2 pages, just that
they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs
blocks followed by all thunk data blocks. The reason why this works
is because it only maps the thunk stubs from file, reporting a
ThunkBlockSize that is inline with what gets map:ed from file,
but then there is a special handling in PalAllocateThunksFromTemplate on
iOS that virutal alloc template size * 2, mapping the first
template size bytes from the file and the rest are kept as its
thunk data blocks.

This commit calculates the page size of code/data block based on
max of code/data thunk size * number of thunks per block and since
this is guaranteed to fit into one block and that the block size
needs to be a power of 2, the correct full block size used in
arch specific implementation when laying out the stub and data blocks
can be calculated directly in managed code.

Review feedback.

Switch to ThunkDataBlockSizeMask in one place.

Fix build error.

Include pointer size slot in data block size calculation.

Calculate page size using NumThunksPerBlock.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants