-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NativeAOT ThunksPool thunk data block size handling. #110732
Conversation
dotnet#88710 made a change in TunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored. This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms. In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV. This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks. This commit adds a new function returning the size of the thunk data block and use that when calculating the end of the data block instead of using the ThunkBlockSize since its reflects the size of each block getting mapped from file, on Windows platforms that is stub+data, 2 * page size.
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/ThunkPool.cs
Outdated
Show resolved
Hide resolved
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/ThunkPool.cs
Outdated
Show resolved
Hide resolved
It seems that ThunkBlockSize means code size in some situations and code + data size in other situations. It would be nice to clean it up. |
Yes, I was tempted rewrite it, the introduction of iOS changes the patterns a little, since it uses the per arch implementation but changed how it allocates the data blocks, only the stub blocks are laid out in the asm implementation, and the rest is handled in the platform specific implementation in native code, so the original logic in the asm implementation that describes layout of what gets re-mapped was altered, meaning that ThunkBlockSize for implementations used on iOS ended up as one page (just the stub), while it will be 2 pages on Windows platforms (stub+data page). So, Windows platforms are inline with how these functions were originally designed while iOS ended up using a hybrid only using the arch specific implementation to layout the thunk stubs. Cleaning it up would mean to expand the concept that the arch implementation might or might not handle the layout of data and reshape what the different sizes return from functions means and how they are used when calculating what needs to be virtual allocated, how much that should be re-mapped from file etc. In the end, since the arch asm implementation handles most of these implementation details, I believe it make sense to get all that data from the arch asm implementation, in the end the native code needs to know how much memory to virtual alloc for the complete mapping and how much of that memory it should map from the file, on Windows this will be the same, but on platforms like iOS it would be only half of the virtual memory would be re-mapped (just the stubs) while the rest is kept as is (data pages). In this PR, I decided to just implement what was needed to mainly restore the previous behavior that used page size directly in ThunkPool.cs but read it directly from the arch implementation that has the correct details of the size of individual blocks (in this case the thunk data block size). It was a simpler change solving the problem that dynamic ThunkPool is broken on Windows platforms but also getting the size from the arch specific asm implementation that have the correct size of the data block. I can certainly try to clean this up if we think that it's feasible. The underlying issue is the definition and usage of a "block", depending on implementation it is viewed as the stub thunk block or a combination of stub+data block. The block size is used to figure out the total size of virtual allocated memory as well as the stride used to go from one stub block to the next etc. Cleaning this up probably mean to change the current perception that a block original was a stub+data pair into something more aligned to the different implementations currently in use under the various variations of thunk pool implementations. |
958b9e7
to
0595470
Compare
0595470
to
258c248
Compare
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/ThunkPool.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Updated version of dotnet#110732 fixing issues on ARM32 and other platforms where size diffrence between code and data thunk caused thunk data block size calculations to be to small. dotnet#88710 made a change in TunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored. This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms. In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV. This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks. This commit calculates the page size of code/data block based on max of code/data thunk size * number of thunks per block and since this is guaranteed to fit into one block and that the block size needs to be a power of 2, the correct full block size used in arch specific implementation when laying out the stub and data blocks can be calculated directly in managed code. Review feedback. Switch to ThunkDataBlockSizeMask in one place. Fix build error. Include pointer size slot in data block size calculation. Calculate page size using NumThunksPerBlock.
Slightly modified version of this PR fixing ARM32, #111149. |
Updated version of dotnet#110732 fixing issues on ARM32 and other platforms where size diffrence between code and data thunk caused thunk data block size calculations to be to small. dotnet#88710 made a change in TunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored. This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms. In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV. This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks. This commit calculates the page size of code/data block based on max of code/data thunk size * number of thunks per block and since this is guaranteed to fit into one block and that the block size needs to be a power of 2, the correct full block size used in arch specific implementation when laying out the stub and data blocks can be calculated directly in managed code. Review feedback. Switch to ThunkDataBlockSizeMask in one place. Fix build error. Include pointer size slot in data block size calculation. Calculate page size using NumThunksPerBlock.
Updated version of #110732 fixing issues on ARM32 and other platforms where size diffrence between code and data thunk caused thunk data block size calculations to be to small. #88710 made a change in TunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored. This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms. In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV. This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks. This commit calculates the page size of code/data block based on max of code/data thunk size * number of thunks per block and since this is guaranteed to fit into one block and that the block size needs to be a power of 2, the correct full block size used in arch specific implementation when laying out the stub and data blocks can be calculated directly in managed code. Review feedback. Switch to ThunkDataBlockSizeMask in one place. Fix build error. Include pointer size slot in data block size calculation. Calculate page size using NumThunksPerBlock.
#88710 made a change in ThunkPool.cs moving away from using a page size define to calling ThunkBlockSize to get to end of thunk data block where a common stub address get stored.
This change is not equivalent on platforms where the thunk blocks are laid out in pair where a stub thunk blocks are followed by a data thunk block and all gets mapped from file. This is the schema used on Windows platforms.
In that layout schema the ThunkBlockSize is 2 * page size meaning that the calculation getting to the end of the thunk data block will move to the end of next thunk stub that is RX memory and storing the common stub address at that location will trigger an AV.
This works on iOS since it reports its ThunkBlockSize as one page but that is not totally correct since it uses 2 pages, just that they are allocated in the same way as FEATURE_RX_THUNKS, all thunk stubs blocks followed by all thunk data blocks. The reason why this works is because it only maps the thunk stubs from file, reporting a ThunkBlockSize that is inline with what gets map:ed from file, but then there is a special handling in PalAllocateThunksFromTemplate on iOS that virutal alloc template size * 2, mapping the first template size bytes from the file and the rest are kept as its thunk data blocks.
This commit calculates the size of code/data block based on code/data thunk size * number of thunks per block and since this is guaranteed to fit into one block and that the block size needs to be a power of 2, the correct full block size used in arch specific implementation when laying out the stub and data blocks can be calculated directly in managed code.