Add support for memory alignment #5931
Comments
@JeffCyr, what about |
@tannergooding The Pack parameter won't affect the base address of the object.
I created a proposal on the CoreFX side here: https://github.com/dotnet/corefx/issues/22790
It has been mentioned that this feature would require major changes to implement in the GC. Do you think it would be worth having a global App.Config setting to force 8-byte alignment of all ref types for x86 processes running on an x64 OS? This should be a lot simpler to implement, and it resolves the random perf of Int64 in x86 processes (e.g. #4811).
As I understand, @Maoni0 and @swgillespie are the GC people to tag on these issues.
If you want all objects to have a different alignment, that's trivial to implement: we have an Align function that enforces the alignment and is called by every place that calculates the size of an object. It does introduce a perf penalty, though, as the alignment is no longer a const. If you want the alignment to be a property of a type (which is what FEATURE_STRUCTALIGN implements), that's certainly much more work (the implementation of FEATURE_STRUCTALIGN is incomplete right now), and it also has a perf penalty, as already pointed out on the other thread. There needs to be a cost-benefit analysis.
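To make the const-versus-variable point concrete, here is a minimal sketch in C of the kind of rounding such an Align function performs. It is not the CoreCLR source: `align_up`, `align_const`, and `DATA_ALIGNMENT` are hypothetical names chosen for illustration, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the CoreCLR code): round size up to the next
   multiple of alignment. The alignment is a runtime value here, so the
   mask has to be computed on every call. */
static size_t align_up(size_t size, size_t alignment)
{
    /* The bit trick below only works for powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Same rounding with a compile-time constant: the compiler can fold the
   mask into an immediate operand. The value 4 stands in for the x86
   default discussed in this thread. */
#define DATA_ALIGNMENT 4u

static size_t align_const(size_t size)
{
    return (size + (DATA_ALIGNMENT - 1)) & ~(size_t)(DATA_ALIGNMENT - 1);
}
```

With these definitions, a 35-byte object rounds up to 36 bytes at 4-byte alignment and to 40 bytes at 8-byte alignment.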
What about just changing the x86 alignment to 8 bytes instead of 4 bytes? The memory increase should be marginal, no? And since x86 processors don't really exist anymore, all x86 apps could perform better if the alignment matched the processor architecture.
This seems to be overkill for what you are trying to achieve, and I don't think the memory increase would be marginal. Moreover, this would not help with the "Cache line alignment optimizations" goal listed above.
I don't follow this statement. For best performance, the recommendation is to align data on natural alignment boundaries.
I don't see how changing to 8-byte alignment on x86 could increase the memory usage significantly. The worst case is +4 bytes per object, so 4MB per million objects.
I meant that nowadays, all x86 applications run on an x64 CPU. So if every object's base address is 8-byte aligned, it guarantees that all 8-byte types are 8-byte aligned, matching the natural alignment of the underlying x64 CPU. Anyway, you're right that this proposition doesn't address the original issue; this conversation could be continued in another issue.
The average size of objects on x86, according to analysis we did, was about 35 bytes, so a 4-byte increase is >10%. That is significant.
Worst case, if it's 4-byte aligned and the law of large numbers kicks in, then it's 2 bytes on average, so ~5.7%, which is still possibly significant...
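The two percentages above follow from dividing the added padding by the average object size. A small sketch of that arithmetic, where `padding_overhead_percent` is a hypothetical helper name and the 35-byte average is the figure quoted in this thread:

```c
#include <assert.h>

/* Hypothetical helper: percentage growth in heap usage when each object
   gains extra_bytes of padding on average. */
static double padding_overhead_percent(double avg_object_size, double extra_bytes)
{
    assert(avg_object_size > 0.0);
    return 100.0 * extra_bytes / avg_object_size;
}
```

For a 35-byte average object, 4 bytes of padding per object gives roughly 11.4%, and an average of 2 bytes gives roughly 5.7%.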
There are some optimizations not available with managed code in .NET because there is currently no way to enforce a memory alignment greater than the pointer size:
I have no idea if this is easy or hard in the current coreclr design, but it would be nice to have a MemoryAlignmentAttribute that could specify alignment minimally on class types and possibly on any class/struct/field.
My motivation for this feature would be to implement an UnfairSemaphore (#2383) that isn't randomly inefficient in x86 when its 64-bit state crosses a cache line boundary.
I have created a gist to isolate the consequences of unaligned Interlocked:
https://gist.github.com/JeffCyr/9e162f440e30b567507cc95b6ba5a4a4
On my machine, an unaligned Interlocked operation can be 61x slower.
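As an illustration of when a 64-bit field straddles two cache lines, here is a minimal sketch in C. It is independent of the gist above; `crosses_cache_line` is a hypothetical helper, and the 64-byte line size is an assumption that holds on most current x86/x64 CPUs but is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: true when the byte range [address, address + size)
   spans two cache lines, i.e. its first and last bytes fall in different
   lines. */
static bool crosses_cache_line(uintptr_t address, unsigned size)
{
    assert(size > 0 && size <= CACHE_LINE_SIZE);
    return (address / CACHE_LINE_SIZE) != ((address + size - 1) / CACHE_LINE_SIZE);
}
```

An 8-byte value at a 4-byte-aligned address such as 60 occupies bytes 60 to 67 and so crosses a line boundary, while an 8-byte-aligned address such as 56 or 64 never does.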
category:proposal
theme:alignment
skill-level:expert
cost:large
impact:medium