Add support for memory alignment #5931

JeffCyr · 2016-05-24T21:02:37Z

Some optimizations are not available to managed code in .NET because there is currently no way to enforce a memory alignment greater than the pointer size:

- Interlocked 64-bit operations in an x86 process when the underlying OS is 64-bit (see the discussion in issue #4811, "ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86")
- Interlocked 128-bit operations
- Cache line alignment optimizations

I have no idea if this is easy or hard in the current coreclr design, but it would be nice to have a MemoryAlignmentAttribute that could specify alignment minimally on class type and possibly on any class/struct/field.

My motivation for this feature would be to implement an UnfairSemaphore (#2383) that isn't randomly inefficient in x86 when its 64bit state crosses a cache line boundary.

I have created a gist to isolate the consequences of unaligned Interlocked:
https://gist.github.com/JeffCyr/9e162f440e30b567507cc95b6ba5a4a4

On my machine, an unaligned Interlocked operation can be 61x slower.
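To make the "crosses a cache line boundary" condition concrete, here is a minimal sketch of the address arithmetic. It is not taken from the gist above; the `CacheLine` class, the `Crosses` helper, and the 64-byte line size are assumptions made for illustration (64 bytes is common on current x64 CPUs, but it is not guaranteed).

```csharp
using System;

// Hypothetical helper, for illustration only.
static class CacheLine
{
    // Assumption: 64-byte cache lines.
    public const int AssumedLineSize = 64;

    // Returns true when the byte range [address, address + size) spans two cache lines.
    public static bool Crosses(ulong address, int size, int lineSize = AssumedLineSize)
    {
        ulong firstLine = address / (ulong)lineSize;
        ulong lastLine = (address + (ulong)size - 1) / (ulong)lineSize;
        return firstLine != lastLine;
    }

    static void Main()
    {
        // An 8-byte field at a 4-byte-aligned offset of 60 occupies bytes 60..67.
        Console.WriteLine(Crosses(60, 8));
        // The same field at an 8-byte-aligned offset of 56 occupies bytes 56..63.
        Console.WriteLine(Crosses(56, 8));
    }
}
```

Under these assumptions, of the 16 possible 4-byte-aligned offsets within a line (0, 4, ..., 60), only offset 60 makes an 8-byte value straddle two lines. With 8-byte alignment, no offset does.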

category:proposal
theme:alignment
skill-level:expert
cost:large
impact:medium


tannergooding · 2016-05-24T21:29:31Z

@JeffCyr, what about [StructLayout(LayoutKind.Sequential, Pack = 16)] (types are in the System.Runtime.InteropServices namespace)?

JeffCyr · 2016-05-24T21:42:24Z

@tannergooding The Pack parameter won't affect the base address of the object.
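A small sketch of the point being made here: `Pack` caps the alignment of fields relative to the start of the type, so it changes field offsets but says nothing about where the runtime places the object itself. The `Packed1` and `Packed16` structs and the `PackDemo` class are hypothetical names used only for this illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types, for illustration only.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed1 { public byte Tag; public long Value; }

[StructLayout(LayoutKind.Sequential, Pack = 16)]
struct Packed16 { public byte Tag; public long Value; }

static class PackDemo
{
    // Offset of the Value field relative to the start of the struct
    // (as seen by the marshalling layout).
    public static int OffsetOfValue<T>() where T : struct
        => (int)Marshal.OffsetOf<T>("Value");

    static void Main()
    {
        // Pack = 1 removes padding, so Value sits right after the 1-byte Tag.
        Console.WriteLine(OffsetOfValue<Packed1>());
        // Pack = 16 only caps field alignment; Value keeps its natural alignment.
        Console.WriteLine(OffsetOfValue<Packed16>());
    }
}
```

Both numbers are offsets from the start of the struct. Neither attribute value gives any guarantee about the absolute address at which an instance ends up, which is what the Interlocked case depends on.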

tannergooding · 2017-07-31T18:29:09Z

I created a proposal on the CoreFX side here: https://github.com/dotnet/corefx/issues/22790

JeffCyr · 2017-10-11T17:44:42Z

It has been mentioned that this feature would require major changes to implement in the GC.

Do you think it would be worth having a global App.Config setting to force 8-byte alignment of all ref types for x86 processes running on an x64 OS?

This should be a lot simpler to implement and it resolves the random perf of Int64 in x86 processes. (e.g. #4811)

tannergooding · 2017-10-11T17:58:15Z

As I understand, @Maoni0 and @swgillespie are the GC people to tag on these issues

Maoni0 · 2017-10-11T21:13:08Z

If you want all objects to have a different alignment, that's trivial to implement: we have an Align function that enforces the alignment and is called by every place that calculates the size of an object. It does introduce a perf penalty, though, as the alignment is no longer a const. If you want the alignment to be a property of a type (which is what FEATURE_STRUCTALIGN implements), that's certainly much more work (the implementation of FEATURE_STRUCTALIGN is incomplete right now), and it also has a perf penalty, as already pointed out on the other thread.

There needs to be a cost-benefit analysis.
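For readers unfamiliar with what an align function of this kind does, the following is a minimal sketch of the usual round-up-to-a-power-of-two technique. It is not the GC's actual Align function; the `Alignment` class and the `AlignUp` name are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only; not the runtime's implementation.
static class Alignment
{
    // Rounds size up to the next multiple of alignment.
    // alignment must be a non-zero power of two.
    public static ulong AlignUp(ulong size, ulong alignment)
    {
        if (alignment == 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("alignment must be a power of two", nameof(alignment));

        return (size + alignment - 1) & ~(alignment - 1);
    }

    static void Main()
    {
        // A 12-byte object is already 4-byte aligned, but grows to 16 under 8-byte alignment.
        Console.WriteLine(AlignUp(12, 4));
        Console.WriteLine(AlignUp(12, 8));
    }
}
```

When the alignment is a compile-time constant, the mask `~(alignment - 1)` folds into a constant too. Passing it as a variable, as sketched here, is the kind of change that can add a small cost to every size calculation.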

JeffCyr · 2017-10-11T21:32:20Z

What about just changing the x86 alignment to 8 bytes instead of 4 bytes? The memory increase should be marginal, no? And since x86 processors don't really exist anymore, all x86 apps could perform better if the alignment matched the processor architecture.

hanblee · 2017-10-13T00:07:05Z

> What about just changing the x86 alignment to 8 bytes instead of 4 bytes?

This seems to be an overkill for what you are trying to achieve, and I don't think the memory increase would be marginal. Moreover, this would not help with "Cache line alignment optimizations" goal listed above.

> And since x86 processors don't really exist anymore, all x86 apps could perform better if the alignment matched the processor architecture.

I don't follow this statement. For best performance, the recommendation is to align data on natural alignment boundaries.

JeffCyr · 2017-10-13T01:48:54Z

@hanblee

> This seems to be an overkill for what you are trying to achieve, and I don't think the memory increase would be marginal

I don't see how changing to 8-byte alignment in x86 could increase the memory usage significantly. The worst case is +4 bytes per object, so 4 MB per million objects.

> I don't follow this statement. For best performance, the recommendation is to align data on natural alignment boundaries.

I meant that nowadays, all x86 applications run on an x64 CPU. So if every object's base address is 8-byte aligned, it guarantees that all 8-byte types are 8-byte aligned, matching the natural alignment of the underlying x64 CPU.

Anyway, you're right that this proposition doesn't address the original issue, this conversation could be continued in another issue.

Maoni0 · 2017-10-13T02:17:36Z

> I don't see how changing to 8-byte alignment in x86 could increase the memory usage significantly. The worst case is +4 bytes per object, so 4 MB per million objects.

the average size of objects on x86, according to analysis we did, was about 35 bytes, so a 4-byte increase is >10%. that is significant.

Drawaes · 2017-10-17T03:35:31Z

That's the worst case. If objects are 4-byte aligned today and the law of large numbers kicks in, then it's 2 bytes on average, so ~5.7%, which is still possibly significant...
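The figures quoted in the last few comments can be reproduced as follows. This assumes the 35-byte average object size reported above and, as a simplification not stated in the thread, that current object sizes are multiples of 4 that are equally likely to need 0 or 4 bytes of padding to reach a multiple of 8.

```latex
\[
\text{worst case: } \frac{4}{35} \approx 11.4\%,
\qquad
\mathbb{E}[\text{padding}] = \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 4 = 2 \text{ bytes},
\qquad
\text{average case: } \frac{2}{35} \approx 5.7\%.
\]
```

The earlier "4 MB per million objects" estimate is the same worst case in absolute terms: 4 bytes times 10^6 objects.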

ghost · 2022-12-22T03:28:30Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Author:	JeffCyr
Assignees:	-
Labels:	`enhancement`, `design-discussion`, `area-GC-coreclr`
Milestone:	Future

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

BruceForstall added the `JitUntriaged` label Oct 28, 2020

kunalspathak added the `area-GC-coreclr` label and removed the `area-CodeGen-coreclr` and `JitUntriaged` labels Dec 22, 2022
