A mechanism for specifying alignment on a field or struct should be supported. #22990

Closed
tannergooding opened this issue Jul 31, 2017 · 17 comments
Labels
api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Runtime.InteropServices

Comments

@tannergooding
Member

Rationale

In certain high performance or specialized data structures/algorithms, it is desirable to enforce an alignment for structs, fields, or locals.

Today, CoreFX provides several specialized data structures for which the runtime either has special alignment handling (System.Numerics.Vector) or which carry their own specialized padding (dotnet/corefx#22724).

As such, the framework/runtime should provide a mechanism for enforcing a specified alignment for structs and fields. Locals should also be included if that is feasible (I'm not sure whether that is readily possible today, given that attributes cannot be specified on locals).

Additional Thoughts

It might be worthwhile to additionally expose this on the existing StructLayoutAttribute as an Alignment property.

An alignment of 0 should be treated as "Automatic" (the current behavior of letting the runtime decide alignment).

A mechanism for aligning to the cache would be ideal (dotnet/corefx#22724 (comment)). This could perhaps be a special value that would otherwise be invalid (such as Alignment=-1). Other special alignments could also be allowed in a similar manner.

If a field specifies an alignment less than that of the struct, it should be aligned to the alignment of the struct. For example, if you do Alignment=8 on a Vector4 (which has an Alignment=16), the field should be treated as Alignment=16.
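The rule above, together with the "0 means automatic" convention, can be sketched as a small function. This is only an illustration in C of the proposed semantics; `effective_alignment` is a hypothetical name, and the special negative values (such as -1 for cache-line alignment) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper illustrating the proposed rule:
   - requested == 0 means "Automatic": use the type's natural alignment.
   - a request weaker than the natural alignment is raised to it.
   Both nonzero values are assumed to be powers of two. */
static size_t effective_alignment(size_t requested, size_t natural)
{
    assert(natural != 0 && (natural & (natural - 1)) == 0);
    if (requested == 0) {
        return natural;
    }
    assert((requested & (requested - 1)) == 0);
    return requested > natural ? requested : natural;
}
```

With these assumptions, requesting 8 on a type whose natural alignment is 16 yields 16, matching the Vector4 example.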

[Design Decision] If a struct specifies an alignment less than that of its first field it should either:
A. Align the struct as specified and add the appropriate padding so that the first field is also aligned as specified
-or-
B. Align the struct as per the requirements of the first field

[EDIT] Make reference to the PR a link by @karelz

@JonHanna
Contributor

> This could perhaps be a special value that would otherwise be invalid (such as Alignment=-1). Other special alignments could also be allowed in a similar manner.

Perhaps -4 to mean 4 octets from the end of the cache line?

@tannergooding
Member Author

This probably deserves/requires input from some runtime folks as well, given that they have the best understanding of how determining alignment works today.

@karelz, do you know who should be tagged?

@tannergooding
Member Author

Going to tag @jkotas, @fiigii, and @mellinoe right now.

This will be very useful for ensuring the backing data structures are properly aligned when they are used in combination with the Hardware Intrinsics feature.

@jkotas
Member

jkotas commented Oct 1, 2017

There are two aspects of this:

  1. Controlling field offset alignment within type
  2. Controlling alignment when the storage for the type is allocated (on GC heap, on stack, ...)

Is this issue about 1, 2 or both?

@tannergooding
Member Author

The second (controlling alignment for the entire type).

If that is provided, the first can be achieved by aligning the whole type and using the appropriate field offset attributes on the individual members.

@tannergooding
Member Author

Although, it does somewhat extend into both when dealing with types that have an alignment but are also members of another type. I mentioned in the original comment some scenarios where this may come up.

@jkotas
Member

jkotas commented Oct 1, 2017

(2) is a GC feature. At its core it is not a pay-for-play GC feature: the GC would need to look at the alignment of every object in various situations (which slows it down everywhere), but only a few situations benefit. It would need to be prototyped, and we would need to be convinced that it is a good tradeoff to make. Some work on this was done earlier; the code is under FEATURE_STRUCTALIGN ifdefs.

@tannergooding
Member Author

@jkotas, thanks for the reference (going to take a look through this when I have some time)!

I'm guessing the issue isn't in the first allocation, since that can be considered "trivial". That is, you just need to allocate, at most, Size + (Alignment - 1) bytes and return the first address with the correct alignment.
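A minimal sketch of that over-allocation idea, written in C because the arithmetic is language-neutral. The helper names `align_up` and `alloc_aligned` are hypothetical and not part of any runtime or GC API; the sketch assumes the alignment is a nonzero power of two and ignores size overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round addr up to the next multiple of alignment.
   alignment must be a nonzero power of two. */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}

/* Hypothetical helper: allocate size + (alignment - 1) bytes and return
   the first suitably aligned address inside that block.
   *raw_out receives the pointer that must later be passed to free(). */
static void *alloc_aligned(size_t size, size_t alignment, void **raw_out)
{
    void *raw = malloc(size + (alignment - 1));
    *raw_out = raw;
    if (raw == NULL) {
        return NULL;
    }
    return (void *)align_up((uintptr_t)raw, alignment);
}
```

The worst case wastes Alignment - 1 bytes per allocation, which is why the bound is Size + (Alignment - 1).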

So, I think the hardest part for the GC probably comes into play when the heap is compacted or when objects are otherwise moved, since alignment limits where an object can be moved to. I'm wondering, however, if this can be done without bringing in too much cost.

I would think (possibly naively) that the GC would set a flag indicating whether an object is aligned (or maybe keep a separate tree containing these objects, or something similar). Most objects are not expected to be aligned, so they don't need to do anything else. The few objects that are aligned need to be relocated to an address that is still aligned. The destination can be any free block that is between Size and Size + (Alignment - 1) bytes in length (Size suffices when the block's start address is already perfectly aligned; Size + (Alignment - 1) covers the "worst case" alignment).

@jkotas
Member

jkotas commented Oct 2, 2017

That is the 10,000 ft view of how this may work. You can tell from the 37 FEATURE_STRUCTALIGN ifdefs left over in the GC from a previous attempt to implement this that it is not exactly trivial to implement. Also, I would expect that the implementation itself is not where most of the work would be; most of the work would be in both functional and performance testing.

@tannergooding
Member Author

I put a bit of thought into why the feature is requested (feel free to correct me if you disagree)...

On modern computers, unaligned reads/writes are (generally speaking) as fast as aligned reads/writes. The exception to this is when the load/store crosses a cache-line boundary (or worse, a page boundary).

Looking at the "Intel Optimization Manual", a load/store that crosses a cache-line boundary can take ~4.5x more cycles on modern CPUs, and more on older ones (this is assuming I didn't miss a section that says something different for even newer processors).
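To make the boundary condition concrete, here is a small C sketch that tests whether an access of a given size at a given address straddles a boundary. `crosses_boundary` is a hypothetical helper; the 64-byte cache line and 4096-byte page used in the usage note are assumptions about typical x86 hardware, not values taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the byte range
   [addr, addr + size) touches more than one boundary-sized block.
   boundary must be a nonzero power of two; size must be nonzero. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size != 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    uintptr_t mask = ~((uintptr_t)boundary - 1);
    uintptr_t first_block = addr & mask;
    uintptr_t last_block = (addr + size - 1) & mask;
    return first_block != last_block;
}
```

For example, a 16-byte access at address 60 crosses a 64-byte line, while the same access at address 48 or 64 does not. A 16-byte access that is itself 16-byte aligned can never cross a 64-byte line, which is the guarantee alignment buys.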

The most commonly used alignments will likely be:

  • 16 (SIMD128; SSE/SSE2)
  • 32 (SIMD256; AVX, AVX2)
  • 64 (SIMD512; AVX512)
  • Cache Line Size (also beneficial for concurrent code; although I didn't touch on that in depth)
  • Page Size (also beneficial for large blocks of memory, such as file reads; although I didn't touch on that in depth)

Other alignments (those between cache line size and page size), as far as I can tell, do not provide any real performance benefit. This is because there is no register which can read the data all at once and because it won't provide any additional guarantees of not crossing a cache-line or page boundary.

If getting the GC to support custom-aligned types is hard (and the feature is not likely to arrive any time soon), then is there a reasonable workaround for the near or long term? For example:

  • Providing a 'high-performance' API for allocating aligned blocks of memory not tracked by the GC
  • Providing a set of 'high-performance' APIs for manual memory management (allocating/freeing/zeroing/copying heaps/pages/blocks/etc)
    • There are some issues with the existing memory management functions in the Marshal class, some of which are probably fixable

On the other hand, has any consideration been given to supporting custom-aligned types, but with certain limitations? For example:

  • custom alignment is supported, but only for specific sizes
  • custom alignment is supported, but only for arrays
    • I can't, at least this late at night, think of any real-world use cases for heap-allocated single objects that require specific alignment; all the use cases that come to mind involve arrays and multiple reads/writes
    • For single value-type objects (if required), stack-respected custom alignment would likely work, even if heap-respected custom alignment didn't exist
    • Only having stack-respected custom alignment won't work for large arrays, since that can easily cause a stack overflow

@vermorel

> custom alignment is supported, but only for arrays.

Yes, arrays are our only use case. Actually, we would not even need all arrays; gaining control over byte[] arrays only would already be sufficient, thanks to MemoryMarshal.Cast.

@saucecontrol
Member

If https://github.com/dotnet/coreclr/issues/19936 is implemented, you'll at least be able to roll your own aligned buffers with the knowledge the GC will never move them.

@benaadams
Member

My interest would be for CMPXCHG16B with an object reference + tag type struct
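For context, CMPXCHG16B requires its 16-byte memory operand to be 16-byte aligned, which is why a reference + tag pair needs more than pointer alignment. A rough C11 sketch of such a layout follows; `tagged_ref` and `is_aligned_16` are hypothetical names, and no atomic operation is actually performed here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference + tag pair, forced to 16-byte alignment so the
   whole struct could be the operand of a 16-byte compare-exchange. */
typedef struct {
    alignas(16) void *ref; /* stands in for the object reference */
    uint64_t tag;          /* version counter or similar tag */
} tagged_ref;

static_assert(alignof(tagged_ref) == 16, "tagged_ref must be 16-byte aligned");
static_assert(sizeof(tagged_ref) == 16, "tagged_ref must occupy exactly 16 bytes");

/* Hypothetical helper: checks the precondition on the operand address. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

In C the compiler honors the requested alignment for stack and static storage; the open question in this issue is getting the equivalent guarantee from the GC heap.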

@vermorel

@saucecontrol A memory mapped file will already give you aligned buffers. However, it's an IDisposable object to deal with. To make aligned memory convenient, we need support from the GC.

@saucecontrol
Member

The issue I linked is specifically about adding GC support. It doesn't handle the alignment, but it solves the problem of the GC potentially moving something after you've found an aligned section to work with.

There's also https://github.com/dotnet/corefx/issues/31787, which addresses aligned allocation of arrays.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Feb 24, 2020
@AaronRobinsonMSFT AaronRobinsonMSFT modified the milestones: 5.0, Future May 14, 2020
@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Nov 13, 2024
@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Nov 27, 2024
@Unknown6656

what is the current state of this proposal?

@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Dec 18, 2024
@tannergooding
Member Author

Closed/inactive. It would require substantial GC work to support and is low priority.

Data is already naturally aligned according to the primitive elements it contains. For things like arrays and large data processing, there are often better ways to deal with the data (which you'll often have as part of your core algorithm anyway), such as doing opportunistic alignment.

If you'd like to manually manage memory, convenience APIs such as NativeMemory.AlignedAlloc exist, and most BCL APIs take Span, which allows you to transparently work with managed or native memory.
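One common reading of "opportunistic alignment" is: process a few leading elements one at a time until the pointer happens to reach the desired alignment, run the main loop on aligned blocks, then finish the tail. The C sketch below illustrates that split under this assumption; `elements_until_aligned` and `sum_opportunistic` are hypothetical names, the 16-byte target is an assumption, and the block loop uses plain scalar adds where a real kernel would use aligned vector loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading floats to handle one by one
   before data + count is aligned to `alignment` (capped at length).
   Assumes data is at least float-aligned and alignment is a power of
   two that is a multiple of sizeof(float). */
static size_t elements_until_aligned(const float *data, size_t length,
                                     size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(alignment % sizeof(float) == 0);
    size_t misalign = (size_t)((uintptr_t)data & (alignment - 1));
    if (misalign == 0) {
        return 0;
    }
    size_t count = (alignment - misalign) / sizeof(float);
    return count < length ? count : length;
}

/* Sums an array in three phases: unaligned head, aligned blocks, tail. */
static float sum_opportunistic(const float *data, size_t length)
{
    const size_t alignment = 16;                     /* assumed target */
    const size_t lanes = alignment / sizeof(float);  /* floats per block */
    float total = 0.0f;
    size_t i = 0;

    /* Head: scalar steps until data + i is 16-byte aligned. */
    size_t head = elements_until_aligned(data, length, alignment);
    for (; i < head; i++) {
        total += data[i];
    }
    /* Body: each block starts at an aligned address. */
    for (; i + lanes <= length; i += lanes) {
        float block = 0.0f;
        for (size_t j = 0; j < lanes; j++) {
            block += data[i + j];
        }
        total += block;
    }
    /* Tail: leftover elements that do not fill a block. */
    for (; i < length; i++) {
        total += data[i];
    }
    return total;
}
```

The point of the pattern is that the buffer itself needs no special allocation: the algorithm adapts to whatever address it is given, which works the same for managed or native memory.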

@github-actions github-actions bot locked and limited conversation to collaborators Jan 18, 2025