ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811

JeffCyr · 2015-12-18T05:23:52Z

The UnfairSemaphore uses a 64bit structure to store its state and swap the values with FastInterlockCompareExchangeLong. This could have a performance hit if the address is not 64bit aligned in x86.

https://github.com/dotnet/coreclr/blob/master/src/vm/win32threadpool.h#L161

I reproduced the issue in c# when I ported the UnfairSemaphore to make a custom ThreadPool, on a particular scenario I get a 10x performance hit when the address is misaligned.

However I could not reproduce the issue with the CLR ThreadPool, is the address always 64bit aligned in the clr runtime for a reason I'm not aware of?

mikedn · 2015-12-18T05:59:04Z

Why wouldn't it be 8 byte aligned? asLongLong has type LONGLONG which is 8 bytes in size. That member will be 8 byte aligned unless the default compiler alignment is changed and that's unlikely to be the case.

JeffCyr · 2015-12-18T06:06:46Z

Oh that's true, you made me realize that I specified LayoutKind.Sequential on my version of UnfairSemaphore.

Thanks for that and sorry for the waste of your time, I'll close the issue.

JeffCyr · 2015-12-18T18:48:17Z

@mikedn I may be misguided, but it appears that the CLR in x86 does not enforce the 8 bytes alignment of Int64 fields. This results in random performance hit when using InterlockedCompareExchanged64 depending on whether the class was allocated on a 4 byte boundary or 8 byte boundary. Are you sure that this is not an issue in unmanaged?

mikedn · 2015-12-19T07:32:56Z

Are you sure that this is not an issue in unmanaged?

In the x86 desktop CLR this code looks like this:

lea     esi, [ecx+40h] ; get the address of asLongLong
...
lock cmpxchg8b qword ptr [esi]

ecx is the this pointer and 40h is the size of padding1 array and it is obviously a multiple of 8. This means that the only way asLongLong could end up being unaligned is if the entire object is improperly aligned.

There appears to be a single such object and it is allocated via the new operator. This is a custom new operator and it's not readily obvious how it works but one possible path is ends up in EEHeapAlloc which uses HeapAlloc and HeapAlloc returns 8 byte aligned memory.

It can be seen that in debug versions of the CLR EEHeapAlloc returns memory that's only 4 byte aligned because it stores the heap handle at the start of the block. But for debug builds this shouldn't make a difference as they're slow anyway.

Now, what you are saying in your post is that x86 CLR doesn't seem to enforce 8 byte alignment on Int64 managed fields. Yes, that's true and it's easy to verify that:

class Semaphore {
    public long count;
    static unsafe void Main() {
        while (true) {
            var sem = new Semaphore();
            fixed (long* p = &sem.count) {
                if (((int)p & 7) != 0)
                    Console.WriteLine("BAD");
            }
        }
    }
}

This will print out BAD soon enough.

JeffCyr · 2015-12-19T12:02:06Z

Thank you for this detailed answer @mikedn. Very informative.

It would be nice if we could specify the required memory alignment with a class attribute in .Net.

JeffCyr closed this as completed Dec 18, 2015

JeffCyr reopened this Dec 18, 2015

JeffCyr closed this as completed Dec 19, 2015

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

JeffCyr mentioned this issue Jan 30, 2020

Add support for memory alignment #5931

Open

ghost locked as resolved and limited conversation to collaborators Jan 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811

ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811

JeffCyr commented Dec 18, 2015

mikedn commented Dec 18, 2015

JeffCyr commented Dec 18, 2015

JeffCyr commented Dec 18, 2015

mikedn commented Dec 19, 2015

JeffCyr commented Dec 19, 2015

ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811

ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811

Comments

JeffCyr commented Dec 18, 2015

mikedn commented Dec 18, 2015

JeffCyr commented Dec 18, 2015

JeffCyr commented Dec 18, 2015

mikedn commented Dec 19, 2015

JeffCyr commented Dec 19, 2015