ThreadPool's UnfairSemaphore Interlocked64 operation on misaligned address in x86 #4811
Comments
Why wouldn't it be 8 byte aligned?
Oh, that's true. You made me realize that I specified LayoutKind.Sequential on my version of UnfairSemaphore. Thanks for that, and sorry for wasting your time. I'll close the issue.
@mikedn I may be misguided, but it appears that the CLR on x86 does not enforce 8-byte alignment of Int64 fields. This results in a random performance hit when using InterlockedCompareExchange64, depending on whether the class was allocated on a 4-byte boundary or an 8-byte boundary. Are you sure that this is not an issue in unmanaged code?
In the x86 desktop CLR this code looks like this:
There appears to be a single such object and it is allocated via the
It can be seen that in debug versions of the CLR
Now, what you are saying in your post is that x86 CLR doesn't seem to enforce 8 byte alignment on

    using System;

    class Semaphore {
        public long count;

        static unsafe void Main() {
            while (true) {
                // Allocate a fresh object and pin it so the field address is stable.
                var sem = new Semaphore();
                fixed (long* p = &sem.count) {
                    // Report any allocation whose 64-bit field is not 8-byte aligned.
                    if (((int)p & 7) != 0)
                        Console.WriteLine("BAD");
                }
            }
        }
    }

This will print out
Thank you for this detailed answer @mikedn. Very informative. It would be nice if we could specify the required memory alignment with a class attribute in .NET.
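Until such an attribute exists, one possible workaround is to keep the 64-bit state outside the managed heap and align it by hand. The following is only a minimal sketch under that assumption: the class name `AlignedInt64` and its members are hypothetical, it must be compiled with unsafe code enabled, and it is not how the CLR ThreadPool itself stores its state.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper (not part of .NET): a single 64-bit slot in native
// memory whose address is rounded up to an 8-byte boundary.
sealed unsafe class AlignedInt64 : IDisposable
{
    private IntPtr _block;         // raw allocation, over-sized by 8 bytes
    private readonly long* _slot;  // first 8-byte-aligned address inside _block

    public AlignedInt64()
    {
        // Over-allocate so an aligned 8-byte slot always fits inside the block.
        _block = Marshal.AllocHGlobal(sizeof(long) + 8);
        long raw = _block.ToInt64();
        _slot = (long*)((raw + 7) & ~7L); // round up to a multiple of 8
        *_slot = 0;
    }

    // True when the slot address is a multiple of 8.
    public bool IsAligned => ((long)_slot & 7) == 0;

    // Atomically replaces the value if it equals comparand; returns the old value.
    public long CompareExchange(long value, long comparand) =>
        Interlocked.CompareExchange(ref *_slot, value, comparand);

    public long Read() => Interlocked.Read(ref *_slot);

    public void Dispose()
    {
        if (_block != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_block);
            _block = IntPtr.Zero;
        }
    }
}
```

The trade-off is that the slot has to be freed explicitly, so the owner needs to be disposed like any other holder of native memory.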
The UnfairSemaphore uses a 64-bit structure to store its state and swaps the values with FastInterlockCompareExchangeLong. This could incur a performance hit if the address is not 64-bit aligned on x86.
https://github.com/dotnet/coreclr/blob/master/src/vm/win32threadpool.h#L161
I reproduced the issue in C# when I ported the UnfairSemaphore to build a custom ThreadPool; in one particular scenario I get a 10x performance hit when the address is misaligned.
However, I could not reproduce the issue with the CLR ThreadPool. Is the address always 64-bit aligned in the CLR runtime for a reason I'm not aware of?
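For readers who have not seen the pattern, the kind of C# port described above might look roughly like the sketch below. It is an illustration only: the class name `PackedCounts`, the choice of two 16-bit counters, and their bit positions are assumptions made for this example, not the layout used in win32threadpool.h.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: several small counters packed into one 64-bit word so
// that all of them can be updated together with a single compare-and-swap.
sealed class PackedCounts
{
    // The 64-bit field whose alignment is being discussed in this issue.
    private long _state;

    // Assumed layout for this sketch: bits 0-15 = spinners, bits 32-47 = waiters.
    public static ushort Spinners(long state) => (ushort)((ulong)state & 0xFFFF);
    public static ushort Waiters(long state) => (ushort)(((ulong)state >> 32) & 0xFFFF);

    public long Read() => Interlocked.Read(ref _state);

    public void AddSpinner() => Update(1L);
    public void AddWaiter() => Update(1L << 32);

    // Classic CAS loop: read, compute the new word, retry if another thread won.
    // Counter overflow is ignored in this sketch.
    private void Update(long delta)
    {
        while (true)
        {
            long observed = Interlocked.Read(ref _state);
            long desired = observed + delta;
            if (Interlocked.CompareExchange(ref _state, desired, observed) == observed)
                return;
        }
    }
}
```

Every call to `Update` goes through a 64-bit interlocked operation on `_state`, which is why the alignment of that one field can dominate the cost in a contended scenario.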