-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor GC performance on AMD processors (32-bit runtime) #34478
Comments
The problem is that the code to get cache size on x86 is broken for any recent processors: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/util.cpp#L1735 @Maoni0 I remember we talked about deleting the As far as I can tell, this is |
thanks for reporting this, Jan! what are the values you seeing from the CpuId implementation vs the OS impl? and presumably the OS impl returns the correct size? |
I have tried this on:
CpuId implementation returns 0 on all them. The penalty for Intel is actually even higher than the penalty for AMD (e.g. the small repro is 1.5x slower than what it should be on Xeon E5) |
dotnet/coreclr#27323 broke the CpuId path. |
ahh, so it was not that it was broken; it just got broken because of #27323? |
#27323 broke it completely. It worked somewhat before that change, but it still was not quite right. Before #27323, AMD Ryzen 9 machine:
|
and 16mb in this case is the right cache size, correct? |
The spec at https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x says it has total 64MB L3 cache,16 cores, 32 logical CPUs. GetLogicalProcessorInformation API reports each 4 cores sharing 16MB of L3 cache. 4 * 16MB = 64MB. What should be the right cache size reported to the GC in this case? |
FWIW, for the two Intel machines that I have available, the cache sizes reported by the CpuId path and the OS path are the same (with the fix for bug introduced by #27323). |
it would be correct to report 16mb to GC in this case. I suspect all the hand written code from before should just be deleted 'cause the OS is able to get the correct cache size. the only reason I didn't get rid of it before was that I didn't have time to test at all on 32-bit. |
I have noticed that GC is running a lot more often than it should on AMD processors (32-bit runtime):
Small repro:
Results - AMD Ryzen 9 3950X, 32-bit runtime:
Actual: 20+ seconds per iteration
Expected: <15 seconds per iteration (on par with 64-bit runtime)
The text was updated successfully, but these errors were encountered: