Poor GC performance on AMD processors (32-bit runtime) #34478

jkotas · 2020-04-02T21:04:45Z

I have noticed that GC is running a lot more often than it should on AMD processors (32-bit runtime):

Small repro:

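using System;

class Test
{
    // A small set of live slots. Each iteration overwrites a random slot,
    // so almost every allocated object soon becomes garbage.
    static object[] o = new object[1000];

    static void Main()
    {
        Random r = new Random();
        for (;;)
        {
            int start = Environment.TickCount;
            // 1 billion small allocations: elapsed time is sensitive to how often the GC runs.
            for (int i = 0; i < 1000000000; i++)
                o[r.Next(o.Length)] = new object();
            int end = Environment.TickCount;
            // Elapsed milliseconds for this iteration.
            Console.WriteLine(end - start);
        }
    }
}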

Results - AMD Ryzen 9 3950X, 32-bit runtime:

Actual: 20+ seconds per iteration
Expected: <15 seconds per iteration (on par with 64-bit runtime)


jkotas · 2020-04-02T21:10:11Z

The problem is that the code to get the cache size on x86 is broken for any recent processor: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/util.cpp#L1735

@Maoni0 I remember we talked about deleting the GetCacheSizeFromCpuId path in the past. What kind of testing would you like to see to make this happen?

As far as I can tell, GetCacheSizeFromCpuId is broken for any recent AMD processor (and likely for Intel as well).

Maoni0 · 2020-04-02T22:04:28Z

Thanks for reporting this, Jan! What values are you seeing from the CpuId implementation vs. the OS implementation? And presumably the OS implementation returns the correct size?

jkotas · 2020-04-02T22:16:28Z

I have tried this on:

AMD Ryzen 9
Intel Xeon E5
Intel Core i7

The CpuId implementation returns 0 on all of them. The penalty on Intel is actually even higher than on AMD (e.g. the small repro is 1.5x slower than it should be on the Xeon E5).
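The failure here is that a hardware-probing path silently returns 0 and that value is consumed as if it were a real cache size. A minimal sketch of a defensive selection policy, assuming a hypothetical `SelectCacheSizeForGC` helper that receives both candidate values (the name, parameters, and fallback constant are illustrative only, not the CoreCLR API):

```cpp
#include <cstddef>

// Hypothetical helper (not the CoreCLR API): choose the cache size to hand to the GC.
//   os_reported:    size from the OS query (e.g. GetLogicalProcessorInformation), 0 if unavailable
//   cpuid_reported: size from a hand-written CPUID decoder, 0 if decoding failed
//   fallback:       conservative default used when neither source produced a value
std::size_t SelectCacheSizeForGC(std::size_t os_reported,
                                 std::size_t cpuid_reported,
                                 std::size_t fallback)
{
    // Prefer the OS answer: it reflects the cache topology the OS actually sees.
    if (os_reported != 0)
        return os_reported;
    // A CPUID result of 0 means "unknown", never "no cache", so don't pass it on.
    if (cpuid_reported != 0)
        return cpuid_reported;
    return fallback;
}
```

With the pre-#27323 Ryzen numbers reported in this thread, `SelectCacheSizeForGC(16777216, 2621440, 256 * 1024)` returns the OS value, 16777216; the 256 KB fallback is an arbitrary illustrative constant. The fix that was eventually merged (#34488) went further and deleted the CpuId path altogether.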

jkotas · 2020-04-02T22:28:22Z

dotnet/coreclr#27323 broke the CpuId path.

Maoni0 · 2020-04-02T22:42:36Z

Ahh, so it wasn't broken all along; it just got broken by #27323?

jkotas · 2020-04-02T23:20:22Z

#27323 broke it completely. It worked somewhat before that change, but it still was not quite right.

Before #27323, on the AMD Ryzen 9 machine:

CpuId returns a cache size of 2,621,440 bytes (2.5 MB)
OS returns a cache size of 16,777,216 bytes (16 MB)
Repro time with the OS cache size: ~14.4 seconds per iteration
Repro time with the CpuId cache size: ~15.3 seconds per iteration (6% slower)
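To make the numbers above concrete: the CpuId value is 2.5 MiB against 16 MiB from the OS, and the timing gap works out to about 6%. A small sketch of that arithmetic (the helper names are hypothetical, used only for illustration):

```cpp
#include <cassert>
#include <cstddef>

constexpr double kBytesPerMiB = 1024.0 * 1024.0;

// Convert a cache size in bytes (as reported by CpuId or the OS) to MiB.
double BytesToMiB(std::size_t bytes)
{
    return static_cast<double>(bytes) / kBytesPerMiB;
}

// Percentage by which `measured_seconds` is slower than `baseline_seconds`.
double SlowdownPercent(double baseline_seconds, double measured_seconds)
{
    assert(baseline_seconds > 0.0);  // a zero baseline would make the ratio meaningless
    return (measured_seconds / baseline_seconds - 1.0) * 100.0;
}
```

`BytesToMiB(2621440)` gives 2.5, `BytesToMiB(16777216)` gives 16.0, and `SlowdownPercent(14.4, 15.3)` is roughly 6.25, which the comment above rounds to 6%.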

Maoni0 · 2020-04-03T00:00:38Z

And 16 MB is the right cache size in this case, correct?

jkotas · 2020-04-03T01:00:40Z

The spec at https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x says it has 64 MB of L3 cache in total, 16 cores, and 32 logical CPUs.

The GetLogicalProcessorInformation API reports that each group of 4 cores shares 16 MB of L3 cache. 16 cores / 4 cores per group = 4 groups, and 4 * 16 MB = 64 MB.

What should be the right cache size reported to the GC in this case?

jkotas · 2020-04-03T01:13:41Z

FWIW, for the two Intel machines that I have available, the cache sizes reported by the CpuId path and the OS path are the same (with the fix for the bug introduced by #27323).

Maoni0 · 2020-04-03T01:42:49Z

It would be correct to report 16 MB to the GC in this case. I suspect all the old hand-written code should just be deleted, since the OS is able to get the correct cache size. The only reason I didn't get rid of it before was that I didn't have time to test on 32-bit at all.
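The distinction being drawn is between the size of a single L3 instance (what the cores sharing it can actually use) and the package-wide total printed on the spec sheet. A minimal sketch, assuming a simplified, hypothetical `CacheInstance` record that stands in for the per-cache entries an OS topology query such as GetLogicalProcessorInformation returns (the real Windows structure is different):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one cache entry from an OS topology query.
struct CacheInstance {
    int level;               // 1, 2, 3, ...
    std::size_t size_bytes;  // capacity of this single cache instance
};

// Size of the largest single cache instance: the value to report to the GC.
std::size_t LargestCacheInstance(const std::vector<CacheInstance>& caches)
{
    std::size_t largest = 0;
    for (const CacheInstance& c : caches)
        if (c.size_bytes > largest)
            largest = c.size_bytes;
    return largest;
}

// Sum over all instances at one level: what a spec sheet lists as "total L3".
std::size_t TotalAtLevel(const std::vector<CacheInstance>& caches, int level)
{
    std::size_t total = 0;
    for (const CacheInstance& c : caches)
        if (c.level == level)
            total += c.size_bytes;
    return total;
}
```

Modeling the Ryzen 9 3950X as four L3 instances of 16 MB each, `LargestCacheInstance` yields 16 MB (the value to report) while `TotalAtLevel(caches, 3)` yields the 64 MB from the spec sheet.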


jkotas added the area-GC-coreclr label Apr 2, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 2, 2020

Maoni0 added this to the 5.0 milestone Apr 2, 2020

Maoni0 removed the untriaged New issue has not been triaged by the area owner label Apr 2, 2020

jkotas mentioned this issue Apr 3, 2020

Fix cache size detection using cpuid #34484

Closed

jkotas added a commit to jkotas/runtime that referenced this issue Apr 3, 2020

Delete stale CPU cache size detection

9563efb

Fixes dotnet#34478

jkotas mentioned this issue Apr 3, 2020

Delete stale CPU cache size detection #34488

Merged

jkotas closed this as completed in #34488 Apr 3, 2020

jkotas added a commit that referenced this issue Apr 3, 2020

Delete stale CPU cache size detection (#34488)

fd60286

Fixes #34478

mangod9 mentioned this issue Sep 9, 2020

Backport AMD perf fix to 3.1 #42034

Closed

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
