Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating StringBuilder with large initial capacity has regressed #59142

Closed
adamsitnik opened this issue Sep 15, 2021 · 17 comments
Closed

Creating StringBuilder with large initial capacity has regressed #59142

adamsitnik opened this issue Sep 15, 2021 · 17 comments
Assignees
Labels
area-GC-coreclr tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

image

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result Base Diff Ratio Alloc Delta Modality Operating System Bit Processor Name Base V Diff V
Slower 3050.00 5058.37 0.60 +0 Windows 10.0.19043.1165 X64 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 4089.58 5954.37 0.69 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 4132.44 5912.88 0.70 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 5679.51 6874.07 0.83 +0 Windows 10.0.18363.1621 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 5710.62 7429.97 0.77 +0 Windows 8.1 X64 Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge) 5.0.921.35908 6.0.21.45401
Same 5536.15 5959.82 0.93 +0 Windows 10.0.19042.685 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
Same 6102.76 6609.79 0.92 +0 several? Windows 10.0.19043.1165 X64 Intel Core i7-6700 CPU 3.40GHz (Skylake) 5.0.921.35908 6.0.21.41701
Slower 6148.23 7890.59 0.78 +0 Windows 10.0.22454 X64 Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) 5.0.921.35908 6.0.21.41701
Slower 4434.70 7083.12 0.63 +1 Windows 10.0.22451 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) 5.0.921.35908 6.0.21.41701
Slower 4078.26 5141.72 0.79 +0 Windows 10.0.19042.1165 X64 Intel Core i9-9900T CPU 2.10GHz 5.0.921.35908 6.0.21.41701
Slower 9410.39 13478.34 0.70 -200072 Windows 7 SP1 X64 Intel Core2 Duo CPU T9600 2.80GHz 5.0.721.25508 6.0.21.41701
Same 9069.64 9604.87 0.94 +0 centos 8 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7865.46 8368.56 0.94 +0 debian 10 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 20147.96 26456.29 0.76 +0 rhel 7 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 9453.04 10139.36 0.93 +0 sles 15 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7528.38 7967.33 0.94 +0 opensuse-leap 15.3 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 8146.26 8802.99 0.93 +0 bimodal ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Same 9764.07 10225.09 0.95 +0 ubuntu 18.04 X64 Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) 5.0.921.35908 6.0.21.41701
Same 9382.91 9883.81 0.95 +0 alpine 3.13 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) 5.0.921.35908 6.0.21.41701
Same 20906.32 22371.62 0.93 +0 ubuntu 16.04 Arm64 Unknown processor 5.0.421.11614 6.0.21.41701
Slower 5269.16 7873.35 0.67 +0 Windows 10.0.19043.1165 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 6875.80 8930.89 0.77 +0 Windows 10.0.22000 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 2963.91 3981.51 0.74 +0 Windows 10.0.19043.1165 X86 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 5410.13 7045.48 0.77 +0 Windows 10.0.18363.1621 X86 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 8075.79 9430.32 0.86 +0 Windows 10.0.19043.1165 Arm Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Same 9107.70 10015.71 0.91 +25 macOS Big Sur 11.5.2 X64 Intel Core i5-4278U CPU 2.60GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 20246.89 21171.72 0.96 +0 macOS Big Sur 11.5.2 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 8215.92 8089.95 1.02 +0 macOS Big Sur 11.4 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
@ghost
Copy link

ghost commented Sep 15, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

image

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result Base Diff Ratio Alloc Delta Modality Operating System Bit Processor Name Base V Diff V
Slower 3050.00 5058.37 0.60 +0 Windows 10.0.19043.1165 X64 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 4089.58 5954.37 0.69 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 4132.44 5912.88 0.70 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 5679.51 6874.07 0.83 +0 Windows 10.0.18363.1621 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 5710.62 7429.97 0.77 +0 Windows 8.1 X64 Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge) 5.0.921.35908 6.0.21.45401
Same 5536.15 5959.82 0.93 +0 Windows 10.0.19042.685 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
Same 6102.76 6609.79 0.92 +0 several? Windows 10.0.19043.1165 X64 Intel Core i7-6700 CPU 3.40GHz (Skylake) 5.0.921.35908 6.0.21.41701
Slower 6148.23 7890.59 0.78 +0 Windows 10.0.22454 X64 Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) 5.0.921.35908 6.0.21.41701
Slower 4434.70 7083.12 0.63 +1 Windows 10.0.22451 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) 5.0.921.35908 6.0.21.41701
Slower 4078.26 5141.72 0.79 +0 Windows 10.0.19042.1165 X64 Intel Core i9-9900T CPU 2.10GHz 5.0.921.35908 6.0.21.41701
Slower 9410.39 13478.34 0.70 -200072 Windows 7 SP1 X64 Intel Core2 Duo CPU T9600 2.80GHz 5.0.721.25508 6.0.21.41701
Same 9069.64 9604.87 0.94 +0 centos 8 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7865.46 8368.56 0.94 +0 debian 10 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 20147.96 26456.29 0.76 +0 rhel 7 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 9453.04 10139.36 0.93 +0 sles 15 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7528.38 7967.33 0.94 +0 opensuse-leap 15.3 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 8146.26 8802.99 0.93 +0 bimodal ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Same 9764.07 10225.09 0.95 +0 ubuntu 18.04 X64 Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) 5.0.921.35908 6.0.21.41701
Same 9382.91 9883.81 0.95 +0 alpine 3.13 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) 5.0.921.35908 6.0.21.41701
Same 20906.32 22371.62 0.93 +0 ubuntu 16.04 Arm64 Unknown processor 5.0.421.11614 6.0.21.41701
Slower 5269.16 7873.35 0.67 +0 Windows 10.0.19043.1165 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 6875.80 8930.89 0.77 +0 Windows 10.0.22000 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 2963.91 3981.51 0.74 +0 Windows 10.0.19043.1165 X86 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 5410.13 7045.48 0.77 +0 Windows 10.0.18363.1621 X86 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 8075.79 9430.32 0.86 +0 Windows 10.0.19043.1165 Arm Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Same 9107.70 10015.71 0.91 +25 macOS Big Sur 11.5.2 X64 Intel Core i5-4278U CPU 2.60GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 20246.89 21171.72 0.96 +0 macOS Big Sur 11.5.2 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 8215.92 8089.95 1.02 +0 macOS Big Sur 11.4 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
Author: adamsitnik
Assignees: -
Labels:

area-System.Runtime, tenet-performance

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 15, 2021
@adamsitnik
Copy link
Member Author

adamsitnik commented Sep 15, 2021

Some other benchmarks that use large string builders like System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000) and System.Text.Tests.Perf_StringBuilder.ctor_string(length: 100000) also seem to be slightly regressed.

@DrewScoggins
Copy link
Member

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

This was because we did a rerun of the bot over that time period, after we did lab wide reruns over that time period because of missing data.

@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 30, 2021
@jeffhandley jeffhandley added this to the 6.0.0 milestone Sep 30, 2021
@adamsitnik
Copy link
Member Author

adamsitnik commented Oct 14, 2021

Ok, so I investigated the regression. Here is what I've found:

All this benchmark does, is creating a StringBuilder instance with quite large capacity:

https://github.com/dotnet/performance/blob/a36bec8eed2a3139d067de0d9d81e72422328de7/src/benchmarks/micro/libraries/System.Text/Perf.StringBuilder.cs#L34-L37

Which is more or less exercising GC.AllocateUninitializedArray<char>(capacity) and... garbage collection (see profiler output below).

m_MaxCapacity = maxCapacity;
m_ChunkChars = GC.AllocateUninitializedArray<char>(capacity);

GC.AllocateUninitializedArray delagates most of the work to AllocateNewArray:

return Unsafe.As<T[]>(AllocateNewArray(typeof(T[]).TypeHandle.Value, length, flags));

which according to git blame was not modified for at least 2 years which does not answer the question why it has regressed.

By using the following command:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter *ctor_capacity --bdn-arguments "--invocationCount 75088 --iterationCount 20 --profiler ETW"

I was able to capture two ETW trace files, filter them to a single benchmark iteration that performed exactly the same work (see --invocationCount 75088 in the arguments above) using Start&Stop BDN events:

image

After loading all the symbols, disabling folding, filtering out Finalizer and Tiered JIT threads I got the following output for .NET 5:

Name                                                Exc %ExcInc %   IncFold                             When
coreclr!WKS::gc_heap::plan_phase                     18.8 58 20.7    64   0 12_3114422114112112231261431_230
coreclr!WKS::gc_heap::mark_object_simple1            16.5 51 16.5    51   0 212211_1121_152__112_12_43222122
coreclr!WKS::gc_heap::mark_object_simple             11.736.088 28.287.088   0 34442311222025223332124154223122
ntdll!LdrpDispatchUserCallTarget                      3.6 11  3.6    11   0 ____1__01__11______1_2____1__1_1
ntoskrnl!EtwpWriteUserEvent                           3.2 10  7.4    23   0 1__1_1__0121111011_12_____00_11_
coreclr!GcInfoDecoder::GcInfoDecoder                  2.3  7  2.3     7   0 ___1___111_1_____________001____
ntoskrnl!KiSystemServiceUser                          1.9  6  1.9     6   0 _____11_________1___11________1_
coreclr!EEJitManager::JitCodeToMethodInfo             1.3  4  1.3     4   0 ____1_________11___________00___
coreclr!GcInfoDecoder::EnumerateLiveSlots             1.0  3 12.338.142   0 2421_13_111__4___11__13_23112_12
coreclr!WKS::GCHeap::Alloc                            1.0  3 98.4304.230   0 99999999999998998999999999999998
coreclr!WKS::gc_heap::mark_phase                      1.0  3 49.6153.230   0 76555444443545464445455164367336
coreclr!ScanConsecutiveHandlesWithoutUserData         1.0  3 21.466.088   0 32332311122121223222121131123122
coreclr!WKS::GCHeap::UpdatePostGCCounters             1.0  3  2.9     9   0 1_1________1_21__________1___11_
ntdll!EtwpEventWriteFull                              1.0  3 12.6    39   0 1_11_2_10133112031_231____11112_
coreclr!WKS::gc_heap::sweep_uoh_objects               1.0  3  1.0     3   0 ___1________1_1_________________
coreclr!EECodeManager::EnsureCallerContextIsValid     1.0  3  2.6     8   0 ____1_________111_11_______11___
ntoskrnl!ObfDereferenceObjectWithTag                  1.0  3  1.0     3   0 __1_______________________11____
ntoskrnl!HvlGetReferenceTimeUsingTscPage              1.0  3  1.0     3   0 _____________1___1__1___________
ntdll!RtlpUnwindPrologue                              1.0  3  1.0     3   0 ____1___________1_1_____________
ntoskrnl!EtwpReserveTraceBuffer                       1.0  3  2.6     8   0 ___________101_011__1________1__
coreclr!GcSlotDecoder::DecodeSlotTable                0.72.142  0.7 2.142   0 _11____________________________0
coreclr!StackFrameIterator::ProcessCurrentFrame       0.6  2  4.2    13   0 ____1__1______121_12__1____21___
ntdll!NtFlushProcessWriteBuffers                      0.6  2  0.6     2   0 ______________________00____1___
coreclr!WKS::gc_heap::gc1                             0.6  2 76.7237.230   0 98686698867798785567777789777676
ntoskrnl!ObpReferenceObjectByHandleWithTag            0.6  2  0.6     2   0 _____________________________11_
ntoskrnl!memcpy                                       0.6  2  0.6     2   0 ___________11___________________
ntdll!RtlQueryPerformanceCounter                      0.6  2  0.6     2   0 ____________________1____1______
coreclr!ReadyToRunJitManager::JitCodeToMethodInfo     0.6  2  0.6     2   0 _______1_________1______________
ntoskrnl!MiLockWorkingSetShared                       0.6  2  0.6     2   0 ______________1________________1
ntdll!NtTraceEvent                                    0.6  2 11.6    36   0 1_11_2_10133112031_131____11_12_
coreclr!MethodTable::_GetFullyQualifiedNameForClass  0.6  2  5.8    18   0 00102___1_00_____12___111_1__21_
coreclr!SString::Replace                              0.6  2  2.6     8   0 00__1____________11____1__1__11_
ntdll!NtQueryInformationProcess                       0.6  2  2.3     7   0 ______1_11____1______1__1______1
coreclr!GcSlotDecoder::GetSlotDesc                    0.6  2  0.6     2   0 _____1______________________1___
coreclr!GCInterface::AllocateNewArray                 0.6  2 99.0306.230   0 99998999999999998999999999999999
ntoskrnl!NtTraceEvent                                 0.6  2  9.1    28   0 1_11_1__0121112021_12_____11_11_
coreclr!memset                                        0.6  2  0.6     2   0 ____1__1________________________

https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/adamsitnik/5448f82b75dc00c57139cd3553dad3a8/raw/09fc56890bfc7b4665ca05c9f97cb48db74c9dc7/net50.speedscope.json

and .NET 6:

Name                                            Exc % ExcInc %   IncFold                             When
coreclr!WKS::gc_heap::mark_object_simple1        31.0213.084 31.0213.084   0 22343352340341223235332221412241
coreclr!WKS::gc_heap::plan_phase                 23.0 158 24.8   170   0 02112211243112243223014223221124
coreclr!WKS::gc_heap::find_first_object           9.968.240  9.968.240   0 111010000011000001002000100_0100
coreclr!WKS::gc_heap::mark_object_simple          5.8  40 36.9253.084   0 23443453451443323335333322424342
ntoskrnl!EtwpWriteUserEvent                       2.5  17  5.0    34   0 0_100000__000000_00_0_00_10100_0
ntoskrnl!ObpReferenceObjectByHandleWithTag        1.5  10  1.5    10   0 00___0_0____0__0____00__0____0__
ntoskrnl!KiSystemServiceUser                      1.0   7  1.0     7   0 ______0_______00_________0_0____
ntoskrnl!KiDeferredReadySingleThread              0.9   6  0.9     6   0 ___0______0__00___0__0__________
ntdll!LdrpDispatchUserCallTarget                  0.9   6  0.9     6   0 ________0___000__________0______
ntoskrnl!HvlGetReferenceTimeUsingTscPage          0.9   6  0.9     6   0 __0___0____0___0_________000____
ntdll!NtTraceEvent                                0.7   5  8.2    56   0 101000000_100010_01_000001011001
coreclr!SString::ConvertToUnicode                 0.7   5  1.3     9   0 ___0_______0_____00____00____1__
ntoskrnl!memcpy                                   0.7   5  0.7     5   0 __0__0________________00___0___0
coreclr!GcSlotDecoder::DecodeSlotTable            0.6   4  0.6     4   0 ____00______0________________0__
coreclr!WKS::gc_heap::desired_new_allocation      0.6   4  1.3     9   0 ___0_0___________00____00__0____
coreclr!WKS::gc_heap::relocate_survivors          0.4   3  0.4     3   0 ________0_____0___________0_____
coreclr!FireEtwGCAllocationTick_V4                0.4   3  5.7    39   0 0_000001__000000_0__00__01010000
coreclr!WKS::gc_heap::sweep_uoh_objects           0.4   3  0.7     5   0 _0__0______0_____________0_____0
ntoskrnl!NtTraceEvent                             0.4   3  6.8    47   0 00100000__100000_00_000001010000
coreclr!SegmentScanByTypeMap                      0.4   3  0.6     4   0 ______0___0_00__________________
ntdll!RtlQueryPerformanceCounter                  0.4   3  0.4     3   0 ___________0_______________0___0
ntdll!NtQuerySystemInformation                    0.4   3  0.6     4   0 ________________________0__0__0_
coreclr!StackFrameIterator::Filter                0.3   2  0.3     2   0 ________________________0_______
ntoskrnl!KiExitDispatcher                         0.3   2  1.2     8   0 ___0______0__00_0_0__0____0_____
coreclr!GcInfoDecoder::GcInfoDecoder              0.3   2  0.6     4   0 ______o0_____0______0_0_________
coreclr!WKS::gc_heap::generation_size             0.3   2  0.3     2   0 ______0_________0_______________
ntdll!EtwpEventWriteFull                          0.3   2  8.5    58   0 201000000_100010_01_000001011001
ntoskrnl!KiTryUnwaitThread                        0.3   2  0.3     2   0 ______0_____0___________________
coreclr!GcInfoDecoder::ReportUntrackedSlots       0.3   2 16.1110.240   0 221211112012010112102102110_1111
coreclr!VirtualCallStubManager::ReclaimAll        0.3   2  0.3     2   0 _0________0_____________________
coreclr!StackFrameIterator::ProcessCurrentFrame  0.3   2  1.3     9   0 0__0___0____000_____________000_
coreclr!WKS::gc_heap::uoh_object_marked           0.3   2  0.3     2   0 ____0______0____________________

https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/adamsitnik/5448f82b75dc00c57139cd3553dad3a8/raw/09fc56890bfc7b4665ca05c9f97cb48db74c9dc7/net60.speedscope.json

and compared them using PerfView:

Overweight report for symbols common between both files


Base (old) Time: 309.2
Test (new) Time: 686.3
Delta: 377.1
Delta %: 121.9

In this report, overweight is ratio of actual growth compared to 121.9%.
Interest level attempts to identify smaller methods which changed a lot. These are likely the most interesting frames to sart investigating
An overweight of greater than 100% indicates the symbol grew in cost more than average.
High overweights are a likely source of regressions and represent good places to investigate.
Only symbols that have at least 2% impact are shown.

Name Base Test Delta Responsibility % Overweight % Interest Level
coreclr!WKS::gc_heap::mark_object_simple1 51.0 213.1 162.1 42.98 260.62 5
coreclr!WKS::GCHeap::Promote 87.1 323.3 236.2 62.65 222.44 5
coreclr!GcInfoDecoder::ReportUntrackedSlots 30.0 110.2 80.2 21.28 219.33 5
coreclr!HndScanHandlesForGC 68.1 231.1 163.0 43.22 196.31 5
coreclr!TableScanHandles 68.1 229.1 161.0 42.69 193.90 5
coreclr!ScanConsecutiveHandlesWithoutUserData 66.1 221.1 155.0 41.10 192.32 5
coreclr!BlockScanBlocksWithoutUserData 66.1 221.1 155.0 41.10 192.32 5
coreclr!GcInfoDecoder::EnumerateLiveSlots 38.1 116.2 78.1 20.71 167.91 5
coreclr!WKS::gc_heap::mark_object_simple 87.1 253.1 166.0 44.02 156.30 5
coreclr!EECodeManager::EnumGcRefs 43.1 119.2 76.1 20.18 144.65 5
coreclr!WKS::gc_heap::plan_phase 64.0 170.0 106.0 28.11 135.82 5
coreclr!WKS::gc_heap::mark_phase 153.2 373.3 220.1 58.37 117.79 5
coreclr!WKS::gc_heap::allocate_uoh 1.0 63.0 62.0 16.44 5084.21 4
coreclr!WKS::gc_heap::a_fit_free_list_uoh_p 1.0 61.0 60.0 15.91 4920.20 4
coreclr!WKS::gc_heap::adjust_limit_clr 1.0 60.0 59.0 15.65 4838.20 4
coreclr!WKS::gc_heap::allocate_more_space 258.2 682.3 424.1 112.46 134.68 4
coreclr!GcStackCrawlCallBack 48.1 120.2 72.1 19.12 122.81 4
coreclr!WKS::gc_heap::gc1 237.2 578.3 341.1 90.45 117.91 4
coreclr!WKS::GCHeap::GarbageCollectGeneration 256.2 615.3 359.1 95.23 114.92 4
coreclr!WKS::gc_heap::garbage_collect 245.2 583.3 338.1 89.66 113.06 4
ntoskrnl!ObReferenceObjectByHandle 2.0 11.0 9.0 2.39 369.01 3

It seems that garbage collection of large arrays became more expensive compared to .NET 5.

@Maoni0 is it something expected and we should just close the issue? I'am moving this issue to 7.0 and to GC area as I did all I could on my side.

FWIW my env info:

OS=Windows 10.0.19043.1288 (21H1/May2021Update)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores

@ghost
Copy link

ghost commented Oct 14, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

image

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

image

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result Base Diff Ratio Alloc Delta Modality Operating System Bit Processor Name Base V Diff V
Slower 3050.00 5058.37 0.60 +0 Windows 10.0.19043.1165 X64 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 4089.58 5954.37 0.69 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 4132.44 5912.88 0.70 +0 Windows 10.0.20348 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 5679.51 6874.07 0.83 +0 Windows 10.0.18363.1621 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 5710.62 7429.97 0.77 +0 Windows 8.1 X64 Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge) 5.0.921.35908 6.0.21.45401
Same 5536.15 5959.82 0.93 +0 Windows 10.0.19042.685 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
Same 6102.76 6609.79 0.92 +0 several? Windows 10.0.19043.1165 X64 Intel Core i7-6700 CPU 3.40GHz (Skylake) 5.0.921.35908 6.0.21.41701
Slower 6148.23 7890.59 0.78 +0 Windows 10.0.22454 X64 Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) 5.0.921.35908 6.0.21.41701
Slower 4434.70 7083.12 0.63 +1 Windows 10.0.22451 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) 5.0.921.35908 6.0.21.41701
Slower 4078.26 5141.72 0.79 +0 Windows 10.0.19042.1165 X64 Intel Core i9-9900T CPU 2.10GHz 5.0.921.35908 6.0.21.41701
Slower 9410.39 13478.34 0.70 -200072 Windows 7 SP1 X64 Intel Core2 Duo CPU T9600 2.80GHz 5.0.721.25508 6.0.21.41701
Same 9069.64 9604.87 0.94 +0 centos 8 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7865.46 8368.56 0.94 +0 debian 10 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Slower 20147.96 26456.29 0.76 +0 rhel 7 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 9453.04 10139.36 0.93 +0 sles 15 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 7528.38 7967.33 0.94 +0 opensuse-leap 15.3 X64 AMD EPYC 7452 5.0.921.35908 6.0.21.41701
Same 8146.26 8802.99 0.93 +0 bimodal ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Same 9764.07 10225.09 0.95 +0 ubuntu 18.04 X64 Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge) 5.0.921.35908 6.0.21.41701
Same 9382.91 9883.81 0.95 +0 alpine 3.13 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) 5.0.921.35908 6.0.21.41701
Same 20906.32 22371.62 0.93 +0 ubuntu 16.04 Arm64 Unknown processor 5.0.421.11614 6.0.21.41701
Slower 5269.16 7873.35 0.67 +0 Windows 10.0.19043.1165 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 6875.80 8930.89 0.77 +0 Windows 10.0.22000 Arm64 Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Slower 2963.91 3981.51 0.74 +0 Windows 10.0.19043.1165 X86 AMD Ryzen Threadripper PRO 3945WX 12-Cores 5.0.921.35908 6.0.21.41701
Slower 5410.13 7045.48 0.77 +0 Windows 10.0.18363.1621 X86 Intel Xeon CPU E5-1650 v4 3.60GHz 5.0.921.35908 6.0.21.41701
Slower 8075.79 9430.32 0.86 +0 Windows 10.0.19043.1165 Arm Microsoft SQ1 3.0 GHz 5.0.921.35908 6.0.21.41701
Same 9107.70 10015.71 0.91 +25 macOS Big Sur 11.5.2 X64 Intel Core i5-4278U CPU 2.60GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 20246.89 21171.72 0.96 +0 macOS Big Sur 11.5.2 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell) 5.0.921.35908 6.0.21.41701
Same 8215.92 8089.95 1.02 +0 macOS Big Sur 11.4 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) 5.0.921.35908 6.0.21.41701
Author: adamsitnik
Assignees: adamsitnik
Labels:

tenet-performance, area-GC-coreclr, needs further triage

Milestone: 7.0.0

@Maoni0
Copy link
Member

Maoni0 commented Oct 14, 2021

would it be possible to share the traces? that would be very helpful

@adamsitnik
Copy link
Member Author

@Maoni0
Copy link
Member

Maoni0 commented Oct 14, 2021

wow, both traces crashed perfview 😅

@jeffhandley
Copy link
Member

@adamsitnik Thanks for digging into this so deep!!

@jeffhandley jeffhandley removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Oct 16, 2021
@danmoseley
Copy link
Member

I like that you recorded how you did the @adamsitnik so others can learn.

@cshung
Copy link
Member

cshung commented Oct 18, 2021

Do we know how to reproduce this with a particular commit? I am wondering if I can bisect the change that caused the regression.
@adamsitnik

@danmoseley
Copy link
Member

@cshung the regression does not seem to have clear boundaries? If you go to the links above and zoom in, you can try to bisect but it seems it may be gradual.

I made a quick attempt to bracket the time and got this range b1375a9...f59a132 that has a couple GC changes in, but they may well be irrelevant.

@danmoseley
Copy link
Member

Another option might be to attempt to add a perf test for "garbage collection of large arrays" and see whether it regressed.

@mangod9
Copy link
Member

mangod9 commented Jul 19, 2022

Might be worthwhile checking how things are with regions.

@mrsharm
Copy link
Member

mrsharm commented Jul 23, 2022

@mangod9:

The trends for System.Text.Tests.Perf_StringBuilder.ctor_string(length: 100000):
image

and System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000):
image

both seemed to have improved after the initial regression i.e., when regions in the GC space were re-introduced. I have taken a look at other trends related to StringBuilder from here and they either didn't regress or exhibited similar behavior where there was a regression followed by an improvement. Subsequent regressions seem unrelated to the GC.

Should we close this issue?

@mangod9
Copy link
Member

mangod9 commented Jul 23, 2022

yeah should be ok to close if the current perf is comparable to .NET 6 numbers.

@mrsharm
Copy link
Member

mrsharm commented Jul 24, 2022

Closing since we are back to (in some cases better than) the same numbers last year. Feel free to reopen if needed.

image

image

@mrsharm mrsharm closed this as completed Jul 24, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-GC-coreclr tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

8 participants