Creating StringBuilder with large initial capacity has regressed #59142

adamsitnik · 2021-09-15T08:01:48Z

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result	Base	Diff	Ratio	Alloc Delta	Modality	Operating System	Bit	Processor Name	Base V	Diff V
Slower	3050.00	5058.37	0.60	+0		Windows 10.0.19043.1165	X64	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	4089.58	5954.37	0.69	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	4132.44	5912.88	0.70	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	5679.51	6874.07	0.83	+0		Windows 10.0.18363.1621	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	5710.62	7429.97	0.77	+0		Windows 8.1	X64	Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)	5.0.921.35908	6.0.21.45401
Same	5536.15	5959.82	0.93	+0		Windows 10.0.19042.685	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
Same	6102.76	6609.79	0.92	+0	several?	Windows 10.0.19043.1165	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	5.0.921.35908	6.0.21.41701
Slower	6148.23	7890.59	0.78	+0		Windows 10.0.22454	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	5.0.921.35908	6.0.21.41701
Slower	4434.70	7083.12	0.63	+1		Windows 10.0.22451	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)	5.0.921.35908	6.0.21.41701
Slower	4078.26	5141.72	0.79	+0		Windows 10.0.19042.1165	X64	Intel Core i9-9900T CPU 2.10GHz	5.0.921.35908	6.0.21.41701
Slower	9410.39	13478.34	0.70	-200072		Windows 7 SP1	X64	Intel Core2 Duo CPU T9600 2.80GHz	5.0.721.25508	6.0.21.41701
Same	9069.64	9604.87	0.94	+0		centos 8	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7865.46	8368.56	0.94	+0		debian 10	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	20147.96	26456.29	0.76	+0		rhel 7	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	9453.04	10139.36	0.93	+0		sles 15	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7528.38	7967.33	0.94	+0		opensuse-leap 15.3	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	8146.26	8802.99	0.93	+0	bimodal	ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Same	9764.07	10225.09	0.95	+0		ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)	5.0.921.35908	6.0.21.41701
Same	9382.91	9883.81	0.95	+0		alpine 3.13	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	5.0.921.35908	6.0.21.41701
Same	20906.32	22371.62	0.93	+0		ubuntu 16.04	Arm64	Unknown processor	5.0.421.11614	6.0.21.41701
Slower	5269.16	7873.35	0.67	+0		Windows 10.0.19043.1165	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	6875.80	8930.89	0.77	+0		Windows 10.0.22000	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	2963.91	3981.51	0.74	+0		Windows 10.0.19043.1165	X86	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	5410.13	7045.48	0.77	+0		Windows 10.0.18363.1621	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	8075.79	9430.32	0.86	+0		Windows 10.0.19043.1165	Arm	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Same	9107.70	10015.71	0.91	+25		macOS Big Sur 11.5.2	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	20246.89	21171.72	0.96	+0		macOS Big Sur 11.5.2	X64	Intel Core i7-4870HQ CPU 2.50GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	8215.92	8089.95	1.02	+0		macOS Big Sur 11.4	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701

ghost · 2021-09-15T08:01:51Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result	Base	Diff	Ratio	Alloc Delta	Modality	Operating System	Bit	Processor Name	Base V	Diff V
Slower	3050.00	5058.37	0.60	+0		Windows 10.0.19043.1165	X64	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	4089.58	5954.37	0.69	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	4132.44	5912.88	0.70	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	5679.51	6874.07	0.83	+0		Windows 10.0.18363.1621	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	5710.62	7429.97	0.77	+0		Windows 8.1	X64	Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)	5.0.921.35908	6.0.21.45401
Same	5536.15	5959.82	0.93	+0		Windows 10.0.19042.685	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
Same	6102.76	6609.79	0.92	+0	several?	Windows 10.0.19043.1165	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	5.0.921.35908	6.0.21.41701
Slower	6148.23	7890.59	0.78	+0		Windows 10.0.22454	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	5.0.921.35908	6.0.21.41701
Slower	4434.70	7083.12	0.63	+1		Windows 10.0.22451	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)	5.0.921.35908	6.0.21.41701
Slower	4078.26	5141.72	0.79	+0		Windows 10.0.19042.1165	X64	Intel Core i9-9900T CPU 2.10GHz	5.0.921.35908	6.0.21.41701
Slower	9410.39	13478.34	0.70	-200072		Windows 7 SP1	X64	Intel Core2 Duo CPU T9600 2.80GHz	5.0.721.25508	6.0.21.41701
Same	9069.64	9604.87	0.94	+0		centos 8	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7865.46	8368.56	0.94	+0		debian 10	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	20147.96	26456.29	0.76	+0		rhel 7	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	9453.04	10139.36	0.93	+0		sles 15	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7528.38	7967.33	0.94	+0		opensuse-leap 15.3	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	8146.26	8802.99	0.93	+0	bimodal	ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Same	9764.07	10225.09	0.95	+0		ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)	5.0.921.35908	6.0.21.41701
Same	9382.91	9883.81	0.95	+0		alpine 3.13	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	5.0.921.35908	6.0.21.41701
Same	20906.32	22371.62	0.93	+0		ubuntu 16.04	Arm64	Unknown processor	5.0.421.11614	6.0.21.41701
Slower	5269.16	7873.35	0.67	+0		Windows 10.0.19043.1165	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	6875.80	8930.89	0.77	+0		Windows 10.0.22000	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	2963.91	3981.51	0.74	+0		Windows 10.0.19043.1165	X86	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	5410.13	7045.48	0.77	+0		Windows 10.0.18363.1621	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	8075.79	9430.32	0.86	+0		Windows 10.0.19043.1165	Arm	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Same	9107.70	10015.71	0.91	+25		macOS Big Sur 11.5.2	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	20246.89	21171.72	0.96	+0		macOS Big Sur 11.5.2	X64	Intel Core i7-4870HQ CPU 2.50GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	8215.92	8089.95	1.02	+0		macOS Big Sur 11.4	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Runtime`, `tenet-performance`
Milestone:	-

adamsitnik · 2021-09-15T12:39:34Z

Some other benchmarks that use large string builders like System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000) and System.Text.Tests.Perf_StringBuilder.ctor_string(length: 100000) also seem to be slightly regressed.

DrewScoggins · 2021-09-15T16:37:54Z

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

This was because we did a rerun of the bot over that time period, after we did lab wide reruns over that time period because of missing data.

adamsitnik · 2021-10-14T15:10:14Z

Ok, so I investigated the regression. Here is what I've found:

All this benchmark does, is creating a StringBuilder instance with quite large capacity:

https://github.com/dotnet/performance/blob/a36bec8eed2a3139d067de0d9d81e72422328de7/src/benchmarks/micro/libraries/System.Text/Perf.StringBuilder.cs#L34-L37

Which is more or less exercising GC.AllocateUninitializedArray<char>(capacity) and... garbage collection (see profiler output below).

runtime/src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs

Lines 180 to 181 in cf49643

    
           m_MaxCapacity = maxCapacity; 
        
           m_ChunkChars = GC.AllocateUninitializedArray<char>(capacity);

GC.AllocateUninitializedArray delagates most of the work to AllocateNewArray:

runtime/src/coreclr/System.Private.CoreLib/src/System/GC.cs

Line 695 in cf49643

    
           return Unsafe.As<T[]>(AllocateNewArray(typeof(T[]).TypeHandle.Value, length, flags));

which according to git blame was not modified for at least 2 years which does not answer the question why it has regressed.

By using the following command:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter *ctor_capacity --bdn-arguments "--invocationCount 75088 --iterationCount 20 --profiler ETW"

I was able to capture two ETW trace files, filter them to a single benchmark iteration that performed exactly the same work (see --invocationCount 75088 in the arguments above) using Start&Stop BDN events:

After loading all the symbols, disabling folding, filtering out Finalizer and Tiered JIT threads I got the following output for .NET 5:

Name	Exc %	Exc	Inc %	Inc	Fold	When
coreclr!WKS::gc_heap::plan_phase	18.8	58	20.7	64	0	12_3114422114112112231261431_230
coreclr!WKS::gc_heap::mark_object_simple1	16.5	51	16.5	51	0	212211_1121_152__112_12_43222122
coreclr!WKS::gc_heap::mark_object_simple	11.7	36.088	28.2	87.088	0	34442311222025223332124154223122
ntdll!LdrpDispatchUserCallTarget	3.6	11	3.6	11	0	____1__01__11______1_2____1__1_1
ntoskrnl!EtwpWriteUserEvent	3.2	10	7.4	23	0	1__1_1__0121111011_12_____00_11_
coreclr!GcInfoDecoder::GcInfoDecoder	2.3	7	2.3	7	0	___1___111_1_____________001____
ntoskrnl!KiSystemServiceUser	1.9	6	1.9	6	0	_____11_________1___11________1_
coreclr!EEJitManager::JitCodeToMethodInfo	1.3	4	1.3	4	0	____1_________11___________00___
coreclr!GcInfoDecoder::EnumerateLiveSlots	1.0	3	12.3	38.142	0	2421_13_111__4___11__13_23112_12
coreclr!WKS::GCHeap::Alloc	1.0	3	98.4	304.230	0	99999999999998998999999999999998
coreclr!WKS::gc_heap::mark_phase	1.0	3	49.6	153.230	0	76555444443545464445455164367336
coreclr!ScanConsecutiveHandlesWithoutUserData	1.0	3	21.4	66.088	0	32332311122121223222121131123122
coreclr!WKS::GCHeap::UpdatePostGCCounters	1.0	3	2.9	9	0	1_1________1_21__________1___11_
ntdll!EtwpEventWriteFull	1.0	3	12.6	39	0	1_11_2_10133112031_231____11112_
coreclr!WKS::gc_heap::sweep_uoh_objects	1.0	3	1.0	3	0	___1________1_1_________________
coreclr!EECodeManager::EnsureCallerContextIsValid	1.0	3	2.6	8	0	____1_________111_11_______11___
ntoskrnl!ObfDereferenceObjectWithTag	1.0	3	1.0	3	0	__1_______________________11____
ntoskrnl!HvlGetReferenceTimeUsingTscPage	1.0	3	1.0	3	0	_____________1___1__1___________
ntdll!RtlpUnwindPrologue	1.0	3	1.0	3	0	____1___________1_1_____________
ntoskrnl!EtwpReserveTraceBuffer	1.0	3	2.6	8	0	___________101_011__1________1__
coreclr!GcSlotDecoder::DecodeSlotTable	0.7	2.142	0.7	2.142	0	_11____________________________0
coreclr!StackFrameIterator::ProcessCurrentFrame	0.6	2	4.2	13	0	____1__1______121_12__1____21___
ntdll!NtFlushProcessWriteBuffers	0.6	2	0.6	2	0	______________________00____1___
coreclr!WKS::gc_heap::gc1	0.6	2	76.7	237.230	0	98686698867798785567777789777676
ntoskrnl!ObpReferenceObjectByHandleWithTag	0.6	2	0.6	2	0	_____________________________11_
ntoskrnl!memcpy	0.6	2	0.6	2	0	___________11___________________
ntdll!RtlQueryPerformanceCounter	0.6	2	0.6	2	0	____________________1____1______
coreclr!ReadyToRunJitManager::JitCodeToMethodInfo	0.6	2	0.6	2	0	_______1_________1______________
ntoskrnl!MiLockWorkingSetShared	0.6	2	0.6	2	0	______________1________________1
ntdll!NtTraceEvent	0.6	2	11.6	36	0	1_11_2_10133112031_131____11_12_
coreclr!MethodTable::_GetFullyQualifiedNameForClass	0.6	2	5.8	18	0	00102___1_00_____12___111_1__21_
coreclr!SString::Replace	0.6	2	2.6	8	0	00__1____________11____1__1__11_
ntdll!NtQueryInformationProcess	0.6	2	2.3	7	0	______1_11____1______1__1______1
coreclr!GcSlotDecoder::GetSlotDesc	0.6	2	0.6	2	0	_____1______________________1___
coreclr!GCInterface::AllocateNewArray	0.6	2	99.0	306.230	0	99998999999999998999999999999999
ntoskrnl!NtTraceEvent	0.6	2	9.1	28	0	1_11_1__0121112021_12_____11_11_
coreclr!memset	0.6	2	0.6	2	0	____1__1________________________

https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/adamsitnik/5448f82b75dc00c57139cd3553dad3a8/raw/09fc56890bfc7b4665ca05c9f97cb48db74c9dc7/net50.speedscope.json

and .NET 6:

Name	Exc %	Exc	Inc %	Inc	Fold	When
coreclr!WKS::gc_heap::mark_object_simple1	31.0	213.084	31.0	213.084	0	22343352340341223235332221412241
coreclr!WKS::gc_heap::plan_phase	23.0	158	24.8	170	0	02112211243112243223014223221124
coreclr!WKS::gc_heap::find_first_object	9.9	68.240	9.9	68.240	0	111010000011000001002000100_0100
coreclr!WKS::gc_heap::mark_object_simple	5.8	40	36.9	253.084	0	23443453451443323335333322424342
ntoskrnl!EtwpWriteUserEvent	2.5	17	5.0	34	0	0_100000__000000_00_0_00_10100_0
ntoskrnl!ObpReferenceObjectByHandleWithTag	1.5	10	1.5	10	0	00___0_0____0__0____00__0____0__
ntoskrnl!KiSystemServiceUser	1.0	7	1.0	7	0	______0_______00_________0_0____
ntoskrnl!KiDeferredReadySingleThread	0.9	6	0.9	6	0	___0______0__00___0__0__________
ntdll!LdrpDispatchUserCallTarget	0.9	6	0.9	6	0	________0___000__________0______
ntoskrnl!HvlGetReferenceTimeUsingTscPage	0.9	6	0.9	6	0	__0___0____0___0_________000____
ntdll!NtTraceEvent	0.7	5	8.2	56	0	101000000_100010_01_000001011001
coreclr!SString::ConvertToUnicode	0.7	5	1.3	9	0	___0_______0_____00____00____1__
ntoskrnl!memcpy	0.7	5	0.7	5	0	__0__0________________00___0___0
coreclr!GcSlotDecoder::DecodeSlotTable	0.6	4	0.6	4	0	____00______0________________0__
coreclr!WKS::gc_heap::desired_new_allocation	0.6	4	1.3	9	0	___0_0___________00____00__0____
coreclr!WKS::gc_heap::relocate_survivors	0.4	3	0.4	3	0	________0_____0___________0_____
coreclr!FireEtwGCAllocationTick_V4	0.4	3	5.7	39	0	0_000001__000000_0__00__01010000
coreclr!WKS::gc_heap::sweep_uoh_objects	0.4	3	0.7	5	0	_0__0______0_____________0_____0
ntoskrnl!NtTraceEvent	0.4	3	6.8	47	0	00100000__100000_00_000001010000
coreclr!SegmentScanByTypeMap	0.4	3	0.6	4	0	______0___0_00__________________
ntdll!RtlQueryPerformanceCounter	0.4	3	0.4	3	0	___________0_______________0___0
ntdll!NtQuerySystemInformation	0.4	3	0.6	4	0	________________________0__0__0_
coreclr!StackFrameIterator::Filter	0.3	2	0.3	2	0	________________________0_______
ntoskrnl!KiExitDispatcher	0.3	2	1.2	8	0	___0______0__00_0_0__0____0_____
coreclr!GcInfoDecoder::GcInfoDecoder	0.3	2	0.6	4	0	______o0_____0______0_0_________
coreclr!WKS::gc_heap::generation_size	0.3	2	0.3	2	0	______0_________0_______________
ntdll!EtwpEventWriteFull	0.3	2	8.5	58	0	201000000_100010_01_000001011001
ntoskrnl!KiTryUnwaitThread	0.3	2	0.3	2	0	______0_____0___________________
coreclr!GcInfoDecoder::ReportUntrackedSlots	0.3	2	16.1	110.240	0	221211112012010112102102110_1111
coreclr!VirtualCallStubManager::ReclaimAll	0.3	2	0.3	2	0	_0________0_____________________
coreclr!StackFrameIterator::ProcessCurrentFrame	0.3	2	1.3	9	0	0__0___0____000_____________000_
coreclr!WKS::gc_heap::uoh_object_marked	0.3	2	0.3	2	0	____0______0____________________

https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/adamsitnik/5448f82b75dc00c57139cd3553dad3a8/raw/09fc56890bfc7b4665ca05c9f97cb48db74c9dc7/net60.speedscope.json

and compared them using PerfView:

Overweight report for symbols common between both files

Base (old) Time:	309.2
Test (new) Time:	686.3
Delta:	377.1
Delta %:	121.9

In this report, overweight is ratio of actual growth compared to 121.9%.
Interest level attempts to identify smaller methods which changed a lot. These are likely the most interesting frames to sart investigating
An overweight of greater than 100% indicates the symbol grew in cost more than average.
High overweights are a likely source of regressions and represent good places to investigate.
Only symbols that have at least 2% impact are shown.

Name	Base	Test	Delta	Responsibility %	Overweight %	Interest Level
coreclr!WKS::gc_heap::mark_object_simple1	51.0	213.1	162.1	42.98	260.62	5
coreclr!WKS::GCHeap::Promote	87.1	323.3	236.2	62.65	222.44	5
coreclr!GcInfoDecoder::ReportUntrackedSlots	30.0	110.2	80.2	21.28	219.33	5
coreclr!HndScanHandlesForGC	68.1	231.1	163.0	43.22	196.31	5
coreclr!TableScanHandles	68.1	229.1	161.0	42.69	193.90	5
coreclr!ScanConsecutiveHandlesWithoutUserData	66.1	221.1	155.0	41.10	192.32	5
coreclr!BlockScanBlocksWithoutUserData	66.1	221.1	155.0	41.10	192.32	5
coreclr!GcInfoDecoder::EnumerateLiveSlots	38.1	116.2	78.1	20.71	167.91	5
coreclr!WKS::gc_heap::mark_object_simple	87.1	253.1	166.0	44.02	156.30	5
coreclr!EECodeManager::EnumGcRefs	43.1	119.2	76.1	20.18	144.65	5
coreclr!WKS::gc_heap::plan_phase	64.0	170.0	106.0	28.11	135.82	5
coreclr!WKS::gc_heap::mark_phase	153.2	373.3	220.1	58.37	117.79	5
coreclr!WKS::gc_heap::allocate_uoh	1.0	63.0	62.0	16.44	5084.21	4
coreclr!WKS::gc_heap::a_fit_free_list_uoh_p	1.0	61.0	60.0	15.91	4920.20	4
coreclr!WKS::gc_heap::adjust_limit_clr	1.0	60.0	59.0	15.65	4838.20	4
coreclr!WKS::gc_heap::allocate_more_space	258.2	682.3	424.1	112.46	134.68	4
coreclr!GcStackCrawlCallBack	48.1	120.2	72.1	19.12	122.81	4
coreclr!WKS::gc_heap::gc1	237.2	578.3	341.1	90.45	117.91	4
coreclr!WKS::GCHeap::GarbageCollectGeneration	256.2	615.3	359.1	95.23	114.92	4
coreclr!WKS::gc_heap::garbage_collect	245.2	583.3	338.1	89.66	113.06	4
ntoskrnl!ObReferenceObjectByHandle	2.0	11.0	9.0	2.39	369.01	3

It seems that garbage collection of large arrays became more expensive compared to .NET 5.

@Maoni0 is it something expected and we should just close the issue? I'am moving this issue to 7.0 and to GC area as I did all I could on my side.

FWIW my env info:

OS=Windows 10.0.19043.1288 (21H1/May2021Update)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores

ghost · 2021-10-14T15:10:37Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Repro:

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter "System.Text.Tests.Perf_StringBuilder.ctor_capacity"

It's not specific to any OS, it repros on both Windows and Unix:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Text.Tests.Perf_StringBuilder.ctor_capacity(length%3a%20100000).html

@DrewScoggins it seems that it got detected by the bot in dotnet/perf-autofiling-issues#304, but it got reported a month after it had actually happened. Do we know why?

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Result	Base	Diff	Ratio	Alloc Delta	Modality	Operating System	Bit	Processor Name	Base V	Diff V
Slower	3050.00	5058.37	0.60	+0		Windows 10.0.19043.1165	X64	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	4089.58	5954.37	0.69	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	4132.44	5912.88	0.70	+0		Windows 10.0.20348	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	5679.51	6874.07	0.83	+0		Windows 10.0.18363.1621	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	5710.62	7429.97	0.77	+0		Windows 8.1	X64	Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge)	5.0.921.35908	6.0.21.45401
Same	5536.15	5959.82	0.93	+0		Windows 10.0.19042.685	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701
Same	6102.76	6609.79	0.92	+0	several?	Windows 10.0.19043.1165	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	5.0.921.35908	6.0.21.41701
Slower	6148.23	7890.59	0.78	+0		Windows 10.0.22454	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	5.0.921.35908	6.0.21.41701
Slower	4434.70	7083.12	0.63	+1		Windows 10.0.22451	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)	5.0.921.35908	6.0.21.41701
Slower	4078.26	5141.72	0.79	+0		Windows 10.0.19042.1165	X64	Intel Core i9-9900T CPU 2.10GHz	5.0.921.35908	6.0.21.41701
Slower	9410.39	13478.34	0.70	-200072		Windows 7 SP1	X64	Intel Core2 Duo CPU T9600 2.80GHz	5.0.721.25508	6.0.21.41701
Same	9069.64	9604.87	0.94	+0		centos 8	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7865.46	8368.56	0.94	+0		debian 10	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Slower	20147.96	26456.29	0.76	+0		rhel 7	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	9453.04	10139.36	0.93	+0		sles 15	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	7528.38	7967.33	0.94	+0		opensuse-leap 15.3	X64	AMD EPYC 7452	5.0.921.35908	6.0.21.41701
Same	8146.26	8802.99	0.93	+0	bimodal	ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Same	9764.07	10225.09	0.95	+0		ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)	5.0.921.35908	6.0.21.41701
Same	9382.91	9883.81	0.95	+0		alpine 3.13	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	5.0.921.35908	6.0.21.41701
Same	20906.32	22371.62	0.93	+0		ubuntu 16.04	Arm64	Unknown processor	5.0.421.11614	6.0.21.41701
Slower	5269.16	7873.35	0.67	+0		Windows 10.0.19043.1165	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	6875.80	8930.89	0.77	+0		Windows 10.0.22000	Arm64	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Slower	2963.91	3981.51	0.74	+0		Windows 10.0.19043.1165	X86	AMD Ryzen Threadripper PRO 3945WX 12-Cores	5.0.921.35908	6.0.21.41701
Slower	5410.13	7045.48	0.77	+0		Windows 10.0.18363.1621	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	5.0.921.35908	6.0.21.41701
Slower	8075.79	9430.32	0.86	+0		Windows 10.0.19043.1165	Arm	Microsoft SQ1 3.0 GHz	5.0.921.35908	6.0.21.41701
Same	9107.70	10015.71	0.91	+25		macOS Big Sur 11.5.2	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	20246.89	21171.72	0.96	+0		macOS Big Sur 11.5.2	X64	Intel Core i7-4870HQ CPU 2.50GHz (Haswell)	5.0.921.35908	6.0.21.41701
Same	8215.92	8089.95	1.02	+0		macOS Big Sur 11.4	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	5.0.921.35908	6.0.21.41701

Author:	adamsitnik
Assignees:	adamsitnik
Labels:	`tenet-performance`, `area-GC-coreclr`, `needs further triage`
Milestone:	7.0.0

Maoni0 · 2021-10-14T17:25:41Z

would it be possible to share the traces? that would be very helpful

adamsitnik · 2021-10-14T17:31:41Z

would it be possible to share the traces?

sure!

https://microsofteur-my.sharepoint.com/:f:/g/personal/adsitnik_microsoft_com/Eg8TlzuJA-ZAvxDxzib7hCMBQUAtV3xm5hJi-Qr-cZ4mKg?e=ruwRHQ

Maoni0 · 2021-10-14T18:26:06Z

wow, both traces crashed perfview 😅

jeffhandley · 2021-10-16T06:05:38Z

@adamsitnik Thanks for digging into this so deep!!

danmoseley · 2021-10-16T13:42:30Z

I like that you recorded how you did the @adamsitnik so others can learn.

cshung · 2021-10-18T18:17:04Z

Do we know how to reproduce this with a particular commit? I am wondering if I can bisect the change that caused the regression.
@adamsitnik

danmoseley · 2021-10-18T20:49:37Z

@cshung the regression does not seem to have clear boundaries? If you go to the links above and zoom in, you can try to bisect but it seems it may be gradual.

I made a quick attempt to bracket the time and got this range b1375a9...f59a132 that has a couple GC changes in, but they may well be irrelevant.

danmoseley · 2021-10-18T20:50:54Z

Another option might be to attempt to add a perf test for "garbage collection of large arrays" and see whether it regressed.

mangod9 · 2022-07-19T15:38:39Z

Might be worthwhile checking how things are with regions.

mrsharm · 2022-07-23T19:17:30Z

@mangod9:

The trends for System.Text.Tests.Perf_StringBuilder.ctor_string(length: 100000):

and System.Text.Tests.Perf_StringBuilder.ToString_MultipleSegments(length: 100000):

both seemed to have improved after the initial regression i.e., when regions in the GC space were re-introduced. I have taken a look at other trends related to StringBuilder from here and they either didn't regress or exhibited similar behavior where there was a regression followed by an improvement. Subsequent regressions seem unrelated to the GC.

Should we close this issue?

mangod9 · 2022-07-23T19:19:27Z

yeah should be ok to close if the current perf is comparable to .NET 6 numbers.

mrsharm · 2022-07-24T01:00:20Z

Closing since we are back to (in some cases better than) the same numbers last year. Feel free to reopen if needed.

adamsitnik added area-System.Runtime tenet-performance Performance related issue labels Sep 15, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 15, 2021

adamsitnik mentioned this issue Sep 17, 2021

.NET 6.0 Microbenchmarks Performance Study Report #59272

Closed

19 tasks

jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 30, 2021

jeffhandley added this to the 6.0.0 milestone Sep 30, 2021

jeffhandley assigned adamsitnik Oct 7, 2021

adamsitnik modified the milestones: 6.0.0, 7.0.0 Oct 14, 2021

adamsitnik added area-GC-coreclr and removed area-System.Runtime labels Oct 14, 2021

adamsitnik removed their assignment Oct 14, 2021

adamsitnik mentioned this issue Oct 14, 2021

new T[], GC.AllocateArray and GC.AllocateUninitializedArray micro benchmarks dotnet/performance#2072

Merged

jeffhandley removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Oct 16, 2021

mangod9 assigned mrsharm Jul 19, 2022

mrsharm closed this as completed Jul 24, 2022

ghost locked as resolved and limited conversation to collaborators Aug 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating StringBuilder with large initial capacity has regressed #59142

Creating StringBuilder with large initial capacity has regressed #59142

adamsitnik commented Sep 15, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

ghost commented Sep 15, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

adamsitnik commented Sep 15, 2021 •

edited

Loading

DrewScoggins commented Sep 15, 2021

adamsitnik commented Oct 14, 2021 •

edited

Loading

ghost commented Oct 14, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Maoni0 commented Oct 14, 2021

adamsitnik commented Oct 14, 2021

Maoni0 commented Oct 14, 2021

jeffhandley commented Oct 16, 2021

danmoseley commented Oct 16, 2021

cshung commented Oct 18, 2021

danmoseley commented Oct 18, 2021

danmoseley commented Oct 18, 2021

mangod9 commented Jul 19, 2022

mrsharm commented Jul 23, 2022

mangod9 commented Jul 23, 2022

mrsharm commented Jul 24, 2022

Creating StringBuilder with large initial capacity has regressed #59142

Creating StringBuilder with large initial capacity has regressed #59142

Comments

adamsitnik commented Sep 15, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

ghost commented Sep 15, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

adamsitnik commented Sep 15, 2021 • edited Loading

DrewScoggins commented Sep 15, 2021

adamsitnik commented Oct 14, 2021 • edited Loading

Overweight report for symbols common between both files

ghost commented Oct 14, 2021

System.Text.Tests.Perf_StringBuilder.ctor_capacity(length: 100000)

Maoni0 commented Oct 14, 2021

adamsitnik commented Oct 14, 2021

Maoni0 commented Oct 14, 2021

jeffhandley commented Oct 16, 2021

danmoseley commented Oct 16, 2021

cshung commented Oct 18, 2021

danmoseley commented Oct 18, 2021

danmoseley commented Oct 18, 2021

mangod9 commented Jul 19, 2022

mrsharm commented Jul 23, 2022

mangod9 commented Jul 23, 2022

mrsharm commented Jul 24, 2022

adamsitnik commented Sep 15, 2021 •

edited

Loading

adamsitnik commented Oct 14, 2021 •

edited

Loading