.NET 7.0 Preview 2 Microbenchmarks Performance Study Report #66848
Comments
Tagging subscribers to this area: @dotnet/area-meta
Issue Details
Data
This time we have covered the following configs:
Most of the benchmarks were run on bare-metal machines; some were executed via WSL. This would not be possible without the help from: @AndyAyersMS @carlossanlop @danmoseley @jeffhandley @sblom and @dakersnar who contributed their results and time. The full report generated by the tool is available here. The full report also contains improvements, so if you read it from the end you can see the biggest perf improvements. There are plenty of them! Again, the full historical data turned out to be extremely useful. For details about methodology please read #41871. Preview1 report can be found here.
Regressions
By design
Investigation in progress
Noise, flaky or multimodal
The following benchmarks showed up in the report generated by the tool, but were not actual regressions:
Big thanks to everyone involved!
|
Nice work @adamsitnik (and thanks @jeffhandley for big script improvements) From the table above, we could use more coverage on Windows 7/8, AMD hardware, Linux Arm64 and Arm32, and perhaps devices (?). Do you think we are far from the point where it would be feasible to ask community members to submit results as well? The collection is pretty well scripted now, but perhaps the analysis side does not scale well yet? |
" Faster " - 12395 I love this one: 😄 (cc @stephentoub) |
@adamsitnik the data looks super interesting!! I wish there were more arm64 devices, there is currently only one there. And while the table in your issue states there is M1 Max I don't see it in the reports 😞 |
I love these results. I notice that even on the same hardware the results favor Windows. Any reason for that? Was Linux already more optimized and thus shows smaller gains, or does the JIT for Linux work slightly worse? |
@kant2002 are you looking at particular scenarios? Generally something that's CPU bound should be closely comparable. I know sometimes calling ABI differences favor one or the other (cc @AndyAyersMS ) and of course anything that uses syscalls like IO will differ. |
Yeah, that's quite lovely. That's benefiting from us teaching regex how to spot positions in the pattern that are fixed and easily searchable. Previously we would have walked character by character looking for an 'a' through 'q'. Now we'll do an IndexOf('x') and then back off 14 characters. |
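A minimal sketch of the idea described above, written in Python rather than .NET and not taken from the actual Regex implementation. The pattern `[a-q][^u-z]{13}x` is a hypothetical stand-in chosen only because it has a literal `x` at a fixed offset of 14 from the match start; the helper names are illustrative too.

```python
import re

# Hypothetical pattern: one char in a-q, 13 chars outside u-z, then a literal 'x'.
# The literal always sits 14 characters after the start of a match.
PATTERN = re.compile(r"[a-q][^u-z]{13}x")
ANCHOR = "x"
ANCHOR_OFFSET = 14


def find_naive(text):
    """Try a match at every starting position, one character at a time."""
    for start in range(len(text)):
        if PATTERN.match(text, start):
            return start
    return -1


def find_with_anchor(text):
    """Jump to each 'x' with str.find, back off 14 characters, then verify."""
    pos = text.find(ANCHOR, ANCHOR_OFFSET)
    while pos != -1:
        start = pos - ANCHOR_OFFSET
        if PATTERN.match(text, start):
            return start
        pos = text.find(ANCHOR, pos + 1)
    return -1
```

Both functions return the index of the leftmost match (or -1), but the second one only runs the matcher at positions that could possibly succeed, letting a fast substring search skip everything else.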
Yes, there are lots of factors, for some microbenchmarks ABI does matter a lot, see #66529 (comment) (left - windows-x64, right - linux-x64) |
I was the one that ran those... here are the results I got:
Net 6.0
BenchmarkDotNet=v0.13.1.1706-nightly, OS=macOS Monterey 12.2.1 (21D62) [Darwin 21.3.0]
Net 7.0
BenchmarkDotNet=v0.13.1.1706-nightly, OS=macOS Monterey 12.2.1 (21D62) [Darwin 21.3.0]
So "none" is 365x faster; "compiled" 83x faster. |
@danmoseley I was looking at the image which Egor shared. I assume that since this is regex, this test was CPU bound, but maybe I'm wrong here. |
If you need more results, I can join and run the tests on my machine. A small guide on this would be really helpful. |
I'm interested in results on Alder Lake. Can we do something about the different cores, for example reporting our threads information to the OS? It should already be a thing for ARM big.LITTLE too. |
Since many folks asked me offline for additional statistics, I've extended the tool to print them in total, per architecture, per namespace and per OS.
Legend
Statistics
Total: 102270
Statistics per Architecture
Statistics per Operating System
Statistics per Namespace
|
Statistical Test threshold: 5%, the noise filter: 1 ns
Statistics
Total: 102270
Statistics per Architecture
Statistics per Operating System
Statistics per Namespace
|
Statistical Test threshold: 2%, the noise filter: 1 ns
Statistics
Total: 102270
Statistics per Architecture
Statistics per Operating System
Statistics per Namespace
|
@adamsitnik is it too late to ask for a report with a bigger noise filter? say, 10ns or even 100ns ? |
It's not too late, I am not going to remove this data :D
what threshold? |
the |
Statistical Test threshold: 5%, the noise filter: 10 ns
Statistics
Total: 102270
Statistics per Architecture
Statistics per Operating System
Statistics per Namespace
Statistical Test threshold: 10%, the noise filter: 100 ns
Statistics
Total: 102270
Statistics per Architecture
Statistics per Operating System
Statistics per Namespace
|
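To make the two knobs in the reports above concrete, here is a simplified sketch of how a relative threshold and an absolute noise filter can interact when labelling a benchmark. It is an assumption-laden illustration, not the tool's code: the function name `classify` is hypothetical, and a plain comparison of medians stands in for whatever statistical test the tool actually runs.

```python
from statistics import median


def classify(base_ns, diff_ns, threshold=0.05, noise_ns=1.0):
    """Label a benchmark as 'Faster', 'Slower' or 'Same'.

    base_ns / diff_ns: lists of timings in nanoseconds for the baseline
    and the compared runtime (illustrative inputs).
    """
    base = median(base_ns)
    diff = median(diff_ns)
    # Noise filter: ignore differences that are tiny in absolute terms.
    if abs(diff - base) < noise_ns:
        return "Same"
    # Threshold: ignore relative changes within +/- threshold of the baseline.
    if diff < base * (1 - threshold):
        return "Faster"
    if diff > base * (1 + threshold):
        return "Slower"
    return "Same"
```

Under these assumptions, raising the noise filter mostly removes very short benchmarks from the Faster/Slower buckets, while raising the threshold removes small relative changes regardless of benchmark duration.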
@adamsitnik is it possible to identify areas where Arm64 is disproportionately slower than x64? E.g., taking x64 machine A and Arm64 machine B, 95% of the benchmarks are no more than 30% slower on B, but 1% are grossly slower. I'm not suggesting you launch some time-consuming analysis, just thinking aloud about whether there's some way to detect any Arm64 perf traps that we aren't aware of, so we could take a look at improving them. |
Thanks, Adam!
That is a good idea! However, I'd guess we need to find comparable hardware first. A possible solution is to run benchmarks on M1 native (arm64) vs M1 Rosetta (x64) and compare. I did a quick run myself in the past and was surprised to see quite a few cases where x64 emulation was faster than native (most of them were GC-intensive): #60616 (comment), but I didn't run the full suite. |
something like
is the kind of thing I'm looking for. wrt comparable hardware, I was wondering whether just using a naive ratio with x64 might at least show something. You're probably right that Rosetta is a better way. I'm still waiting on my M1 hardware ... |
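The "naive ratio" idea mentioned above could be sketched roughly as follows. This is only an illustration: the function name, the input shape (benchmark name mapped to a timing in nanoseconds) and the `factor` cut-off are all assumptions, and the machines being compared would still need to be chosen with care.

```python
from statistics import median


def find_outliers(x64_ns, arm64_ns, factor=2.0):
    """Return (name, ratio) pairs whose arm64/x64 ratio exceeds
    factor * the median ratio, worst first."""
    ratios = {
        name: arm64_ns[name] / x64_ns[name]
        for name in x64_ns
        if name in arm64_ns and x64_ns[name] > 0
    }
    if not ratios:
        return []
    # The median ratio captures the "typical" gap between the two machines.
    typical = median(ratios.values())
    flagged = [(name, r) for name, r in ratios.items() if r > factor * typical]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Normalising by the median ratio means a uniformly slower Arm64 machine flags nothing; only benchmarks that are much worse than that machine's typical gap show up.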
thanks!
I have Win 7 and 8 machines, but they are 10+ years old and the results are noisy. Unless someone can provide stable Win 7 & 8 results, I think it's OK to not include them in the Preview runs and just run them for RTM sign off.
Definitely! In the Preview 3 email we are going to make it more obvious that it's a call for action and hopefully get more AMD results. (cc @jeffhandley)
Linux Arm64 should be our top priority.
I know that it's possible to run benchmarks on Android and iOS, but I don't know who owns machines that could be used. We should discuss this offline.
The only problem right now is data privacy. The results contain benchmark results, OS version and CPU information. No user name, no machine name etc. We could either do the paperwork and get it approved or... ask the users to run the benchmarks, not upload the results, use ResultsComparer to analyze them, and report the regressions they discover as bugs. @danmoseley thoughts? |
@EgorBo please excuse me, I have provided a link to the Preview1 report. Updated link: https://github.com/adamsitnik/performance/blob/compareRuntimes/src/tools/ResultsComparer/net60_vs_net70p2_10p_2ns.md |
Ah, now I see Apple M1, nice! Looks like it benefited a lot from #64576 |
If there's no PII it should be fine to accept results; of course we would check. But perhaps if we do expand this, it would be best to start, as you suggest, by just providing the script and the ResultsComparer instructions? It might be clearest to have a single issue where people post them. That way we can also track which configurations were covered and had no regressions, and see patterns. |
Data
This time we have covered the following configs:
Most of the benchmarks were run on bare-metal machines, some were executed via WSL.
This would not be possible without the help from: @AndyAyersMS @carlossanlop @danmoseley @jeffhandley @sblom and @dakersnar who contributed their results and time.
The full report generated by the tool is available here. The full report also contains improvements, so if you read it from the end you can see the biggest perf improvements. There are plenty of them!
Again, the full historical data turned out to be extremely useful. For details about methodology please read #41871. Preview1 report can be found here.
Regressions
By design
Microsoft.Extensions.DependencyInjection.TimeToFirstService* regressions
System.Numerics.Tests.Perf_VectorConvert.Convert_ulong_double
System.Security.Cryptography.Tests.Perf_Rfc2898DeriveBytes.DeriveBytes
System.Numerics.Tests.Perf_BigInteger.Add(arguments: 65536,65536 bits)
Investigation in progress
System.Collections.Sort<BigStruct>.LinqQuery(Size: 512), System.Collections.Sort<BigStruct>.Array_Comparison(Size: 512), System.Collections.Sort<BigStruct>.LinqQuery(Size: 512)
System.Threading.Tests.Perf_Timer.SynchronousContention
System.Threading.Tasks.Tests.Perf_AsyncMethods.Yield
System.Text.Json.Tests.Perf_Get.GetSingle
System.IO.Tests.BinaryWriterExtendedTests.WriteAsciiString
Noise, flaky or multimodal
The following benchmarks showed up in the report generated by the tool, but were not actual regressions:
System.Threading.Tests.Perf_Interlocked.CompareExchange_long, System.Threading.Tests.Perf_Volatile.Read_double on x86: CompareExchange_long benchmark sometimes reports very long execution time on x86 performance#1497 (comment)
MicroBenchmarks.Serializers.Json_ToString<LoginViewModel>.Jil_ - just noisy
PerfLabTests.EnumPerf.EnumEquals - noisy
Big thanks to everyone involved!