Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf -40%] Benchstone.BenchF.Simpsn.Test #534

Open
performanceautofiler bot opened this issue Jun 12, 2020 · 3 comments
Open

[Perf -40%] Benchstone.BenchF.Simpsn.Test #534

performanceautofiler bot opened this issue Jun 12, 2020 · 3 comments

Comments

@performanceautofiler
Copy link

Run Information

Architecture x64
OS Windows 10.0.18362
Changes diff

Regressions in Benchstone.BenchF.Simpsn

Benchmark Baseline Test Test/Base Modality Baseline Outlier
Test 164.06 ms 230.26 ms 1.40 False

graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'Benchstone.BenchF.Simpsn*';

Histogram

Benchstone.BenchF.Simpsn.Test

[162358394.287 ; 177057262.379) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[177057262.379 ; 185020170.713) | @
[185020170.713 ; 199719038.804) | 
[199719038.804 ; 214417906.896) | 
[214417906.896 ; 223408815.954) | 
[223408815.954 ; 239552934.046) | @@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@adamsitnik
Copy link
Collaborator

Looks very real! cc @AndyAyersMS

@DrewScoggins Another good candidate for transferring to runtime repo

@AndyAyersMS
Copy link
Collaborator

Focusing just on the jump in runtime [commits)[https://github.com/dotnet/runtime/compare/8527a991b91cabc9bec6f6abe57a52b5bb5e154d...af1db3eccbc238745e1d163458c92c1bfa650fbd] it looks like we lost the ability to hoist calls to Exp. This test makes 4 calls to Exp in its inner loop, but two are loop invariant.

Inner loop at first commit of above:

G_M28296_IG04:
       C5F857C0             vxorps   xmm0, xmm0
       C5FB2AC7             vcvtsi2sd  xmm0, edi
       C5FB5905F0000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD24]
       C5F057C9             vxorps   xmm1, xmm1
       C57B58D1             vaddsd   xmm10, xmm0, xmm1
       C5FB1005E8000000     vmovsd   xmm0, qword ptr [reloc @RWD32]
       C4C17857C2           vxorps   xmm0, xmm10
       C5FB5905E3000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD40]
       E8B6D4B45F           call     System.Math:Exp(double):double
       C5FB58F6             vaddsd   xmm6, xmm0, xmm6
       C5AB5805DA000000     vaddsd   xmm0, xmm10, qword ptr [reloc @RWD48]
       C5FB100DC2000000     vmovsd   xmm1, qword ptr [reloc @RWD32]
       C5F857C1             vxorps   xmm0, xmm1
       C5FB5905CE000000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD56]
       E891D4B45F           call     System.Math:Exp(double):double
       C5FB58FF             vaddsd   xmm7, xmm0, xmm7
       C5C35905C5000000     vmulsd   xmm0, xmm7, qword ptr [reloc @RWD64]
       C5BB58C0             vaddsd   xmm0, xmm8, xmm0
       C5CB590DC1000000     vmulsd   xmm1, xmm6, qword ptr [reloc @RWD72]
       C5FB58C1             vaddsd   xmm0, xmm0, xmm1
       C5B358C0             vaddsd   xmm0, xmm9, xmm0
       C5FB5935B9000000     vmulsd   xmm6, xmm0, qword ptr [reloc @RWD80]
						;; bbWeight=16    PerfScore 1045.33
G_M28296_IG05:
       FFC7                 inc      edi
       83FF63               cmp      edi, 99
       0F8E76FFFFFF         jle      G_M28296_IG04
						;; bbWeight=16    PerfScore 24.00

inner loop with current main

G_M28296_IG04:
       C5F857C0             vxorps   xmm0, xmm0
       C5FB2AC7             vcvtsi2sd  xmm0, edi
       C5FB59050D010000     vmulsd   xmm0, xmm0, qword ptr [reloc @RWD40]
       C5F057C9             vxorps   xmm1, xmm1
       C57B58D1             vaddsd   xmm10, xmm0, xmm1
       C5FB10050D010000     vmovsd   xmm0, qword ptr [reloc @RWD56]
       C4C17857C2           vxorps   xmm0, xmm10
       C5FB58C0             vaddsd   xmm0, xmm0, xmm0
       E83F87B45F           call     System.Math:Exp(double):double
       C4417B58C0           vaddsd   xmm8, xmm0, xmm8
       C5AB580502010000     vaddsd   xmm0, xmm10, qword ptr [reloc @RWD72]
       C5FB100DEA000000     vmovsd   xmm1, qword ptr [reloc @RWD56]
       C5F857C1             vxorps   xmm0, xmm1
       C5FB58C0             vaddsd   xmm0, xmm0, xmm0
       E81D87B45F           call     System.Math:Exp(double):double
       C4417B58C9           vaddsd   xmm9, xmm0, xmm9
       C5FB1005F0000000     vmovsd   xmm0, qword ptr [reloc @RWD88]
       E80B87B45F           call     System.Math:Exp(double):double
       C5B3590DF3000000     vmulsd   xmm1, xmm9, qword ptr [reloc @RWD104]
       C5FB58C1             vaddsd   xmm0, xmm0, xmm1
       C4C13B58C8           vaddsd   xmm1, xmm8, xmm8
       C57B58C1             vaddsd   xmm8, xmm0, xmm1
       C5FB1005EE000000     vmovsd   xmm0, qword ptr [reloc @RWD120]
       E8E986B45F           call     System.Math:Exp(double):double
       C4C17B58C0           vaddsd   xmm0, xmm0, xmm8
       C57B5905EC000000     vmulsd   xmm8, xmm0, qword ptr [reloc @RWD136]
						;; bbWeight=16    PerfScore 1045.33
G_M28296_IG05:
       FFC7                 inc      edi
       83FF63               cmp      edi, 99
       0F8E64FFFFFF         jle      G_M28296_IG04
						;; bbWeight=16    PerfScore 24.00

Nothing jumps out from the commit range as a likely suspect, so will start bisecting.

@AndyAyersMS
Copy link
Collaborator

This regressed because of dotnet/runtime#33024.

That change can introduce assignment trees into what were previously side-effect free computations, inhibiting the computations from being hoisted out of loops.

Old IR during optHoistLoopCode:

N005 ( 46, 15) [000227] --C---------                 |  +--*  INTRINSIC double exp $285
N004 ( 10, 11) [000226] ------------                 |  |  \--*  MUL       double $49
N002 (  2,  3) [000224] ------------                 |  |     +--*  NEG       double $49
N001 (  1,  2) [000098] ------------                 |  |     |  \--*  LCL_VAR   double V00 loc0         u:2 $40
N003 (  3,  4) [000225] ------------                 |  |     \--*  CNS_DBL   double 2.0000000000000000 $45

new intrinsic IR:

N009 ( 45, 15) [000227] -AC---------                 |  +--*  INTRINSIC double exp $285
N008 (  9, 11) [000260] -A----------                 |  |  \--*  ADD       double $49
N006 (  3,  5) [000258] -A----------                 |  |     +--*  COMMA     double $49
N004 (  2,  3) [000256] -A------R---                 |  |     |  +--*  ASG       double $49
N003 (  1,  2) [000255] D------N----                 |  |     |  |  +--*  LCL_VAR   double V24 tmp10        d:2 $49
N002 (  2,  3) [000224] ------------                 |  |     |  |  \--*  NEG       double $49
N001 (  1,  2) [000098] ------------                 |  |     |  |     \--*  LCL_VAR   double V00 loc0         u:2 $40
N005 (  1,  2) [000257] ------------                 |  |     |  \--*  LCL_VAR   double V24 tmp10        u:2 $49
N007 (  1,  2) [000259] ------------                 |  |     \--*  LCL_VAR   double V24 tmp10        u:2 (last use) $49

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants