Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks for DateTime.Day/.Month/.Year/.DayOfYear #2549

Merged
merged 3 commits into from
Aug 18, 2022

Conversation

SergeiPavlov
Copy link
Contributor

@SergeiPavlov SergeiPavlov commented Aug 5, 2022

Related to DateTime optimizations in
dotnet/runtime#72712
dotnet/runtime#73277

@SergeiPavlov SergeiPavlov changed the title Benchmark for DateTime.DayOfYear Benchmarks for DateTime.Day/.Month/.Year/.DayOfYear Aug 5, 2022
@@ -41,5 +43,17 @@ public static IEnumerable<string> ToString_MemberData()

[Benchmark]
public DateTime ParseO() => DateTime.ParseExact("1996-06-03T22:15:00.0000000", "o", null);

[Benchmark]
public int Day() => utcNow.Day;
Copy link
Member

@danmoseley danmoseley Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What times is BDN reporting for these benchmarks, on your machine?

@adamsitnik I know we have guidance for extremely fast iterations here but I can't follow it. As you explained to me earlier, it's clearly not expected that one manually adds loops/repetitions inside the benchmark method to make it take 100ms. But in the OperationsPerInvoke section there, it suggests it's necessary to reach 100ms. If there's a IterationSetup, I think it's clear to me why 100ms is necessary -- since otherwise IterationSetup is runningtoo much.

In short -- when there is no per iteration setup required, is a benchmark method ever so extremely fast that we're expected to add a loop/repetition? and if so how fast?

Copy link
Contributor Author

@SergeiPavlov SergeiPavlov Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What times is BDN reporting for these benchmarks, on your machine?

On Windows x64 i9-9900 3.10GHz, .net6.0:

|                         Method |     Mean |
|------------------------------- |---------:|
|             'Old DateTime.Day' | 4.962 ns |
|       'Optimized DateTime.Day' | 2.318 ns |
|           'Old DateTime.Month' | 3.954 ns |
|     'Optimized DateTime.Month' | 1.845 ns |
|            'Old DateTime.Year' | 2.500 ns |
|      'Optimized DateTime.Year' | 1.210 ns |
|       'Old DateTime.DayOfYear' | 3.905 ns |
| 'Optimized DateTime.DayOfYear' | 1.020 ns |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know this is fine, but @adamsitnik please confirm that no special measures are necessary for such fast benchmark methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used OperationsPerInvoke and run these benchmarks locally:

[Benchmark(OperationsPerInvoke = 16)]
public int Day()
{
    DateTime local = date1;

    int consume = local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
    consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
    consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
    consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;

    return consume;
}

[Benchmark(OperationsPerInvoke = 16)]
public int Month()
{
    DateTime local = date1;

    int consume = local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
    consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
    consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
    consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;

    return consume;
}

[Benchmark(OperationsPerInvoke = 16)]
public int Year()
{
    DateTime local = date1;

    int consume = local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
    consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
    consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
    consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;

    return consume;
}

[Benchmark(OperationsPerInvoke = 16)]
public int DayOfYear()
{
    DateTime local = date1;

    int consume = local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
    consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
    consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
    consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;

    return consume;
}

With the old DateTime implementation I got following results:

No OperationsPerInvoke:

Method Mean Error StdDev Median Min Max Allocated
Day 5.328 ns 0.0282 ns 0.0515 ns 5.326 ns 5.247 ns 5.497 ns -
Month 4.726 ns 0.1242 ns 0.2480 ns 4.620 ns 4.501 ns 5.534 ns -
Year 3.154 ns 0.0207 ns 0.0357 ns 3.157 ns 3.095 ns 3.262 ns -
DayOfYear 3.011 ns 0.0174 ns 0.0322 ns 3.003 ns 2.961 ns 3.100 ns -

OperationsPerInvoke:

Method Mean Error StdDev Median Min Max Allocated
Day 6.164 ns 0.0232 ns 0.0425 ns 6.146 ns 6.108 ns 6.255 ns -
Month 5.646 ns 0.0198 ns 0.0367 ns 5.641 ns 5.587 ns 5.763 ns -
Year 4.303 ns 0.0134 ns 0.0241 ns 4.303 ns 4.249 ns 4.365 ns -
DayOfYear 4.214 ns 0.0751 ns 0.1447 ns 4.175 ns 4.063 ns 4.779 ns -

As we can see, standard deviation has slightly improved (we want to keep it low for nano-benchmarks).

With the new DateTime implementation I got:

No OperationsPerInvoke:

Method Mean Error StdDev Median Min Max Allocated
Day 2.192 ns 0.0173 ns 0.0313 ns 2.187 ns 2.152 ns 2.316 ns -
Month 2.250 ns 0.0083 ns 0.0157 ns 2.247 ns 2.226 ns 2.289 ns -
Year 1.113 ns 0.0063 ns 0.0120 ns 1.113 ns 1.090 ns 1.146 ns -
DayOfYear 1.116 ns 0.0090 ns 0.0165 ns 1.110 ns 1.088 ns 1.156 ns -

OperationsPerInvoke:

Method Mean Error StdDev Median Min Max Allocated
Day 3.0989 ns 0.0119 ns 0.0212 ns 3.0978 ns 3.0605 ns 3.1376 ns -
Month 3.0633 ns 0.0099 ns 0.0187 ns 3.0580 ns 3.0345 ns 3.1085 ns -
Year 2.0554 ns 0.0128 ns 0.0241 ns 2.0434 ns 2.0314 ns 2.1136 ns -
DayOfYear 0.3053 ns 0.0011 ns 0.0020 ns 0.3050 ns 0.3021 ns 0.3098 ns -

The standard deviation has slightly improved (average dropped from 0.018875 to 0.0165), but the time reported for DayOfYear is very close to 0.2ns (single instruction on most modern hardware). Which suggests that the code got optimized by the JIT.

When looking at the disassembly, it seems that DayOfYear has been optimized by the JIT and now no calls are being performed:

; System.Tests.Perf_DateTime.Year()
       push      rsi
       sub       rsp,30
       xor       eax,eax
       mov       [rsp+28],rax
       mov       rcx,[rcx+8]
       mov       [rsp+28],rcx
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       lea       rcx,[rsp+28]
       call      qword ptr [0F090]
       add       eax,esi
       mov       esi,eax
       mov       eax,esi
       add       rsp,30
       pop       rsi
       ret
; Total bytes of code 265
; System.Tests.Perf_DateTime.DayOfYear()
       mov       rax,[rcx+8]
       mov       rdx,0FFFFFFFFFFFF
       and       rdx,rax
       mov       rcx,0FFC778816079
       mov       rax,rdx
       mul       rcx
       shr       rdx,23
       mov       eax,edx
       or        eax,3
       mov       edx,eax
       imul      rdx,396B06BD
       shr       rdx,2F
       imul      edx,23AB1
       sub       eax,edx
       or        eax,3
       imul      eax,2CDB61
       mov       eax,eax
       imul      rax,5B4FFFCB
       shr       rax,36
       lea       edx,[rax+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       lea       edx,[rax+rdx+1]
       mov       eax,edx
       ret
; Total bytes of code 151

Even when I replace += with ^= the JIT is smart enough to recognize that the value does not change, load it into a register and just keep xoring the register:

; System.Tests.Perf_DateTime.DayOfYear()
       mov       rax,[rcx+8]
       mov       rdx,0FFFFFFFFFFFF
       and       rdx,rax
       mov       rcx,0FFC778816079
       mov       rax,rdx
       mul       rcx
       shr       rdx,23
       mov       eax,edx
       or        eax,3
       mov       edx,eax
       imul      rdx,396B06BD
       shr       rdx,2F
       imul      edx,23AB1
       sub       eax,edx
       or        eax,3
       imul      eax,2CDB61
       mov       eax,eax
       imul      rax,5B4FFFCB
       shr       rax,36
       inc       eax
       mov       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       edx,eax
       xor       eax,edx
       mov       edx,eax
       mov       eax,edx
       ret
; Total bytes of code 124

So it looks like currently we can't outsmart JIT and we just need to keep the current benchmark implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adamsitnik for the investigation. Can you help me understand when one should use OperationsPerInvoke and one should not? I can clarify the doc. Perhaps it is "consider it when operations <= 5ns, definitely do it when operations < 1ns" ?

@adamsitnik I know we have guidance for extremely fast iterations here but I can't follow it. As you explained to me earlier, it's clearly not expected that one manually adds loops/repetitions inside the benchmark method to make it take 100ms. But in the OperationsPerInvoke section there, it suggests it's necessary to reach 100ms. If there's a IterationSetup, I think it's clear to me why 100ms is necessary -- since otherwise IterationSetup is runningtoo much.

@danmoseley
Copy link
Member

Thanks for the PR @SergeiPavlov somehow I missed it.

@danmoseley
Copy link
Member

validation failure is #2564
@LoopedBard3 should we wait on a fix for it? It seems hard to imagine how the coverage is necessary in this case.

@LoopedBard3
Copy link
Member

I agree, we can merge this in w/o the centos success considering the issue is unrelated to changes made here.

@adamsitnik adamsitnik merged commit 8ac30b9 into dotnet:main Aug 18, 2022
@SergeiPavlov SergeiPavlov deleted the DateTime_DayOfYear branch August 18, 2022 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants