-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmarks for DateTime.Day
/.Month
/.Year
/.DayOfYear
#2549
Conversation
.Day
/.Month
/.Year
/.DayOfYear
@@ -41,5 +43,17 @@ public static IEnumerable<string> ToString_MemberData() | |||
|
|||
[Benchmark] | |||
public DateTime ParseO() => DateTime.ParseExact("1996-06-03T22:15:00.0000000", "o", null); | |||
|
|||
[Benchmark] | |||
public int Day() => utcNow.Day; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What times is BDN reporting for these benchmarks, on your machine?
@adamsitnik I know we have guidance for extremely fast iterations here but I can't follow it. As you explained to me earlier, it's clearly not expected that one manually adds loops/repetitions inside the benchmark method to make it take 100ms. But in the OperationsPerInvoke section there, it suggests it's necessary to reach 100ms. If there's a IterationSetup, I think it's clear to me why 100ms is necessary -- since otherwise IterationSetup is runningtoo much.
In short -- when there is no per iteration setup required, is a benchmark method ever so extremely fast that we're expected to add a loop/repetition? and if so how fast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What times is BDN reporting for these benchmarks, on your machine?
On Windows x64 i9-9900 3.10GHz, .net6.0
:
| Method | Mean |
|------------------------------- |---------:|
| 'Old DateTime.Day' | 4.962 ns |
| 'Optimized DateTime.Day' | 2.318 ns |
| 'Old DateTime.Month' | 3.954 ns |
| 'Optimized DateTime.Month' | 1.845 ns |
| 'Old DateTime.Year' | 2.500 ns |
| 'Optimized DateTime.Year' | 1.210 ns |
| 'Old DateTime.DayOfYear' | 3.905 ns |
| 'Optimized DateTime.DayOfYear' | 1.020 ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know this is fine, but @adamsitnik please confirm that no special measures are necessary for such fast benchmark methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used OperationsPerInvoke
and run these benchmarks locally:
[Benchmark(OperationsPerInvoke = 16)]
public int Day()
{
DateTime local = date1;
int consume = local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
consume += local.Day; consume += local.Day; consume += local.Day; consume += local.Day;
return consume;
}
[Benchmark(OperationsPerInvoke = 16)]
public int Month()
{
DateTime local = date1;
int consume = local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
consume += local.Month; consume += local.Month; consume += local.Month; consume += local.Month;
return consume;
}
[Benchmark(OperationsPerInvoke = 16)]
public int Year()
{
DateTime local = date1;
int consume = local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
consume += local.Year; consume += local.Year; consume += local.Year; consume += local.Year;
return consume;
}
[Benchmark(OperationsPerInvoke = 16)]
public int DayOfYear()
{
DateTime local = date1;
int consume = local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear; consume += local.DayOfYear;
return consume;
}
With the old DateTime
implementation I got following results:
No OperationsPerInvoke:
Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
---|---|---|---|---|---|---|---|
Day | 5.328 ns | 0.0282 ns | 0.0515 ns | 5.326 ns | 5.247 ns | 5.497 ns | - |
Month | 4.726 ns | 0.1242 ns | 0.2480 ns | 4.620 ns | 4.501 ns | 5.534 ns | - |
Year | 3.154 ns | 0.0207 ns | 0.0357 ns | 3.157 ns | 3.095 ns | 3.262 ns | - |
DayOfYear | 3.011 ns | 0.0174 ns | 0.0322 ns | 3.003 ns | 2.961 ns | 3.100 ns | - |
OperationsPerInvoke
:
Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
---|---|---|---|---|---|---|---|
Day | 6.164 ns | 0.0232 ns | 0.0425 ns | 6.146 ns | 6.108 ns | 6.255 ns | - |
Month | 5.646 ns | 0.0198 ns | 0.0367 ns | 5.641 ns | 5.587 ns | 5.763 ns | - |
Year | 4.303 ns | 0.0134 ns | 0.0241 ns | 4.303 ns | 4.249 ns | 4.365 ns | - |
DayOfYear | 4.214 ns | 0.0751 ns | 0.1447 ns | 4.175 ns | 4.063 ns | 4.779 ns | - |
As we can see, standard deviation has slightly improved (we want to keep it low for nano-benchmarks).
With the new DateTime
implementation I got:
No OperationsPerInvoke
:
Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
---|---|---|---|---|---|---|---|
Day | 2.192 ns | 0.0173 ns | 0.0313 ns | 2.187 ns | 2.152 ns | 2.316 ns | - |
Month | 2.250 ns | 0.0083 ns | 0.0157 ns | 2.247 ns | 2.226 ns | 2.289 ns | - |
Year | 1.113 ns | 0.0063 ns | 0.0120 ns | 1.113 ns | 1.090 ns | 1.146 ns | - |
DayOfYear | 1.116 ns | 0.0090 ns | 0.0165 ns | 1.110 ns | 1.088 ns | 1.156 ns | - |
OperationsPerInvoke
:
Method | Mean | Error | StdDev | Median | Min | Max | Allocated |
---|---|---|---|---|---|---|---|
Day | 3.0989 ns | 0.0119 ns | 0.0212 ns | 3.0978 ns | 3.0605 ns | 3.1376 ns | - |
Month | 3.0633 ns | 0.0099 ns | 0.0187 ns | 3.0580 ns | 3.0345 ns | 3.1085 ns | - |
Year | 2.0554 ns | 0.0128 ns | 0.0241 ns | 2.0434 ns | 2.0314 ns | 2.1136 ns | - |
DayOfYear | 0.3053 ns | 0.0011 ns | 0.0020 ns | 0.3050 ns | 0.3021 ns | 0.3098 ns | - |
The standard deviation has slightly improved (average dropped from 0.018875 to 0.0165), but the time reported for DayOfYear
is very close to 0.2ns (single instruction on most modern hardware). Which suggests that the code got optimized by the JIT.
When looking at the disassembly, it seems that DayOfYear
has been optimized by the JIT and now no calls are being performed:
; System.Tests.Perf_DateTime.Year()
push rsi
sub rsp,30
xor eax,eax
mov [rsp+28],rax
mov rcx,[rcx+8]
mov [rsp+28],rcx
lea rcx,[rsp+28]
call qword ptr [0F090]
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
lea rcx,[rsp+28]
call qword ptr [0F090]
add eax,esi
mov esi,eax
mov eax,esi
add rsp,30
pop rsi
ret
; Total bytes of code 265
; System.Tests.Perf_DateTime.DayOfYear()
mov rax,[rcx+8]
mov rdx,0FFFFFFFFFFFF
and rdx,rax
mov rcx,0FFC778816079
mov rax,rdx
mul rcx
shr rdx,23
mov eax,edx
or eax,3
mov edx,eax
imul rdx,396B06BD
shr rdx,2F
imul edx,23AB1
sub eax,edx
or eax,3
imul eax,2CDB61
mov eax,eax
imul rax,5B4FFFCB
shr rax,36
lea edx,[rax+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
lea edx,[rax+rdx+1]
mov eax,edx
ret
; Total bytes of code 151
Even when I replace +=
with ^=
the JIT is smart enough to recognize that the value does not change, load it into a register and just keep xoring the register:
; System.Tests.Perf_DateTime.DayOfYear()
mov rax,[rcx+8]
mov rdx,0FFFFFFFFFFFF
and rdx,rax
mov rcx,0FFC778816079
mov rax,rdx
mul rcx
shr rdx,23
mov eax,edx
or eax,3
mov edx,eax
imul rdx,396B06BD
shr rdx,2F
imul edx,23AB1
sub eax,edx
or eax,3
imul eax,2CDB61
mov eax,eax
imul rax,5B4FFFCB
shr rax,36
inc eax
mov edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor edx,eax
xor eax,edx
mov edx,eax
mov eax,edx
ret
; Total bytes of code 124
So it looks like currently we can't outsmart JIT and we just need to keep the current benchmark implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adamsitnik for the investigation. Can you help me understand when one should use OperationsPerInvoke and one should not? I can clarify the doc. Perhaps it is "consider it when operations <= 5ns, definitely do it when operations < 1ns" ?
@adamsitnik I know we have guidance for extremely fast iterations here but I can't follow it. As you explained to me earlier, it's clearly not expected that one manually adds loops/repetitions inside the benchmark method to make it take 100ms. But in the OperationsPerInvoke section there, it suggests it's necessary to reach 100ms. If there's a IterationSetup, I think it's clear to me why 100ms is necessary -- since otherwise IterationSetup is runningtoo much.
Thanks for the PR @SergeiPavlov somehow I missed it. |
validation failure is #2564 |
I agree, we can merge this in w/o the centos success considering the issue is unrelated to changes made here. |
Related to
DateTime
optimizations indotnet/runtime#72712
dotnet/runtime#73277