Rework benchmarks to make it easier to get assembly. #297
This naming makes more sense, especially if we add more benchmark files. Signed-off-by: Joe Richey <joerichey@google.com>
This PR came about because I discovered the amazing After installing the tool, we can run
buffer::p384::bench_getrandom::inner:
push rbx
sub rsp, 64
; Zero the buffer
xorps xmm0, xmm0
movaps xmmword ptr [rsp + 48], xmm0
movaps xmmword ptr [rsp + 32], xmm0
movaps xmmword ptr [rsp + 16], xmm0
; Call the function
lea rbx, [rsp + 16]
mov esi, 48
mov rdi, rbx
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
; Check for error
test eax, eax
jne .LBB17_1
; test::black_box(slice);
mov qword ptr [rsp], rbx
mov qword ptr [rsp + 8], 48
mov rax, rsp
add rsp, 64
pop rbx
ret
We can see the effect of using
buffer::p384::bench_getrandom_uninit::inner:
push rbx
sub rsp, 64
lea rbx, [rsp + 16]
mov esi, 48
mov rdi, rbx
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
test eax, eax
jne .LBB18_1
mov qword ptr [rsp], rbx
mov qword ptr [rsp + 8], 48
mov rax, rsp
add rsp, 64
pop rbx
ret
As the benchmarks are compiled as separate crates, we can see the effect of inlining. Removing the
buffer::p384::bench_getrandom_uninit::inner:
push rbx
sub rsp, 80
lea rbx, [rsp + 16]
lea rsi, [rsp + 32]
mov edx, 48
mov rdi, rbx
call qword ptr [rip + getrandom::getrandom_uninit@GOTPCREL]
mov rax, qword ptr [rsp + 16]
test rax, rax
je .LBB18_1
mov rcx, qword ptr [rsp + 24]
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 24], rcx
add rsp, 80
pop rbx
ret
We can also see that passing the entire array to
buffer::p384::bench_getrandom_uninit::inner:
sub rsp, 104
lea rdi, [rsp + 56]
mov esi, 48
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
test eax, eax
jne .LBB18_1
; 48 byte copy
movups xmm0, xmmword ptr [rsp + 56]
movups xmm1, xmmword ptr [rsp + 72]
movups xmm2, xmmword ptr [rsp + 88]
movaps xmmword ptr [rsp + 32], xmm2
movaps xmmword ptr [rsp + 16], xmm1
movaps xmmword ptr [rsp], xmm0
mov rax, rsp
add rsp, 104
ret
@briansmith this relates to #291 (comment) about how the type you pass to |
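For anyone comparing the listings above, here is a rough sketch of the two buffer styles being contrasted: a zero-initialized 48-byte array versus a `MaybeUninit` array. This is only an illustration under stated assumptions. `fill` and `fill_uninit` are hypothetical stand-ins that write a deterministic pattern (they are not the crate's API and produce no randomness), and `std::hint::black_box` is used in place of the nightly `test::black_box` so that the sketch builds on stable.

```rust
use std::hint::black_box;
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a fill routine that needs an initialized buffer.
/// It writes a deterministic pattern, not random bytes.
fn fill(dest: &mut [u8]) {
    for (i, b) in dest.iter_mut().enumerate() {
        *b = i as u8;
    }
}

/// Hypothetical stand-in for a fill routine that accepts uninitialized memory
/// and hands back the initialized bytes.
fn fill_uninit(dest: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    for (i, b) in dest.iter_mut().enumerate() {
        b.write(i as u8);
    }
    // SAFETY: the loop above initialized every element, and
    // MaybeUninit<u8> has the same layout as u8.
    unsafe { std::slice::from_raw_parts_mut(dest.as_mut_ptr().cast::<u8>(), dest.len()) }
}

/// Zeroes the buffer before filling it (the extra stores in the first listing).
#[inline(never)]
fn inner_zeroed() -> u32 {
    let mut buf = [0u8; 48];
    fill(&mut buf);
    // Pass a slice (pointer + length), not the whole array, to black_box.
    let slice: &[u8] = black_box(&buf[..]);
    slice.iter().map(|&b| u32::from(b)).sum()
}

/// Skips the zeroing step by starting from uninitialized memory.
#[inline(never)]
fn inner_uninit() -> u32 {
    let mut buf = [MaybeUninit::<u8>::uninit(); 48];
    let filled = fill_uninit(&mut buf);
    let slice: &[u8] = black_box(&*filled);
    slice.iter().map(|&b| u32::from(b)).sum()
}
```

Both functions observe the same bytes; the only intended difference is whether the buffer is zeroed up front.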
This change:
- Move the benchmarks from mod.rs to buffer.rs
- Move the inner loop we benchmark into an `#[inline(never)]` function
- Includes instructions for getting the ASM for a specific benchmark

This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

Signed-off-by: Joe Richey <joerichey@google.com>
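As a minimal sketch of the `#[inline(never)]` pattern described in this commit (not the actual benchmark code), the measured work can live in its own non-inlined function so that it shows up as a separate symbol in the assembly. The names `fill`, `inner`, and `run` are hypothetical, `fill` is a deterministic stand-in for the function under test, and a plain `Instant` loop is assumed here instead of the nightly `test::Bencher` harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the function under test (not the real API).
fn fill(dest: &mut [u8]) -> Result<(), u32> {
    for (i, b) in dest.iter_mut().enumerate() {
        *b = i as u8;
    }
    Ok(())
}

/// One benchmark iteration. `#[inline(never)]` keeps this as its own symbol,
/// so its assembly can be inspected separately from the timing loop.
#[inline(never)]
fn inner(buf: &mut [u8; 48]) -> u8 {
    fill(&mut buf[..]).unwrap();
    // black_box keeps the optimizer from discarding the filled buffer.
    let slice: &[u8] = black_box(&buf[..]);
    slice[47]
}

/// Simple timing loop around `inner`; returns the last observed byte
/// and the total elapsed time.
fn run(iters: u32) -> (u8, Duration) {
    let mut buf = [0u8; 48];
    let mut last = 0;
    let start = Instant::now();
    for _ in 0..iters {
        last = inner(&mut buf);
    }
    (last, start.elapsed())
}
```

Calling `run(1000)` times a thousand iterations; only `inner` needs to be looked up when checking the generated assembly.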
No major objections from me.
I think most users don't really want a |
This change:
`&[u8]` to `test::black_box` for both benchmarks
`#[inline(never)]` function

This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

Signed-off-by: Joe Richey <joerichey@google.com>