md5: add inline assembly support for `x86_64` and `x86` #447
guarded by feature flag `inline-asm`
Thank you! After a cursory look, I have the following comments:
I will try to do a more thorough review somewhat later.
Worth a try. Will try that during the weekend. The number of registers on
binary size for bench executable without feature
binary size for bench executable with feature
Not sure why they are the same. Did I do something wrong? I inspected the assembly using
If we don't introduce any utility, the best we can do would be something like this:
#[cfg_attr(rustfmt, rustfmt_skip)]
macro_rules! op_f {
($a: literal, $b: literal, $c: literal, $d: literal, $t: literal, $s: literal, $k: literal, $tmp: literal) => {
concat!(
"mov ", $tmp, ", ", $c, "\n",
"add ", $a, ", ", $t, "\n",
"xor ", $tmp, ", ", $d, "\n",
"and ", $tmp, ", ", $b, "\n",
"xor ", $tmp, ", ", $d, "\n",
"lea ", $a, ", [", $tmp, " + ", $a, " + ", $k, "]\n",
"rol ", $a, ", ", $s, "\n",
"add ", $a, ", ", $b, "\n",
)
};
}
And the call site would be something like:
op_f!("{a:e}", "{b:e}", "{c:e}", "{d:e}", "[{x} + 0]", 7, 0xd76aa478, "{t1:e}")
I personally find the concatenation hard to read, especially when there are commas in the strings. The main disadvantage is that it is even harder to write.
And I don't mind moving my crate elsewhere. Just let me know how it should be done.
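For readers less used to the assembly, here is a minimal sketch of what the `op_f` instruction sequence computes, written in plain Rust. The function names and the `op_f_reference` helper are illustrative and not part of the PR; the comments map each statement to the instruction emitted by the macro above. The `xor`/`and`/`xor` triple is a common way to evaluate the round-1 function F(b, c, d) = (b & c) | (!b & d) with a single temporary register.

```rust
/// One round-1 MD5 step, following the instruction order of `op_f`.
/// `t` is the message word, `s` the rotation amount, `k` the round constant.
pub fn op_f(a: u32, b: u32, c: u32, d: u32, t: u32, s: u32, k: u32) -> u32 {
    let mut tmp = c;                             // mov tmp, c
    let mut a = a.wrapping_add(t);               // add a, t
    tmp ^= d;                                    // xor tmp, d
    tmp &= b;                                    // and tmp, b
    tmp ^= d;                                    // xor tmp, d  (tmp == F(b, c, d))
    a = tmp.wrapping_add(a).wrapping_add(k);     // lea a, [tmp + a + k]
    a = a.rotate_left(s);                        // rol a, s
    a.wrapping_add(b)                            // add a, b
}

/// The same step in the textbook form from RFC 1321, for comparison.
pub fn op_f_reference(a: u32, b: u32, c: u32, d: u32, t: u32, s: u32, k: u32) -> u32 {
    let f = (b & c) | (!b & d);
    a.wrapping_add(f)
        .wrapping_add(t)
        .wrapping_add(k)
        .rotate_left(s)
        .wrapping_add(b)
}
```

Both functions should agree for every input, which makes the plain version a convenient oracle when testing an assembly implementation.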
Tested on AMD Ryzen 9 5900X (Zen 3), Windows 10 22H2:
Before moving block loop inside asm
After moving block loop inside asm
This PR implements md5 using inline assembly, which has the benefit of compiling and running on MSVC targets.
This serves as a demo of a path toward addressing #315, RustCrypto/asm-hashes#45, and RustCrypto/asm-hashes#17.
Performance Consideration
In the assembly implementation, a 3-way addition (2 registers + 1 immediate) is implemented as
https://github.com/johnmave126/hashes/blob/40a315dea9d5619e664c3b70749d0af692bca8b8/md5/src/asm/x86.rs#L81
However, on Intel processors before Ice Lake, a 3-way `LEA` has a latency of 3 cycles and uses only 1 port (see here). In contrast, on newer processors `LEA` has a latency of 1 cycle and can be dispatched in parallel to 2 ports.
On older processors, the following is better:
LLVM always emits something similar to the second flavor. Hence inline assembly will perform slower on processors before Ice Lake, but faster afterwards.
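To make the two flavors concrete, the following is a rough sketch using Rust inline assembly. It assumes the second flavor is a two-component `LEA` followed by an `ADD`; the module and function names, and the use of the MD5 constant `0x242070db`, are illustrative and not taken from the PR. A plain-Rust fallback is included so the sketch also builds on other architectures.

```rust
// Sketch: two ways to compute a = a + tmp + K (mod 2^32).

#[cfg(target_arch = "x86_64")]
pub mod add3 {
    use core::arch::asm;

    /// Single three-component LEA (base + index + displacement).
    pub fn lea3(mut a: u32, tmp: u32) -> u32 {
        unsafe {
            asm!(
                // The 32-bit destination truncates the 64-bit address sum.
                "lea {a:e}, [{tmp:r} + {a:r} + 0x242070db]",
                a = inout(reg) a,
                tmp = in(reg) tmp,
                options(pure, nomem, nostack, preserves_flags),
            );
        }
        a
    }

    /// Two-component LEA followed by ADD (ADD modifies the flags).
    pub fn lea_add(mut a: u32, tmp: u32) -> u32 {
        unsafe {
            asm!(
                "lea {a:e}, [{a:r} + 0x242070db]",
                "add {a:e}, {tmp:e}",
                a = inout(reg) a,
                tmp = in(reg) tmp,
                options(pure, nomem, nostack),
            );
        }
        a
    }
}

#[cfg(not(target_arch = "x86_64"))]
pub mod add3 {
    /// Portable fallback with the same result.
    pub fn lea3(a: u32, tmp: u32) -> u32 {
        a.wrapping_add(tmp).wrapping_add(0x242070db)
    }

    /// Portable fallback with the same result.
    pub fn lea_add(a: u32, tmp: u32) -> u32 {
        a.wrapping_add(0x242070db).wrapping_add(tmp)
    }
}
```

Both functions return the same value; they differ only in instruction count and in how the work is scheduled across execution ports, which is where the latency difference described above comes from.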
On Intel i7-6700K (Skylake), ubuntu-22.04.1:
Using `LEA+ADD` in inline assembly still beats non-asm on older processors, but not by much:
On AMD Ryzen 9 5900X (Zen 3), Windows 10 22H2:
Dependency
The inline assembly code for md5 comes from a side project of mine, which uses a declarative macro crate (asm_block, also by me) to help define macros that emit assembly. This introduces an additional dependency. If that's unwanted, we can strip the `asm_block` macro down to a minimal viable version and embed it directly.
MSRV
Enabling inline assembly will bump MSRV to 1.59.
CI
The PR also includes an additional CI job to test this feature.