-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sha1 and Sha2 cleanups & optimizations #8097
Conversation
#[test] | ||
fn test_md5() { | ||
// Examples from wikipedia | ||
let wikipedia_tests = ~[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFCs for MD4 and MD5 provide some test vectors that you could use too. In my implementation, I found a bug when testing a couple of longer input strings. It might also make sense to use the a_million_letter_a
test from SHA1's suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I updated the pull request with new tests that do the "1 million a" test.
Since there are two competing implementations here is somebody willing to compare with @alco's version, discussed on the list here: https://mail.mozilla.org/pipermail/rust-dev/2013-July/004975.html. Am I right in thinking that this doesn not include MD4 while @alco's does? What's the way forward to merge these two efforts. We probably shouldn't rubber stamp this just because it won the PR race. |
@alco, please chime in with your own comparison if I misstate anything. I'm trying to just state the facts and I'm not trying to recommend one implementation over the other here, so if anything I write gives that impression, it is unintentional.
The goal of my MD5 implementation was to be a demonstration that the new functionality I was implementing was usable for digests outside of Sha2. Anyway, what I'm going to do is update this pull request to rebase out the MD5 implementation and instead integrate the new functionality into the existing Sha1 implementation. |
I've rebased the pull request to remove the MD5 implementation, to update the existing Sha1 implementation, and to add some new tests. A result of these changes is that Sha2 now properly checks the size of the message it is processing. Sha1 now gives a (slightly) more descriptive error message as part of fail!() if you try to process a message that is too large. This may address #2346. |
It might take a while before I learn how to properly rebuild Rust without it taking a month of time to be able to compare our two implementation of MD5. @DaGenix If you send MD5 as a separate PR, we can discuss it there. I can still see md5.rs in the current changeset though. |
Yikes, that was sloppy of me. I deleted the change that introduced it, but I guess the next change brought it back without me realizing. Sorry! I've rebased again to really get rid of it. |
I'm happy to open another pull request for the MD5 changes. However, my MD5 implementation heavily depends on the rest of the changes in this pull request, so, I don't think I can really do that until its decided what to do with this pull request. |
@DaGenix thanks for rebasing out the md5 impl so we can do this incrementally. This series seems pretty reasonable to me though DaGenix@4994417 does still introduce md5.rs only to delete it one commit later. Can you rebase once more to fix that snafu? Anybody else have opinions? |
The code was arranged so that the core Sha2 code came first, and then all of the various implementation of Digest followed along later. The problem is that the Sha512 compression function code is far away from the Sha512 Digest implementation, so, if you are trying to read over the code, you need to scroll all around the file for no good reason. The code was rearranged so that all of the Sha512 code is in one place and all of the Sha256 code is in another and so that all impls for a struct are near the definition of that struct.
The result_X() methods just calculate an output of a fixed size. They don't really have much to do with running the actually hash algorithm until the very last step - the output. It makes much more sense to put all this logic into the Digest impls for each specific variation on the hash function.
…f it. There are 2 main pieces of functionality in cryptoutil.rs: * A set of unsafe function for efficiently reading and writing u32 and u64 values. All of these functions are fairly easy to audit to confirm that they do what they are supposed to. * A FixedBuffer struct. This struct keeps track of input data until there is enough of it to execute the a function on it which expects a fixed block of data. The Sha2 module was rewritten to take advantage of the new functions in cryptoutil as well as FixedBuffer. The result is that the duplicate code for maintaining a buffer of input data is removed from the Sha512 and Sha256 implementation. Additionally, the FixedBuffer code is much more efficient than the previous code was.
The Sha2 compression functions were re-written to execute the message scheduling calculations in the same loop as the rest of the compression function. The compiler is able to generate much better code. Additionally, innermost part of the compression functions were turned into macros to reduce code duplicate and to make the functions more concise.
Create a helper function in cryptoutil.rs which feeds 1,000,000 'a's into a Digest with varying input sizes and then checks the result. This is essentially the same as one of Sha1's existing tests, so, that test was re-implemented using this method. New tests were added using this method for Sha512 and Sha256.
Added functions to cryptoutil.rs that perform an addition after shifting the 2nd parameter by a specified constant. These function fail!() if integer overflow will result. Updated the Sha2 implementation to use these functions.
Ug. Sorry about that. I've rebased again to get rid of the MD5 stuff. Git is determined to not let me get rid of it. |
Same content as #8097, but bors had an issue with that pull request. Opening a new one.
An MD5 implementation was originally included in #8097, but, since there are a couple different implementations of that digest algorithm (@alco mentioned his implementation on the mailing list just before I opened that PR), it was suggested that I remove it from that PR and open up a new PR to discuss the different implementations and the best way forward. If anyone wants to discuss a different implementation, feel free to present it here and discuss and compare it to this one. I'll just discuss my implementation and I'll leave it to others to present details of theirs. This implementation relies on the FixedBuffer struct from cryptoutil.rs for managing the input buffer, just like the Sha1 and Sha2 digest implementations do. I tried manually unrolling the loops in the compression function, but I got slightly worse performance when I did that. Outside of the #[test]s, I also tested the implementation by generating 1,000 inputs of up to 10MB in size and checking the MD5 digest calculated by this code against the MD5 digest calculated by Java's implementation. On my computer, I'm getting the following performance: ``` test md5::bench::md5_10 ... bench: 52 ns/iter (+/- 1) = 192 MB/s test md5::bench::md5_1k ... bench: 2819 ns/iter (+/- 44) = 363 MB/s test md5::bench::md5_64k ... bench: 178566 ns/iter (+/- 4927) = 367 MB/s ```
feat(new lint): new lint `manual_retain` close rust-lang#8097 This PR is a new lint implementation. This lint checks if the `retain` method is available. Thank you in advance. changelog: add new ``[`manual_retain`]`` lint
UPDATED (8/03): I'm leaving in my cycle count measurements in the description below, but just want to warn that I'm not sure how accurate they are. I would suggest not using them for anything unless someone else verifies them.
UPDATED (7/29): Rebased out the MD5 implementation and updated the Sha1 implementation instead.
I reworked the Sha2 code to be more readable and faster. Part of that work entailed extracting out common code having to do with buffer handling into a new struct, FixedBuffer, in the new cryptoutil.rs file. I then updated the Sha1 implementation to use the new functionality.
The first 3 commits in this request just move stuff around in the sha2.rs file. None of the changes here should impact functionality, but hopefully make the code a bit cleaner and easier to work with.
In the 4th commit:
In the 5th commit, I rewrite the compression functions used by Sha2. Not a major change - I inlined the message schedule calculation into the same loop as the rest of the compression function. This results in a nice speedup as I believe it lets LLVM optimize the method better.
In the 6th commit, I update the Digest trait to use default methods and get rid of the the DigestUtil trait now that default methods seem to be working better.
The 7th commit adds a new function to cryptoutil.rs that tests a Digest by feeding it 1,000,000 'a's with a varying input size and then checks the result. This is basically the same test that already exists in Sha1, so, I update that test. I also implement new tests using this function for Sha2.
The 8th commit adds new functions to cryptoutil.rs to perform integer additions with overflow checking. These functions are useful for keeping track of the size of the message to be processed by Sha1 and Sha2 since these Digests are only defined for messages under a certain size. I update the Sha2 code to use these functions.
The 9th commit updates the existing Sha1 code to use the new functionality in cryptoutil.rs. Hopefully this is a good example of how the new functionality is generally applicable to more than just Sha2 and can be used by future block based Digest functions.
Outside of the #tests, I also tested the Sha2 and Sha1 implementations by generating a few thousand random inputs of up to 10MB in size, calculated the digests using these implementations and Java's implementations checking to make sure that they agree.
Performance Measurements:
Benchmarks:
Before making any updates, the Sha2 performance was:
After the changes in change 4, the performance improved to:
After the rest of the changes (changes after 5 didn't impact performance much, so this is basically the same as the performance after change 5), the performance further improved to:
Before making any updates, the Sha1 performance was:
After the updates to the Sha1 code, the performance improved to:
Cycle counts:
Sha512 performance on my machine is: 10.95 cycles / byte. Intel [1], has a Sha512 implementation that achieves 9.4 cycles / byte using the SSE instructions (the AVX implementation is slightly faster, but, only runs on Haswell processors, I think). Assuming that that is the best possible performance, there really isn't all that much more room for optimization without resorting to assembly. I'm not sure what optimized C implementations generally achieve. Prior to the rebase, I was getting about 11.5 cycles / byte. I'm not sure why the performance improved a bit, possibly just variation on my machine or maybe the new code to keep track of the message length is a bit more efficient.
Sha256 performance on my machine is: 18.55 cycles / byte. Intel [2] achieved roughly 13.8 cycles / byte in their single threaded SSE implementation. Like Sha512, I think this puts the Sha256 implementation pretty close to the limit on how efficient it can be without resorting to assembly, though not as close to Sha512. After the rebase, the performance seems to have decreased by about 1 cycle / byte. The only new change was to how the message size is being calculated. I haven't done any testing specifically focusing on that at this time, however.
I used the RDTSC instruction to measure the cycle counts. In my measurements, I took into account the entire digest operation all the way through calculating the output digest. Its a little unclear to me if Intel's measurements were just of the compression function or if they included the output digest calculation steps as well.
I didn't / can't measure performance on non-Intel architectures. However, I don't think that any of the changes I made would result in decreased performance on other architectures. I redid my cycle calculations after the rebase.
[1] - http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementations-ia-processors-paper.html
[2] - http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html