Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sha1: implement collision detection and mitigation #566

Merged
merged 37 commits into from
Mar 27, 2024

Conversation

dignifiedquire
Copy link
Member

@dignifiedquire dignifiedquire commented Feb 22, 2024

This implements sha1 collision detection, including rehashing for mitigation.
As the code is 1-1 based on the version that git uses, the mitigation hashes should match as well.

Closes #204

Open TODOs

  • API
  • More code cleanup
  • Finish integrating of the test cases
  • Happy CI
  • Benchmarks

Limitations

Can only be used with the pure software implementation, asfaiu. The reason for this is, that the algorithm needs access to all intermediary states, and so processing 4 rounds at once through the intrinsics will screw things up.
For that reason I have made it it's own implementation, instead of adapting the existing compress implementations.

It might be possible to add support for the "simpler" assembly implementations that do round for round processing, but I think this could be a follow up in the future, if this is too slow for these platforms.

Prior art

hex!("e1761773e6a35916d99f891b77663e6405313587"),
)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an explicit test for the default values of checked::Config, somewhere, would have helped me in understanding the collision_test! macro, as it wasn't obvious to me that many of these flags that are turned off, are on by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the api to use a builder pattern, which should make it clearer

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I am spoiled already, knowing what I know. However, since it's the 'checked' version of the hash, I think it's absolutely fair to assume that all checking features are enabled by default unless they are disabled explicitly. The builder pattern doesn't make the more discoverable in tests I think (but an assertion for the desired value of the fields would have made it clear).

Using a builder for that might be a matter of whether that pattern was used in this crate before or not. I'd personally lean towards fields in a struct mainly because they are well-known, won't change (it's all part of the spec implemented here), and are easy enough to use.

@dignifiedquire
Copy link
Member Author

dignifiedquire commented Feb 23, 2024

Benchmarks on my macbook (arm M1), using the defaults, so both detection and mitigation enabled.

test sha1_10              ... bench:          12 ns/iter (+/- 0) = 833 MB/s
test sha1_100             ... bench:         107 ns/iter (+/- 3) = 934 MB/s
test sha1_1000            ... bench:       1,055 ns/iter (+/- 48) = 947 MB/s
test sha1_10000           ... bench:      10,521 ns/iter (+/- 659) = 950 MB/s
test sha1_collision_10    ... bench:          19 ns/iter (+/- 1) = 526 MB/s
test sha1_collision_100   ... bench:         180 ns/iter (+/- 6) = 555 MB/s
test sha1_collision_1000  ... bench:       1,660 ns/iter (+/- 55) = 602 MB/s
test sha1_collision_10000 ... bench:      16,414 ns/iter (+/- 509) = 609 MB/s

Copy link

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks on my macbook (arm M1), using the defaults, so both detection and mitigation enabled.

test sha1_10              ... bench:          12 ns/iter (+/- 0) = 833 MB/s
test sha1_100             ... bench:         107 ns/iter (+/- 3) = 934 MB/s
test sha1_1000            ... bench:       1,055 ns/iter (+/- 48) = 947 MB/s
test sha1_10000           ... bench:      10,521 ns/iter (+/- 659) = 950 MB/s
test sha1_collision_10    ... bench:          19 ns/iter (+/- 1) = 526 MB/s
test sha1_collision_100   ... bench:         180 ns/iter (+/- 6) = 555 MB/s
test sha1_collision_1000  ... bench:       1,660 ns/iter (+/- 55) = 602 MB/s
test sha1_collision_10000 ... bench:      16,414 ns/iter (+/- 509) = 609 MB/s

Thanks a lot for adding a benchmark, great for a better feel for what to expect.

I think Git gets about 1.2GB/s, and gitoxide is in the same ballpark when the ASM version of the implementation is used (which is great). Somehow, Git manages to do that with collision detection (probably), it's quite a bit magical.

However, it's without any doubt great to have the implementation and probably it also has to be the new default as correctness has to win, and those who can or want will get a switch to turn it off to get back to the previous (current) behaviour where collisions aren't checked for at all.

This will probably mean that a typical kernel pack, that decompresses to ~100GB, will need ~80s more CPU time. But probably I should actually test this with the version from this branch as other factors may matter as well, like maintaining the additional state that can start to accumulate if there is more than 1m streams hashed per second (but on 10 cores with one hasher per core).

I shall be back (but that really shouldn't hold up this PR at all).

hex!("e1761773e6a35916d99f891b77663e6405313587"),
)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I am spoiled already, knowing what I know. However, since it's the 'checked' version of the hash, I think it's absolutely fair to assume that all checking features are enabled by default unless they are disabled explicitly. The builder pattern doesn't make the more discoverable in tests I think (but an assertion for the desired value of the fields would have made it clear).

Using a builder for that might be a matter of whether that pattern was used in this crate before or not. I'd personally lean towards fields in a struct mainly because they are well-known, won't change (it's all part of the spec implemented here), and are easy enough to use.

@dignifiedquire dignifiedquire marked this pull request as ready for review February 23, 2024 18:30
@Byron
Copy link

Byron commented Feb 23, 2024

Here is my measurements, for informative purposes only, to demonstrate the pack resolution performance - an operation that happens for every clone or fetch.

Baseline

This is comparing the assembly version with the one without assembly, with the default sha1 0.10.6 crate from crates.io.

❯ hyperfine -w1 -N "gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" "./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" -r2
Benchmark 1: gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):      9.454 s ±  0.061 s    [User: 68.773 s, System: 2.390 s]
  Range (min … max):    9.411 s …  9.497 s    2 runs

Benchmark 2: ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     16.110 s ±  0.136 s    [User: 132.938 s, System: 2.582 s]
  Range (min … max):   16.013 s … 16.206 s    2 runs

Summary
  gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx ran
    1.70 ± 0.02 times faster than ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx

With default features, ASM didn't really kick in apparently.

Collision Detection

With the collision feature enabled.

❯ hyperfine -w1 -N "gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" "./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx" -r2
Benchmark 1: gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):      9.487 s ±  0.017 s    [User: 68.135 s, System: 2.583 s]
  Range (min … max):    9.475 s …  9.499 s    2 runs

Benchmark 2: ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     26.471 s ±  0.263 s    [User: 225.025 s, System: 3.169 s]
  Range (min … max):   26.285 s … 26.656 s    2 runs

Summary
  gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx ran
    2.79 ± 0.03 times faster than ./target/release/gix free pack verify /Users/byron/dev/github.com/Byron/gitoxide/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx

Comparison by hand

gitoxide ( sha1-collisions) +1 -1 [$!?] took 26s
❯ ./target/release/gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
 20:54:39 Hash of index 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx' done 209.7MB in 0.44s (477.7MB/s)
 20:54:39                                           collecting sorted index done 7.5M entries in 0.99s (7.5M entries/s)
 20:54:41 Hash of pack 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack' done 1.3GB in 3.13s (423.2MB/s)
 20:54:41                                                          indexing done 7.5M objects in 2.15s (3.5M objects/s)
 20:55:05                                                         Resolving done 7.5M objects in 23.75s (315.4k objects/s)
 20:55:05                                                          Decoding done 94.3GB in 23.75s (4.0GB/s)

gitoxide ( sha1-collisions) +1 -1 [$!?] took 27s
❯ gix free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
 20:55:22 Hash of index 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx' done 209.7MB in 0.14s (1.5GB/s)
 20:55:23 Hash of pack 'pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack' done 1.3GB in 0.78s (1.7GB/s)
 20:55:23                                           collecting sorted index done 7.5M entries in 0.83s (9.1M entries/s)
 20:55:24                                                          indexing done 7.5M objects in 1.36s (5.5M objects/s)
 20:55:32                                                         Resolving done 7.5M objects in 7.37s (1.0M objects/s)
 20:55:32                                                          Decoding done 94.3GB in 7.37s (12.8GB/s)

We are seeing a little less than 1/3rd of the ASM performance.

Summary

Here I ran out of time as for some reason, when patching in the modified version of the crate alone (which required a clone and a version modification to match the one available locally), it triggered a lot of compile errors across the codebase that had nothing to do with it. Usually along the line of .as_ref() can't be inferred, needs type annotation. I have no explanation for this.

However, if the benchmarks above are an indication, we might be talking 609/950 (roughly 64%) reduction of this particular operation.

I truly wonder how Git can be this fast, as its hash performance doesn't seem to need ASM and seemingly supports collision detection as well.

Usability in gitoxide

It can probably be used with a runtime switch, but I didn't yet test if the increased footprint (due to the check-state) has a negative effect on performance even when it's disabled. It's probably hard to say much about this if ASM doesn't kick in automatically anymore in version 0.11. Technically it will be usable for sure.

@dignifiedquire
Copy link
Member Author

and gitoxide is in the same ballpark when the ASM version of the implementation is used

Do you have a link to that ASM code, would love to check out how they do it, after simplifying the code a bit, I think there is probably more that can be done, but adding an optimised version certainly can come later.

@dignifiedquire
Copy link
Member Author

I truly wonder how Git can be this fast, as its hash performance doesn't seem to need ASM and seemingly supports collision detection as well.

This is quite surprising indeed, as the implementation here is based on the exact same code

@dignifiedquire
Copy link
Member Author

When running on my linux machine with an AMD that has Sha intrinsics

test sha1_10              ... bench:           7 ns/iter (+/- 0) = 1428 MB/s
test sha1_100             ... bench:          58 ns/iter (+/- 0) = 1724 MB/s
test sha1_1000            ... bench:         516 ns/iter (+/- 1) = 1937 MB/s
test sha1_10000           ... bench:       5,091 ns/iter (+/- 24) = 1964 MB/s
test sha1_collision_10    ... bench:          27 ns/iter (+/- 0) = 370 MB/s
test sha1_collision_100   ... bench:         250 ns/iter (+/- 2) = 400 MB/s
test sha1_collision_1000  ... bench:       2,322 ns/iter (+/- 12) = 430 MB/s
test sha1_collision_10000 ... bench:      22,969 ns/iter (+/- 138) = 435 MB/s

@dignifiedquire dignifiedquire changed the title [WIP] Implement sha1 collision and mitigation Implement sha1 collision detection and mitigation Feb 23, 2024
@Byron
Copy link

Byron commented Feb 23, 2024

and gitoxide is in the same ballpark when the ASM version of the implementation is used

Do you have a link to that ASM code, would love to check out how they do it, after simplifying the code a bit, I think there is probably more that can be done, but adding an optimised version certainly can come later.

These can be activated by adding the asm feature to the sha1 crate, at least in version 0.10. I had a feeling these don't exist like this anymore in 0.11. And I definitely agree, optimization can come later.
At this performance loss, gitoxide will probably work hard to get the SHA256 transition going 😅.

@dignifiedquire
Copy link
Member Author

These can be activated by adding the asm feature to the sha1 crate, at least in version 0.10.

Ah yeah, I know more about these than I want to actually 🤣 , on 0.11 it's all auto detected by default.
I thought there might be another hash implementation somewhere 😅

The regular ASM implementation could be ported to do detection I believe, as it does rounds manually (just checked). But the overall work is still around 1.5x at least, soo...
The current code could be probably tweaked a bit more to make the rust compiler happier which should help some.

@Byron
Copy link

Byron commented Feb 23, 2024

Ah yeah, I know more about these than I want to actually 🤣 , on 0.11 it's all auto detected by default.
I thought there might be another hash implementation somewhere 😅

There can only be one ;)! But in all seriousness, it's great to hear as it would remove one reason for incompatibility that was plaguing gitoxide and required a workaround. I hope it will still be as fast, and will definitely be checking it :)..

From my tests, I have filled them into the original comment, performance (with collision-checks) looks like close to 1/3rd of what it is with ASM, while the latest version, 0.11 with ASM feature detection, seems to not kick in ASM on my machine (M1) or with the --release build settings I was using. So either way, performance was greatly reduced.

But there is no way around biting that bullet, even though I will definitely put in an escape hatch for those who want to live like it's 2016😅.

In any case, and in case it's not clear over my ramblings about performance: Absolutely stunning work, I am loving it! (it's just terrible to realize that a lot of the advertised performance came from not comparing apples to apples)


In case it matters, I also had to make these unrelated changes because the Rust compiler didn't like it anymore when using the code in this PR. I have no explanation.

commit 798b11a57977e353ecb53cd7b85f0525e74c15d0
Author: Sebastian Thiel <sebastian.thiel@icloud.com>
Date:   Fri Feb 23 19:00:41 2024 +0100

    TBD

diff --git a/gix-object/src/commit/message/body.rs b/gix-object/src/commit/message/body.rs
index 0da6d1d14..6e84f3e2e 100644
--- a/gix-object/src/commit/message/body.rs
+++ b/gix-object/src/commit/message/body.rs
@@ -33,7 +33,7 @@ pub struct TrailerRef<'a> {
 
 fn parse_single_line_trailer<'a, E: ParserError<&'a [u8]>>(i: &mut &'a [u8]) -> PResult<(&'a BStr, &'a BStr), E> {
     *i = i.trim_end();
-    let (token, value) = separated_pair(take_until(1.., b":".as_ref()), b": ", rest).parse_next(i)?;
+    let (token, value) = separated_pair(take_until(1.., &b":"[..]), b": ", rest).parse_next(i)?;
 
     if token.trim_end().len() != token.len() || value.trim_start().len() != value.len() {
         Err(winnow::error::ErrMode::from_error_kind(i, ErrorKind::Fail).cut())
diff --git a/gix-pack/src/data/input/bytes_to_entries.rs b/gix-pack/src/data/input/bytes_to_entries.rs
index 7450e9134..e089250eb 100644
--- a/gix-pack/src/data/input/bytes_to_entries.rs
+++ b/gix-pack/src/data/input/bytes_to_entries.rs
@@ -138,7 +138,7 @@ where
 
         let crc32 = if self.compressed.crc32() {
             let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
-            let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut())?;
+            let header_len = entry.header.write_to(bytes_copied, &mut header_buf.as_mut_slice())?;
             let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
             Some(gix_features::hash::crc32_update(state, &compressed))
         } else {
diff --git a/gix-pack/src/data/input/entry.rs b/gix-pack/src/data/input/entry.rs
index 7d3d9b3cb..cb5c2af80 100644
--- a/gix-pack/src/data/input/entry.rs
+++ b/gix-pack/src/data/input/entry.rs
@@ -33,7 +33,7 @@ impl input::Entry {
         let mut header_buf = [0u8; 12 + gix_hash::Kind::longest().len_in_bytes()];
         let header_len = self
             .header
-            .write_to(self.decompressed_size, &mut header_buf.as_mut())
+            .write_to(self.decompressed_size, &mut header_buf.as_mut_slice())
             .expect("write to memory will not fail");
         let state = gix_features::hash::crc32_update(0, &header_buf[..header_len]);
         gix_features::hash::crc32_update(state, self.compressed.as_ref().expect("we always set it"))
diff --git a/gix-worktree/src/stack/state/mod.rs b/gix-worktree/src/stack/state/mod.rs
index 30e3c609f..07bffbfbc 100644
--- a/gix-worktree/src/stack/state/mod.rs
+++ b/gix-worktree/src/stack/state/mod.rs
@@ -96,7 +96,7 @@ impl State {
         let a1_backing;
         #[cfg(feature = "attributes")]
         let a2_backing;
-        let names = match self {
+        let names: &[(&bstr::BStr, Option<_>)] = match self {
             State::IgnoreStack(ignore) => {
                 a1_backing = [(
                     ignore.exclude_file_name_for_directories.as_bytes().as_bstr(),

@dignifiedquire
Copy link
Member Author

Two questions, (1) do you happen to have performance numbers for the version git is using? (2) do you know which configuration for the collision detection they are running?

@Byron
Copy link

Byron commented Feb 23, 2024

Regarding 1): No, and it's not truly comparable as Git has a lot of extra overhead and it doesn't scale with cores at all. Maybe in single-core mode it can be somewhat comparable though, especially with a profiler to see where it spends time.

git verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx takes 51s (but only uses 3 cores), maybe it does more as well, I don't know.

If that is normalized to a single core with git -c pack.threads=1 verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx it takes 1m25s, pretty good still, whereas gix (with collisions enabled) takes 3m13s (./target/release/gix -t1 free pack verify $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx). Interestingly, the current highest performing build can do it (with a single core) in 1m1s.

So I think the 1.5x (better 1.4x) number for the ASM version seems to be what Git has, more or less, assuming all else equal. It's a respectable result that Git presents on a single core.

Regarding 2) All I know is the source in sha1.c which I linked earlier and I think you have seen already. There can always be compile-time tricks though that I don't understand, and the local version I am using is git version 2.39.3 (Apple Git-145), whatever magic sauce that comes with.

Just to be sure it wasn't laced with too much fairy dust, I built e02ecfcc534e2021aae29077a958dd11c3897e4c of git myself and ran /Users/byron/dev/github.com/git/git/git -c pack.threads=1 verify-pack $PWD/tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx in 1m 24s, so it's achieving the same result. Maybe it's just that C code compiling to something very, very fast?

@dignifiedquire
Copy link
Member Author

dignifiedquire commented Feb 23, 2024

have you tried building with rustflags="target-cpu=native"
this makes quite a big difference in these scenarios

@Byron
Copy link

Byron commented Feb 24, 2024

That's a valid point! It doesn't make a difference for this branch, it's virtually unchanged. After updating to the most recent version in this branch I did notice a 5% speedup though.
When retrying the latest master in this repository, its 820MB/s SHA1 performance still couldn't get close to the 1.7GB I get with the current implementation.

I think what I will end up with is to use v0.10 for the performance-toggle, and the checked version here for everything else.

Also, I couldn't resist to make measurements on Git's hashing alone, using a 200MB file as input.

❯ hyperfine -N -w1 'git hash-object -t blob tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx'
Benchmark 1: git hash-object -t blob tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.idx
  Time (mean ± σ):     113.6 ms ±   2.3 ms    [User: 92.8 ms, System: 14.5 ms]
  Range (min … max):   112.4 ms … 124.2 ms    26 runs

If that is extrapolated, we get to a whopping 1.5GB/s - 1.7GB/s.

With a bigger file of 1326MB, I get:

❯ hyperfine -N -w1  'git hash-object --literally tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack'
Benchmark 1: git hash-object --literally tests/fixtures/repos/linux.git/objects/pack/pack-b73c0dd365b8ab881a163a1d342dac7416f52591.pack
  Time (mean ± σ):     806.9 ms ±   8.6 ms    [User: 554.0 ms, System: 180.9 ms]
  Range (min … max):   795.6 ms … 820.0 ms    10 runs

Which puts it at around 1.64GB/s .

And because I couldn't believe that a bunch of C in a file with a casual build can be this fast, I profiled the run with the 1.3GB file, and here is the answer to all of our questions.

Screenshot 2024-02-24 at 09 42 28

The entrypoint in Git seems to be here, which clearly talks about platform specific implementations. All the configuration for these seems to come from here.

Maybe they deal with shattered, maybe they don't? I don't know how to validate that as unfortunately, git hash-object will add the Git object header to any input it's provided. But since the software fallback for Sha1 contains the mitigation, I presume they wouldn't use a platform-specific acceleration if it wouldn't have the same qualities.

That was an interesting excursion, I learned something. Maybe in the end, in order to be competitive, I'd need a gix-sha1 crate that uses sha1 with mitigration as fallback, like Git, but links to the platform variants when possible.

@dignifiedquire
Copy link
Member Author

Okay, that hash is not using collision detection, that is using the regular accelerated sha1 routines, which is why it can be so fast. I suspect what git is doing is that it only uses the hash collision version on very specific points, and not internally, to avoid the slow down.

@dignifiedquire
Copy link
Member Author

The regression comes from here: #542 the asm versions are actually fully removed :(

I guess it really is time to port the assembly to asm!..

@Byron
Copy link

Byron commented Feb 24, 2024

Okay, that hash is not using collision detection, that is using the regular accelerated sha1 routines, which is why it can be so fast. I suspect what git is doing is that it only uses the hash collision version on very specific points, and not internally, to avoid the slow down.

I looked for 15 minutes and could only conclude that this seems to be a compile-time-only affair. Git doesn't always get compiled with its own implementation, but if enabled it would detect collisions and die when one was found. But I might well be wrong with that. It feels a bit like the mitigation was quickly made available, but over time was supplanted by faster implementations again.

Git also doesn't have a choice for this beyond compile time it seems (or I can't find it), which makes me think that if, for some reason, the runtime switching doesn't work or reduces performance, I could also go back to a compile time switch and probably would reduce complexity that way. And I am saying this in case it would simplify things here as well to go compile-time only.

@dkg
Copy link

dkg commented Feb 27, 2024

This is really great work. Thank you both for the thoughtful back-and-forth, for documenting your sleuthing here, and for valuing correctness over speed.

Comment on lines 4 to 7
//! Copyright 2017 Marc Stevens <marc@marc-stevens.nl>, Dan Shumow <danshu@microsoft.com>
//! Distributed under the MIT Software License.
//! See accompanying file LICENSE.txt or copy at
//! https://opensource.org/licenses/MIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is MIT only which slightly changes the licensing situation for this crate, which is otherwise Apache 2.0+MIT.

We could note that the code for the collision feature is MIT-only in the toplevel README.md, or potentially ask the original authors if they would be okay with the resulting translation being dual licensed as Apache 2.0 in addition to MIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cr-marcstevens according to the license file you are the original license holder, would you be okay with us relicensing under apache 2.0 + MIT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also @shumow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Friedel, Tony,
Thanks for your nice work making our code also available in rust!
And I'm certainly fine with dual licensing Apache 2.0+MIT (with the proper attribution and copyright notices).
But wouldn't the same situation also apply to the rust translation of our sha1.c code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm certainly fine with dual licensing Apache 2.0+MIT (with the proper attribution and copyright notices).

awesome, thank you!

But wouldn't the same situation also apply to the rust translation of our sha1.c code?

Yes very much, I will be doing a final pass of notes for attribution on the relevant files in a bit. I'll ping you when it is ready for review.

Copy link
Member Author

@dignifiedquire dignifiedquire Mar 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a detailed reference to src/checked.rs and pointers to this and the original files in both checked/ubc_check.rs and checked/compress.rs. Please let me know if this is good for you

sha1/src/lib.rs Outdated Show resolved Hide resolved
sha1/src/lib.rs Outdated Show resolved Hide resolved
@tarcieri tarcieri changed the title Implement sha1 collision detection and mitigation sha1: implement collision detection and mitigation Feb 29, 2024
@wiktor-k
Copy link

I have only minor nits. After those are fixed and digest is downgraded to v0.10, I think we can merge it.

Hi! Is there anything left to get this merged? Sadly some of us are stuck with SHA-1 and this PR would make it a little bit less... explosive. Thanks for your time!

@newpavlov
Copy link
Member

Sorry, I forgot to return to this PR. I will merge it and release the crate shortly.

@newpavlov newpavlov merged commit e766aec into RustCrypto:master Mar 27, 2024
221 checks passed
@dignifiedquire dignifiedquire deleted the feat-sha1-collision branch March 27, 2024 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding collision detection
7 participants