[enhancement] adding BLAKE3 support #121

Closed
eleitl opened this issue Jan 12, 2020 · 21 comments · Fixed by #147

Comments

@eleitl

eleitl commented Jan 12, 2020

BLAKE3 (https://github.com/BLAKE3-team/BLAKE3) is a major improvement on BLAKE2 with regard to speed (https://medium.com/asecuritysite-when-bob-met-alice/blake3-3716708235ac https://news.ycombinator.com/item?id=22021769 https://news.ycombinator.com/item?id=22003315).

There's a pure Go implementation https://github.com/lukechampine/blake3 though it's not optimized yet. An optimized version will become available in x/crypto at some point.

As an enhancement proposal, it would be nice to have at least preliminary BLAKE3 support in multihash so that downstream projects like IPFS can start using it.

@Stebalien
Member

I'd like to see some people try to break it first. Much of its speed comes from the assumption that current symmetric algorithms are too paranoid and have too many rounds.

@lzsaver

lzsaver commented Jun 3, 2020

@divinity76

divinity76 commented Jul 30, 2020

btw there's a fast optimized Go-implementation available here https://github.com/zeebo/blake3

@cryptoquick

Is there any way to get a sense of priority on this? It'd be very helpful for my use-case.

Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception. This also allows IPLD MerkleDAG blocks to be safely made much larger.

@Stebalien
Member

@cryptoquick the codec is 0x1e. Feel free to add support (we can disable it by default).
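For context on what that code is used for: a multihash is the hash function's multicodec code, then the digest length, then the digest itself, with the first two written as unsigned varints. Below is a minimal Go sketch of that framing using only the standard library. `encodeMultihash` and the constant names are made up for illustration and are not go-multihash's API, and the BLAKE3 digest is a zero-filled placeholder because the standard library has no BLAKE3 implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Multicodec codes for the two hash functions used in this sketch.
const (
	codeSHA256 = 0x12
	codeBLAKE3 = 0x1e
)

// encodeMultihash prepends the varint-encoded hash code and digest length
// to the raw digest bytes. Hypothetical helper, not a library function.
func encodeMultihash(code uint64, digest []byte) []byte {
	buf := make([]byte, 0, 2*binary.MaxVarintLen64+len(digest))
	buf = binary.AppendUvarint(buf, code)
	buf = binary.AppendUvarint(buf, uint64(len(digest)))
	return append(buf, digest...)
}

func main() {
	// A real SHA2-256 multihash: prefix 0x12 0x20, then 32 digest bytes.
	sum := sha256.Sum256([]byte("hello"))
	fmt.Println(hex.EncodeToString(encodeMultihash(codeSHA256, sum[:])))

	// Placeholder only: 32 zero bytes stand in for a real BLAKE3 digest.
	placeholder := make([]byte, 32)
	prefix := encodeMultihash(codeBLAKE3, placeholder)[:2]
	fmt.Println(hex.EncodeToString(prefix)) // 1e20
}
```

Since 0x1e is below 0x80 it fits in a single varint byte, so a 32-byte BLAKE3 multihash would start with the two bytes `1e 20`.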

> Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception

Source? That sounds impossible of the "the hash function would be broken if that were the case" kind.

@cryptoquick

cryptoquick commented Mar 31, 2021

> @cryptoquick the codec is 0x1e. Feel free to add support (we can disable it by default).

I'm not a Go developer, but I'm glad there's a Multicodec for it.

> Source? That sounds impossible of the "the hash function would be broken if that were the case" kind.

Yeah, it's really cool, check it out!
https://github.com/oconnor663/bao

To get verified streaming, this would probably require changes to the IPLD implementation, also.

@aschmahmann
Contributor

aschmahmann commented Mar 31, 2021

> Blake3 digests don't need to finish hashing a data stream before it can tell if the data is incorrect. If one single byte is off anywhere in the stream, it'll immediately throw an exception

This seems like a bit of an exaggeration, but since Blake3 is a tree hash that means that it can have parallelism in computing the hash which generally means more flexibility (e.g. throwing more cores at the hash means faster detection of certain problems, verified streaming, etc.).

> This also allows IPLD MerkleDAG blocks to be safely made much larger.

Absolutely, although the "safety" is related to the assumptions of the data transfer layer and the transfer layer will have to be made smart enough to take advantage of verified streaming.

As an FYI protocol/beyond-bitswap#29 is a description of how a protocol like Bitswap might be augmented to support verified streaming along with an example of how to create a (worse than Blake3, but backwards compatible with most of the internet) verified streaming solution out of Merkle-Damgard hashes like SHA2-256.

@Stebalien
Member

> To get verified streaming, this would probably require changes to the IPLD implementation, also.

Yeah, you can't just verify raw data while hashing. You still need the rest of the hash tree.
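To make the "rest of the hash tree" point concrete, here is a rough sketch of checking a single chunk against a trusted root using its sibling hashes. It swaps in a plain binary Merkle tree over SHA-256 from the standard library. This is not BLAKE3's actual tree mode or the Bao encoding, and every function name is hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use distinct prefix bytes so a leaf can never be
// mistaken for an interior node.
func leafHash(chunk []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, chunk...))
	return h[:]
}

func nodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// buildLevels returns every level of the tree, leaves first, root last.
// The chunk count is assumed to be a power of two to keep the sketch short.
func buildLevels(chunks [][]byte) [][][]byte {
	level := make([][]byte, len(chunks))
	for i, c := range chunks {
		level[i] = leafHash(c)
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		next := make([][]byte, len(level)/2)
		for i := range next {
			next[i] = nodeHash(level[2*i], level[2*i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// proofFor collects the sibling hash at each level for the chunk at index.
func proofFor(levels [][][]byte, index int) [][]byte {
	var proof [][]byte
	for _, level := range levels[:len(levels)-1] {
		proof = append(proof, level[index^1])
		index /= 2
	}
	return proof
}

// verifyChunk recomputes the root from one chunk plus its sibling hashes.
func verifyChunk(root, chunk []byte, index int, proof [][]byte) bool {
	h := leafHash(chunk)
	for _, sib := range proof {
		if index%2 == 0 {
			h = nodeHash(h, sib)
		} else {
			h = nodeHash(sib, h)
		}
		index /= 2
	}
	return bytes.Equal(h, root)
}

func main() {
	chunks := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc"), []byte("dddd")}
	levels := buildLevels(chunks)
	root := levels[len(levels)-1][0]
	proof := proofFor(levels, 2)
	fmt.Println(verifyChunk(root, chunks[2], 2, proof))      // true
	fmt.Println(verifyChunk(root, []byte("cccX"), 2, proof)) // false
}
```

The receiver can reject a bad chunk as soon as it arrives, but only because the sender also shipped the sibling hashes. The raw data alone is not enough.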

@cryptoquick

@aschmahmann Thanks for providing a link to that PR! I'm glad someone already mentioned verified streaming. Handling arbitrarily large block sizes poses a significant technical challenge, so simply adding Blake3 multihashes alone wouldn't solve it for free, but I think it'd be helpful for efforts like that to work towards supporting Blake3, which I guess is why I bring it up, is all.

Do you think supporting Blake3 at some point will be a helpful follow-on to a SHA2-256-only implementation?

@aschmahmann
Contributor

> Do you think supporting Blake3 at some point will be a helpful follow-on to a SHA2-256-only implementation?

IMO yes having support for some tree hash (e.g. Blake3 or KangarooTwelve) seems like a good idea. I think both of those functions (which AFAIK are the two most popular candidates for a tree based hash) use reduced hash rounds leading to @Stebalien's comment #121 (comment). I'm not sure when we reach the "confidence" point of accepting Blake3 (or KangarooTwelve) by default though (@Stebalien any thoughts?).

SHA2-256 is just nice because it's already widely used and allows me to grab a SHA2-256 checksum of some random file on the internet (e.g. a Linux ISO) and have verified streaming of it over a p2p network. However, as noted in that proposal, it has some unfortunate downsides such as only being able to stream backwards which make it not ideal compared to a tree based hash.
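A rough illustration of why a Merkle-Damgard hash like SHA2-256 lends itself to streaming backwards: if the sender supplies the internal hash state just before the final bytes, the receiver can resume from that state, absorb the tail, and compare against the digest it already trusts. This is one reading of the idea, not the protocol described in protocol/beyond-bitswap#29. The helper names are hypothetical, and the sketch relies on Go's `crypto/sha256` hash values implementing `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
	"fmt"
)

// digest returns the SHA2-256 digest of data as a slice.
func digest(data []byte) []byte {
	sum := sha256.Sum256(data)
	return sum[:]
}

// stateAfter returns the serialized SHA-256 state after absorbing prefix.
func stateAfter(prefix []byte) ([]byte, error) {
	h := sha256.New()
	h.Write(prefix)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hash state is not serializable")
	}
	return m.MarshalBinary()
}

// verifyTail resumes from a claimed intermediate state, absorbs the tail,
// and checks the result against a digest the receiver already trusts.
func verifyTail(state, tail, wantDigest []byte) (bool, error) {
	h := sha256.New()
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return false, errors.New("hash state is not restorable")
	}
	if err := u.UnmarshalBinary(state); err != nil {
		return false, err
	}
	h.Write(tail)
	return bytes.Equal(h.Sum(nil), wantDigest), nil
}

func main() {
	data := bytes.Repeat([]byte("x"), 192) // three 64-byte SHA-256 blocks
	want := digest(data)

	// The sender ships the state after the first two blocks plus the last block.
	state, err := stateAfter(data[:128])
	if err != nil {
		panic(err)
	}
	ok, err := verifyTail(state, data[128:], want)
	fmt.Println(ok, err)
}
```

Once the tail checks out, the supplied state becomes the value that the preceding block has to reproduce, so verification proceeds from the end of the file toward the start.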

@cryptoquick

A very good point! Another feather in SHA-2's cap is that there's relatively widespread hardware acceleration available:
https://en.wikipedia.org/wiki/Intel_SHA_extensions

That said, these things take time, and Protocol Labs is no stranger to futureproofing. I think Blake3 support should be prioritized, even if the priority is quite low. :)

@markg85

markg85 commented Aug 27, 2021

> A very good point! Another feather in SHA-2's cap is that there's relatively widespread hardware acceleration available:
> https://en.wikipedia.org/wiki/Intel_SHA_extensions
>
> That said, these things take time, and Protocol Labs is no stranger to futureproofing. I think Blake3 support should be prioritized, even if the priority is quite low. :)

To be fair here, it's great to have CPU extensions. But if the newcomer (BLAKE3 in this case) is simply downright faster than SHA2-256 (even with the use of those CPU extensions) then there is real potential here for a simply superior hasher. Definitely faster (better is still up for debate).

For some numbers: I tried BLAKE3 (b3sum) and SHA2-256 (sha256sum) on an ODROID-XU4 (it's still 32-bit) and a Ryzen 4800u. The results surprised me a lot!
1.6GB file on the ODROID-XU4
BLAKE3: 21 seconds
SHA2-256: 45 seconds

5.8GB file on the Ryzen 4800u
BLAKE3: 2.2 seconds
SHA2-256: 3.7 seconds

Note that subsequent BLAKE3 (b3sum) runs somehow were much faster with a mere ~0.5 seconds. I'm suspecting some caching going on there. So i took the first and slowest timing.

That's a full on win for BLAKE3 in two wildly different test scenarios, even though SHA2-256 has the benefit of dedicated CPU extensions.

@whyrusleeping
Member

Is the only thing missing to close this issue a verdict on 'include this for real'? If that's the case, I think the world has generally agreed that blake3 is a good thing, and I've seen no indication from any cryptographer that there's anything wrong with it.

cc Chief Blake3 FanBoy @zookozcash

@zookozcash

In my experience a cryptographer will never tell you "There's nothing wrong with it." or "It's safe." or anything like that. The best you can do is say something like "Can you point out a problem with it?" and they won't. This is basically the exact same process you have to use with good lawyers. "Never ask a lawyer if doing a thing is safe, for the answer is always no. Instead ask, what's the safest way to do the thing, and what risks would remain."

Anyway, yeah, as far as I know there has been no cryptographic result indicating any weakness in BLAKE3. And since BLAKE3 is based on BLAKE2 (2012), which is based on BLAKE (2008) (the most thoroughly studied algorithm from the SHA-3 competition and one of two algorithms with the biggest security margin), which is based on ChaCha20 (2008), which is based on Salsa20 (2005) (selected for the eSTREAM portfolio), then it definitely has a pedigree.

Of course that doesn't mean that the changes didn't accidentally insert a weakness! I would never say it is safe. ;-) But I can't point to any flaws in it…

@whyrusleeping
Member

From Twitter:
[image]

@divinity76

@markg85

> Note that subsequent BLAKE3 (b3sum) runs somehow were much faster with a mere ~0.5 seconds. I'm suspecting some caching going on there. So i took the first and slowest timing.

Sha256 was CPU-bound, so the IO speed had little-to-no effect on sha256 hashing, while blake3 was IO-bound, so blake3 ran much faster when the file was already in the IO-cache

@markg85

markg85 commented Aug 28, 2021

> Sha256 was CPU-bound, so the IO speed had little-to-no effect on sha256 hashing, while blake3 was IO-bound, so blake3 ran much faster when the file was already in the IO-cache

Ahh, so that's what's happening!
I was assuming it to be something like that but didn't bother figuring it out. Thank you for confirming!

The mere fact that this happens does imho mean that BLAKE3 is just superior for today's hardware.

@markg85

markg85 commented Jan 19, 2022

Reviving this to keep all info in one place.
I have go-ipfs 0.11, which has this supported.

ipfs add --nocopy --hash blake3 <somefile>

Fails with:
Error: potentially insecure hash functions not allowed

Is that intentional?

@divinity76

> Is that intentional?

no that must be a bug, but probably not a bug in go-multihash..

@markg85

markg85 commented Jan 19, 2022

After some searching it looks like this is causing it: https://github.com/ipfs/go-verifcid/blob/master/validate.go
BLAKE3 isn't in the list.

It does have if code >= mh.BLAKE2B_MIN+19 && code <= mh.BLAKE2B_MAX, whose logic I don't fully get. It "seems" like an attempt to allow blake hashes too? Not entirely sure.
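One way to read that range check: in the multicodec table the BLAKE2b codes run consecutively from blake2b-8 (0xb201) to blake2b-512 (0xb240), one code per extra byte of output, so `BLAKE2B_MIN+19` is blake2b-160 and the condition admits BLAKE2b digests of 20 bytes or more. The sketch below mirrors that with locally defined constants. The `allowed` function is a hypothetical stand-in written for illustration, not a copy of go-verifcid's list, and it shows why a code like 0x1e falls through.

```go
package main

import "fmt"

// Multicodec codes, defined locally so the sketch has no dependencies.
const (
	codeSHA256 = 0x12
	codeBLAKE3 = 0x1e
	blake2bMin = 0xb201 // blake2b-8: 1-byte output
	blake2bMax = 0xb240 // blake2b-512: 64-byte output
)

// blake2bDigestBytes maps a BLAKE2b code to its output size in bytes.
// It is only meaningful for codes between blake2bMin and blake2bMax.
func blake2bDigestBytes(code uint64) int {
	return int(code-blake2bMin) + 1
}

// allowed is a hypothetical allowlist in the style of the quoted check:
// one explicitly listed function plus BLAKE2b outputs of 20 bytes or more.
func allowed(code uint64) bool {
	if code == codeSHA256 {
		return true
	}
	return code >= blake2bMin+19 && code <= blake2bMax
}

func main() {
	fmt.Println(blake2bDigestBytes(blake2bMin + 19)) // 20
	fmt.Println(allowed(blake2bMin + 19))            // true: blake2b-160
	fmt.Println(allowed(blake2bMin + 18))            // false: blake2b-152 is too short
	fmt.Println(allowed(codeBLAKE3))                 // false: 0x1e is in neither branch
}
```

Under this reading the range only covers the BLAKE2b family, so BLAKE3 would need its own entry in the list.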

The repository (go-verifcid) states that it's a temporary one, but searching for its use in go-ipfs shows that it's widely used: https://github.com/search?q=org%3Aipfs+go-verifcid&type=code

@divinity76

@markg85 yeah I'd recommend filing a bug report in the go-verifcid repo
