Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow blake3 #8650

Closed
3 tasks done
titusz opened this issue Jan 10, 2022 · 26 comments
Closed
3 tasks done

Allow blake3 #8650

titusz opened this issue Jan 10, 2022 · 26 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up

Comments

@titusz
Copy link

titusz commented Jan 10, 2022

Checklist

Installation method

ipfs-update or dist.ipfs.io

Version

go-ipfs version: 0.11.0
Repo version: 11
System version: amd64/windows
Golang version: go1.16.12

Config

irrelevant

Description

This command gives an error message:

ipfs add --hash=blake3 some.file
Error: potentially insecure hash functions not allowed

While using sha1 produces a valid CIDv1

ipfs add --hash=sha1 some.file

sha1 is known to be broken, blake3 is not. So it would make sense to either remove the security warning for blake3 or add one for sha1.

@titusz titusz added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Jan 10, 2022
@Jorropo
Copy link
Contributor

Jorropo commented Jan 12, 2022

sha1 is known to be broken

See this line:
https://github.com/ipfs/go-verifcid/blob/52ec7f147db0a13afb5bd64fa80026b9a88d00a6/validate.go#L30

This is needed because of git being a thing IPLD support. (note you shouldn't use it to host repos, to host repos on IPFS use unixfs with a safe algo with upacked packs)

I belive the goal isn't really to protect you from everything because that a lost cause, if sha2 gets broken all non updated nodes will still follow the old thing, that not an effective thing anyway.
I belive the goal is to get you to not use MD5 (which unlike sha1 which require good computation efforts MD5 is trivially broken).

blake3 is not

As you can see in the list, it's an optin list.
It's not that someone merged "blake3 is bad", but no one ever merged "blake3 is safe".

If you lookup the last commit date, you can see that this code is older than blake3.

Opinions

I don't actually think we want to include blake3 yet, it's a quite different algorithm than anything widely used, blake also has a special reputation and need to be dealt carefully.

We already allow blake2 which with the current implementations have comparable speeds.
https://github.com/ipfs/go-verifcid/blob/52ec7f147db0a13afb5bd64fa80026b9a88d00a6/validate.go#L39-L46

And blake2 unlike blake3 has seen good amounts of wide spread wild use. Blake2 is also mainly just a block cipher with chacha based mixing so nothing too crazy like infinite scaling.

Optout option

I belive a --force or something that disable this check would be usefull, if you want to use something custom why not ? (no garentees it work with other IPFS nodes tho).

@aschmahmann
Copy link
Contributor

aschmahmann commented Jan 12, 2022

+1 to the reasons given by @Jorropo explaining what's going on with SHA1 and why the output says that blake3 is unsafe (i.e. it was not explicitly marked as safe).

Marking blake3 as safe (when it has long enough outputs) seems fine by me per the discussion in multiformats/go-multihash#121. The main thing I'd want to be careful of is making sure that blake3 works properly in relevant edge cases since unlike most other hash functions (and all the ones currently supported) it allows for variable length outputs. If some (sharness) tests are added to go-ipfs exercising the transfer of variable length blake3 hashes that would likely be enough to make it usable by go-ipfs.


As an aside I'm looking forward to people exploring the possibilities of how blake3 can be utilized with IPFS. There seem like a number of pretty interesting possibilities since blake3 is itself a merkle-tree based hashing scheme

@titusz
Copy link
Author

titusz commented Jan 12, 2022

Maybe all it needs is to change the misleading error message. Instead of asserting that a hash function is "potentially insecure" if it is not in the goodset it would be more appropriate to state something like: "unsupported hash function not allowed".

Adding a --force option would be great for experimentation. I personally would like to experiment with the verified streaming capabilities of blake3 (https://github.com/oconnor663/bao)

Regarding variable length outputs: blake3 - as per specification - has a default output size of 256 bits. The multicodec code for blake3 does not indicate a length so the assumption would be that it is the 256 bits default length.

@Jorropo
Copy link
Contributor

Jorropo commented Jan 12, 2022

The multicodec code for blake3 does not indicate a length so the assumption would be that it is the 256 bits default length.

Multihash support variable sized hash functions (note, must be byte alligned, so only thing with multiple of 8 bits).
Each multihash encodes the length of the digest (even for static sized hash functions), so we aren't forced to enforce 256 bits to everyone.

@aschmahmann aschmahmann changed the title Allow blake3 or disallow sha1 Allow blake3 Jan 14, 2022
@RubenKelevra
Copy link
Contributor

Blake3 made decisions that offer performance over security. It's not safe enough to be trusted with data integrity checking on a level that you would expect from IPFS.

I did some performance testing and hashing doesn't really seem to be the blocker either. Blake2b should be performant enough for the usual applications - and is considered safe.

@RubenKelevra wrote:

BLAKE3 is extremely new and did security downgrades to archive a higher performance. I wouldn't trust it for data integrity. Especially since Blake was released like 9 years ago and has a good security record and is based on ChaCha which has 13 years of good security record.

#4143 (comment)

@titusz
Copy link
Author

titusz commented Jan 22, 2022

I understand that security is a sensitive topic. In spite of reducing compression rounds from 10 to 7 BLAKE3 claims to have the same 128-bit security level as BLAKE2s. "That is, 128-bit security against (second-)preimage, collision, or differentiability attacks.". I am not a crypto-analyst and correct me if I am wrong, but SHA-1 is clearly less secure than BLAKE3. My argument is that for clarity and consistency IPFS should either add a warning when using SHA-1 or remove the warning when trying to use BLAKE3.

@Jorropo
Copy link
Contributor

Jorropo commented Jan 22, 2022

I don't think the security debate is particularly usefull.
The goal is to protect you from MD5 level of security, if you want to use unsafe hashes, why not ?

We support SHA1 which has well known complete colisions, see this: shattered.io.
I don't particularly trust blake3 yet however I don't think it has worst security than sha1.

Something that might be usefull is a feature where go-ipfs would error or warn you if you have some blocks linking to different hashes.
For example someone could make a SHA256 block linking a SHA1 one hiding some unsafe links. I do see issues with this, at first glance that looks secure, but that aint.

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature and removed kind/bug A bug in existing code (including security flaws) labels Jan 28, 2022
@ipfs ipfs deleted a comment from welcome bot Jan 28, 2022
@lidel
Copy link
Member

lidel commented Jan 28, 2022

SHA-1 is clearly less secure than BLAKE3 [..]
My argument is that for clarity and consistency IPFS should [..] add a warning when using SHA-1

Agree. Filled #8703 for this.

@aschmahmann aschmahmann added P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Feb 25, 2022
@laudiacay
Copy link
Contributor

laudiacay commented Jun 30, 2022

Adding a pro-BLAKE3 voice to the conversation: similar to @titusz, we're looking at implementing the BLAKE3 verified streaming feature over IPFS. Would love if it was supported, that cleans our architecture up quite a bit.

@Jorropo
Copy link
Contributor

Jorropo commented Jun 30, 2022

we're looking at implementing the BLAKE3 verified streaming feature over IPFS

Just send a PR then 🙂.

@laudiacay
Copy link
Contributor

laudiacay commented Jun 30, 2022

Not sure what needs to be done as far as bumping versions and getting this integrated- if there's anything I can do to make that easier and get this accepted, please let me know. But here's the PR, I think you can just directly import the updated version of the go-verifcid code into go-ipfs and you'll be all set. I can make a PR doing the integration into IPFS as well, but am awaiting advice re: preferences on version bumps so I don't mess up ur system :)

ipfs/go-verifcid#15

edit: wait, don't look at that, i am really confused about these dependencies... do I need to patch go-cid? looking into it now.

@Jorropo
Copy link
Contributor

Jorropo commented Jun 30, 2022

Not sure what needs to be done as far as bumping versions and getting this integrated

First bump the go-verify cid version (just edit the version.json, for this change you need to bump the second number and reset the third one and get that merged).

Then update go-ipfs to include it:

  • go get the new verify version.
  • (I think maybe we already do it), register the blake3 hasher into go-ipfs (you need to import blank _ "github.com/multiformats/go-multihash/register/blake3" somewhere in go-ipfs, (choose a place almost everyone uses)).
  • Add a shareness that add a blake3 file and ensure you get the expected hash (note use --raw-leaves on a small file and compare against a ference blake3 hashing utility).

@aschmahmann
Copy link
Contributor

aschmahmann commented Jun 30, 2022

It'd also be great to have tests (Go tests or sharness) that deal with different sized blake3 hashes. Blake3 is the only variable length hash function we have other than identity which is frequently treated specially.

Would be sad if you could cause me problems downloading DAGs with blake3 hashes.

Reiterating #8650 (comment)

@Jorropo
Copy link
Contributor

Jorropo commented Jun 30, 2022

It'd also be great to have tests (Go tests or sharness) that deal with different sized blake3 hashes. Blake3 is the only variable length hash function we have other than identity which is frequently treated specially.

We already have prior work that "just" assume blake3 is a fix sized 32 bytes big hash function, I think that a sane default to begin with and require less refactoring than supporting variable sizes easly.
https://github.com/multiformats/go-multihash/blob/89bf86c75b6b66226f63e22db775a72172970360/register/blake3/multihash_blake3.go#L21

@aschmahmann
Copy link
Contributor

If you're sticking with a fixed length, then we'd want to test that behavior (i.e. that different sizes don't work) so that we'll notice if there are changes there.

@Jorropo
Copy link
Contributor

Jorropo commented Jun 30, 2022

true.
However I think we already support variable sha256 (and others) and we just truncate the hash to the under value (so you can have sha256-128 just by changing the length field). Which we should disallow too.

@aschmahmann
Copy link
Contributor

I don't think so. Those have been around for a while and you can check them out and also see why they exist in the multiformats repos.

I'm more concerned with really long hashes breaking things than short ones since we have some basic protections in go-verifcid here and these have been possible for a while.

@laudiacay
Copy link
Contributor

laudiacay commented Jun 30, 2022

what do you prefer me to do re: this? honestly i think the correct way to go (and this is edgy, sorry!) is CIDvn+1 where you encode a length as well. I'm adding support for the 32-bytes blake3 into multihash right now (as necessary to add support in verifCID)

edit: whoopsies silly me multihashes already have that :)

@laudiacay
Copy link
Contributor

will add tests briefly as well

@laudiacay
Copy link
Contributor

right now i think go-multihash supports creating one length but will parse any length (but does error handling correctly if the lengths are inconsistent), the right place to validate whether a file "matches" a blake3 length-16 hash (it never should) is probably go-ipfs or go-cid- investigating now.

@Jorropo
Copy link
Contributor

Jorropo commented Jul 1, 2022

@laudiacay I think you are touching a critical point here.

Just FYI, the core problem and the thing you need to target is that ipfs get MUST not use unsafe hashes (also applies to other downloading functions, cat, gateway, dag export, ...).
The theory is someone could create a legit looking block with a sha256 CID, and internally use a MD4 link, we don't want that.

That is important to remember because that will change where you need to put thoses length checks.

@laudiacay
Copy link
Contributor

another two prerequisite PRs to getting this changed in go-ipfs. need the multihash one accepted before any others can be accepted.
multiformats/go-multihash#158
ipfs/go-cid#140

ok. i think the right place to do this is proooobably in multihash parsing...? like... adding something saying that if the code is for BLAKE3, the length must be 32? you know this stack better than me, let me know if im being silly.

@Jorropo
Copy link
Contributor

Jorropo commented Jul 1, 2022

ok. i think the right place to do this is proooobably in multihash parsing...? like... adding something saying that if the code is for BLAKE3, the length must be 32?

No, the parser must support insecure hashes, we have consumers (like trie keying), using xxhash64 for example.
I think you should add in verify CID which is already called by the relevant parts inside go-ipfs.

I would add it to this function:
https://github.com/ipfs/go-verifcid/blob/9c75188417e708d2415f2c9dbd3a88a1eae89b1c/validate.go#L51-L62

BTW I notice we already have a minimum of 20 bytes in place. pref.MhLength < minimumHashLength const minimumHashLength = 20

@aschmahmann
Copy link
Contributor

triage update, some things that we'd like to see. Largely stemming from multihash lengths being used differently by blake3 than the other (non-identity) multihashes.

Sharness tests for blake3 (I'd just add a test/sharness/tXXXX-blake3.sh).

  • block put --mhtype=blake3 --cid-codec=raw (and block get)

  • dag put --input-codec=raw --store-codec=raw --hash=blake3 and dag get to verify

    • results should match block put/get
  • add --hash=blake3 --raw-leaves and cat

    • try both small data (e.g. <256KiB) which should match the above and larger data (e.g. 10MiB) which won't.
  • for each size: <0, 16 bytes, 32 bytes, 64 bytes, 1024, 1025, 2^64-1>

    • block put --mhtype=blake3 --mhlen=<size> --cid-codec=raw
    • block get on the CID to verify and check that it matches what you'd expect
      • block tests for inspiration/copy-pasting: test/sharness/t0050-block.sh
      • bitswap smoke test passing around blocks with blake3
        • spin up two nodes, add the blocks to one and request the blocks from another
          • node transfer tests for inspiration test/sharness/t0125-twonode.sh (you don't need anything too complicated or worrying about which transports are used)
        • of the tests these are the most complicated, if you're having trouble leave them out
  • Add a maximum hashlen in go-verifcid of 128 bytes (AFAIK this matches the biggest hash length we have so far)

    • People on that PR might debate wanting a larger one, but we can start with that for now and see if anyone's got use cases for something bigger

Thanks @laudiacay for pushing on this 🙏

@laudiacay
Copy link
Contributor

do these still need doing, considering that we're handling length in more or less the same way?

@aschmahmann
Copy link
Contributor

aschmahmann commented Aug 16, 2022

Closed by #9187. Thanks @laudiacay and @Jorropo 🎉. Currently only 32 byte blake3 hashes are generatable, if demand (and PRs) for that show up then those can get tackled too. You can follow the PR trail for more information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature P2 Medium: Good to have, but can wait until someone steps up
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants