-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add fr32-sha2-256-trunc254-padded-binary-tree multihash #331
base: master
Are you sure you want to change the base?
Conversation
From the FRC abstract:
This is how I think it should be done. The size doesn't really matter for the hash. The size information is additional context that is needed for some application. To me, this is not what Multihash is for. I've discussed with a lot of people about how to transmit additional context with CIDs at the LabWeek/IPFSCamp in 2022. I think there should be a way, but I still believe it shouldn't be part of the CID, but something separate. |
Yes/No. The situation as it currently stands makes referring to the data needed by CID not particularly useful. It was acknowledged that the v1 Piece CIDs are not a real CIDs which is why the codecs are labeled as Fixing mistakes is IMO precisely what multiformats is for and realizing that Filecoin Pieces might need real Content IDentifiers (CIDs) so they can be used in other contexts (e.g. IPFS p2p transfer of data by piece CID) and that a new multihash is necessary to do so is the cost of learning (and the benefit of having multiformats). |
I talked with various people about it, I now have a better understanding. I've only thought about the producing side of things. You put in some data and you get a specific Merkle tree out. Though if you think about CIDs also as a way to retrieve the data, then the height matters. You would retrieve data and would want to verify that the data is the one you expected. Without the height, you could get a higher level of the Merkle tree (random bytes and not the data you expected) and it would still verify correctly. This means that it now makes sense to me to include the height. |
From an IPLD perspective (i.e. if these are multihashes intended for CIDs) then I think the answer would come down to what we are addressing. The original "bmt" entry we had, which I think was even thought to be useful for bitcoin binary merkle addressing, turns out to be not so useful because you're addressing the entire base set of data, which is not really what you typically want. From the IPLD perspective you typically want to navigate through links to individual pieces of these things. For CommP, if we want it to address the entire base data being "hashed", then viewing the merkle process as a hash function itself makes more sense. When we start doing things like inclusion proofs then it gets a bit more iffy. Does https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md muddy these waters a bit? Does data-segment neatly fold up into this new conception of CommP? Or do we keep those concerns entirely separate from this? Also: you're going to have to work on table formatting (see CI) .. sorry but your long name is going to make that tricky, or you might have to shorten it. |
And broadly this is +1 from me, just with some general questions about data-segment, and also the need to update the table formatting. |
Can be merged in draft at least? We've been using this in web3.storage for a long time now. |
can be merged when it's mergeable; table formatting needs fixing up at least |
@@ -146,6 +146,7 @@ transport-bitswap, transport, 0x0900, draft, Bits | |||
transport-graphsync-filecoinv1, transport, 0x0910, draft, Filecoin graphsync datatransfer | |||
transport-ipfs-gateway-http, transport, 0x0920, draft, HTTP IPFS Gateway trustless datatransfer | |||
multidid, multiformat, 0x0d1d, draft, Compact encoding for Decentralized Identifers | |||
fr32-sha2-256-trunc254-padded-binary-tree, multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fr32-sha2-256-trunc254-padded-binary-tree, multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments | |
fr32-sha2-256-trunc254-padbintree,multihash, 0x1011, draft, A balanced binary tree hash used in Filecoin Piece Commitments |
Suggestion for shortened name as per https://github.com/filecoin-project/FIPs/pull/808/files#r1361196071 . The huge realignment needed is an implicit flag that the proposed name is obnoxiously long.
Accepting above suggestion fixes the alignment requested by @rvagg as a side effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that pass the validator script? I thought it checked for spaces - but also happy to have the script edited to make accommodations for particularly large columns if someone wants to come up with an algorithm as I don't really like the idea of shuffling the whole table for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some additional shortening options: sha2-256
-> sha256
, trunc254
-> trunc
This PR reserves a single multihash code. The request here comes from the Filecoin ecosystem which is already using the underlying hashes here in relation to the codes reserved in #170 and #172.
However, it turns out the approach taken in the reservation of those codes was insufficient and as a result there’s a new spec proposal for a different multihash to better represent the data being referred to.
At a high level there were three approaches for describing these trees given here. We went with option 2, but that turns out to have been a mistake so now we’re going with option 3.
Why is this a good idea when you already had a code here?
You live and you learn. It turns out that for the particular data being dealt with the approach taken originally was problematic (and is IIUC largely why the related codec entries were tagged as
filecoin
rather than IPLD) and this one makes life easier. The reason CID and multihash exist is to make it easier to evolve through mistakes, this is one example of that.Should other applications of merkle-tree hashes lean on custom multihashes rather than reusing existing ones?
As with my comment around IPLD codecs from a while ago I think the answer lies in what the application gain/loses out on by not reusing existing multihashes.
Is it reasonable to have multiple codes dealing with effectively the same data?
Situationally, yes. In this case it turns out there are better and worse ways to describe the same data and people might even disagree on what those are. The table allows us to support these options.
A related example I’ve heard mentioned in the past was if someone wanted to register codes for the internal components of the blake3 tree so that data could be referred to as either a Blake3 multihash of a 1GiB file, or a more specific Blake3 internal-node multihash for the root of the tree whose leaves are the 1GiB file. On the surface this seems like a reasonable request as well.
cc @ribasushi @willscott @rvagg @vmx
(p.s. I hope Rod and Volker forgive me for making them relive #161, #170 and #172)