CIP-0100, CIP-0108, CIP-0119 | Hash raw content directly #835

Merged
merged 1 commit into cardano-foundation:master on Jun 26, 2024

Conversation

kderme
Contributor

@kderme kderme commented Jun 3, 2024

This PR tries to simplify and improve CIP-100 and its follow-ups CIP-108 and CIP-119 by avoiding the canonicalisation step when hashing the contents of the metadata. It is the result of a discussion here and here.

Even though the canonicalisation algorithm was introduced to avoid ambiguities arising from different JSON parsers, it in fact tries to solve a problem that it itself creates: without the need to canonicalise the content, we don't even need to parse the content as JSON during validation.

Some other advantages of this change:

  • it reduces the steps needed to create and validate the hash
  • it follows the existing paradigm set by pool off-chain metadata
  • it makes CIP-100 a natural extension of CIP-1694: applications that only aim to be compatible with CIP-1694 and do a lightweight validation of the off-chain data are no longer incompatible with CIP-100
  • it avoids the issues of the JSON-LD canonicalisation algorithm mentioned here

Note that this dependency is not fully eliminated with this PR, since it still exists in the body signing algorithm. A separate PR will be created for that.
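For illustration, a minimal sketch of the proposed flow (assuming Node.js and the blakejs package, neither of which is prescribed by the CIPs):

```typescript
// Sketch only: hash the raw bytes of the metadata file, with no JSON parsing
// and no canonicalisation. The package choice (blakejs) is an assumption.
import { readFileSync } from "node:fs";
import { blake2bHex } from "blakejs";

const rawBytes = readFileSync("example.jsonld");         // the file exactly as published
const anchorHash = blake2bHex(rawBytes, undefined, 32);  // 32-byte BLAKE2b digest, hex-encoded

console.log(anchorHash);
```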

@rphair rphair added Update Adds content or significantly reworks an existing proposal Category: Metadata Proposals belonging to the 'Metadata' category. labels Jun 4, 2024
@rphair rphair changed the title Hash the raw content directly for CIP-100, CIP-108, CIP-119 CIP-0100, CIP-0108, CIP-0119 | Hash raw content directly Jun 4, 2024
@rphair
Collaborator

rphair commented Jun 4, 2024

@kderme I'm not qualified to review this but will make sure it gets introduced & initially discussed at the next CIP meeting (https://hackmd.io/@cip-editors/90).

@Sworzen1

Does hashing uncanonicalised JSON cause problems with comparing hashes and exchanging data between different systems?

@gitmachtl
Contributor

gitmachtl commented Jun 11, 2024

Does hashing uncanonicalised JSON cause problems with comparing hashes and exchanging data between different systems?

No, because it's just handled like a binary file for the hashing, like we do with all the other files that we hash to calculate a unique stamp/identifier. A hash should be calculated on the raw binary version of a file IMO, so I fully support this PR.

Changing the contents of files before calculating a hash can cause many kinds of problems. I have seen similar attempts before for "text" files like JSON, and you end up with the issue that the "information" within the file may be the same, but then you have other problems like line breaks, ASCII text that was saved as UTF-8, etc.

So IMO the best and most commonly used method is to handle the information in the file in its raw binary format, and compute the hash on that.

@Sworzen1

Sworzen1 commented Jun 11, 2024

Take a look at the example in this PR: we have example.json, and the hash of the raw file is 04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d.
If I rearrange the data in this file, for example by moving the body property "comment" (with value "This is a test vector for CIP-100") before the "references" property, the data is the same but the hash is totally different: 5e40a477b7fd2f195371eb440063e0823b53fdefc2189a85a887c2b49746f29d
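A hypothetical sketch of this effect (made-up JSON snippets, not the actual CIP-100 test vector; Node.js and blakejs assumed):

```typescript
// The two strings carry the same information, but their raw bytes differ,
// so hashing the raw content yields two different digests.
import { blake2bHex } from "blakejs";

const variantA = '{"comment":"This is a test vector for CIP-100","references":[]}';
const variantB = '{"references":[],"comment":"This is a test vector for CIP-100"}';

console.log(blake2bHex(variantA, undefined, 32)); // digest of variant A
console.log(blake2bHex(variantB, undefined, 32)); // different digest for variant B
```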

@gitmachtl
Contributor

Take a look at the example in this PR: we have example.json, and the hash of the raw file is 04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d. If I rearrange the data in this file, for example by moving the body property "comment" (with value "This is a test vector for CIP-100") before the "references" property, the data is the same but the hash is totally different: 5e40a477b7fd2f195371eb440063e0823b53fdefc2189a85a887c2b49746f29d

Yes, that's expected, it's different data. That's the reason we should only hash raw files, independent of the data.

@Sworzen1

Sworzen1 commented Jun 12, 2024

I don't agree with you. The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y;
y = x;
It's the same data, man.

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

@gitmachtl
Contributor

I don't agree with you. The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y; y = x; It's the same data, man.

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

Well, I agree to disagree with you. :-) It makes it unnecessarily complicated for tools to check data integrity. Is there a universal tool/lib available for all third-party applications? There is one that just does BLAKE2 hashing on binary data; that's an easy pick.

@Sworzen1

Sworzen1 commented Jun 12, 2024

Yes, BLAKE2 is great. We need the data in a defined order, which is what the JSON canonicalisation described so far provides. Otherwise the structure of the file matters, and as I wrote above, a file can be formatted in any number of ways, with the result that reordering the data affects hash validation. We should be able to get the same hash regardless of the arrangement of data in the file. The hash should validate the correctness of the data, not the order of the data.
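For reference, this is roughly the canonicalise-then-hash flow that CIP-100 currently describes (a sketch assuming Node.js with the jsonld and blakejs packages):

```typescript
// Canonicalise the JSON-LD document first, so key order in the source file no
// longer matters, then hash the canonical N-Quads instead of the raw bytes.
import { readFileSync } from "node:fs";
import jsonld from "jsonld";
import { blake2bHex } from "blakejs";

const doc = JSON.parse(readFileSync("example.jsonld", "utf8"));

const canonical = await jsonld.canonize(doc, {
  algorithm: "URDNA2015",
  format: "application/n-quads",
});

console.log(blake2bHex(canonical, undefined, 32));
```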

@gitmachtl
Contributor

Then why go with JSON-LD in the first place? We have a CDDL for every certificate, etc. on Cardano. Why not go directly with CBOR binary data?

@Sworzen1

Sworzen1 commented Jun 12, 2024

I think I didn't understand you completely. Please describe which problem a CBOR representation of the JSON would solve? It's the same data in another representation, optimised for size. In CBOR the order of the data still matters, regardless of the correctness of the data.

@gitmachtl
Contributor

gitmachtl commented Jun 12, 2024

With all the tools I built, like cardano-signer, I was coming from the CDDL/CBOR side, where you have a numeric map entry for a specific piece of data and that's it. These methods now try to take readable text, in any sorting order and charset, and convert it so that it again ends up like a CBOR in canonicalised format, like we use all the time on Cardano. But I am a bit late to the party for those CIPs, so I will not interfere with any progress here.

If it gets implemented within cardano-cli, including the signing/verification part, I can update the SPO scripts to reflect that change.

BTW, what method is used? URDNA2015?
If I find the time I will do some testing on the CLI with Node.js.

@Sworzen1

Sworzen1 commented Jun 12, 2024

Yes, we are using the jsonld lib in JavaScript for canonicalisation, with the URDNA2015 algorithm as the default. I'm using it in GovTool because of the CIP-100 requirement. As for "Then why go with JSON-LD in the first place?", I don't know why.

Naturally, if you have a better tool that will check the correctness of the data, I will be happy to check it.
I know what CBOR is and what parsing looks like, but without examples of your solution I don't really know what you are talking about in the previous comment. Do you mean that your library (cardano-signer) canonicalises CBOR?

The only things I can comment on are the changes made in this PR, which I don't like very much for the reasons described above.

@kderme
Contributor Author

kderme commented Jun 12, 2024

The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y; y = x; It's the same data, man.

The example you provide shows the effect of an insignificant change to a file. However, at #783 (comment) I've provided examples that are not that harmless. They show how the content of a file can be maliciously manipulated so that it has a different meaning while keeping the same hash. A good hashing scheme is collision resistant, which hashing via URDNA2015 canonicalisation is not.
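One commonly cited failure mode, sketched below with hypothetical documents (the exact behaviour depends on the jsonld library's default, non-strict handling of undeclared terms):

```typescript
// Properties not declared in the @context are typically dropped during JSON-LD
// expansion, so two documents differing only in such properties can canonicalise
// to the same N-Quads, and therefore to the same hash, despite different content.
import jsonld from "jsonld";

const context = { "@context": { comment: "https://example.org/comment" } };

const docA = { ...context, comment: "shared text", unlisted: "benign value" };
const docB = { ...context, comment: "shared text", unlisted: "malicious value" };

const options = { algorithm: "URDNA2015", format: "application/n-quads" };
const [canonA, canonB] = await Promise.all([
  jsonld.canonize(docA, options),
  jsonld.canonize(docB, options),
]);

console.log(canonA === canonB); // expected: true under default (non-safe) expansion
```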

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

The only requirement is that the editor of the file has control over its content, which is a very reasonable requirement. Note that no JSON parser is involved in the validation, so there is nothing that can cause ambiguity.

If it gets implemented within cardano-cli, including the signing/verification part, I can update the SPO scripts to reflect that change.

AFAIK the CLI doesn't plan to support URDNA2015.

@disassembler
Contributor

I'm in agreement with @kderme and @gitmachtl. We don't want to canonicalize JSON because it can cause collisions. This was a major point @dcoutts made when we stopped using canonical JSON after Byron. It's a security risk to massage data before hashing.

@gitmachtl
Contributor

May I ask a question: if the goal is to have a signature proof, why can't we just hash the binary file as it is, sign this hash with the desired key (like the DRep signing key), and put this information in the transaction alongside the action itself? It can be added via metadata, for example, like we do for Catalyst registrations, etc. This can all be done via CIP-8/CIP-30 message signing now, and hardware wallets also support this with the upcoming firmware.

Using the COSE_Sign1 format again would also simplify it for third-party tools.
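A hedged sketch of that flow, using the CIP-30 signData API exposed by browser wallets (the wallet object and address value below are illustrative; the wallet takes care of wrapping the payload in a CIP-8 COSE_Sign1 structure):

```typescript
// Sketch only: hash the raw file, have the DRep's wallet sign the hash via
// CIP-30 signData, and attach the resulting signature/key as transaction metadata.
import { readFileSync } from "node:fs";
import { blake2bHex } from "blakejs";

declare const walletApi: {
  signData(addr: string, payloadHex: string): Promise<{ signature: string; key: string }>;
};

const fileHashHex = blake2bHex(readFileSync("example.jsonld"), undefined, 32);
const signerAddressHex = "...";  // address/credential of the DRep key (illustrative)

const { signature, key } = await walletApi.signData(signerAddressHex, fileHashHex);
// `signature` is a CBOR-encoded COSE_Sign1, `key` the corresponding COSE_Key (per CIP-30).
```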

@gitmachtl
Contributor

gitmachtl commented Jun 14, 2024

Hardware wallets can't just sign an arbitrary message or hash; it must be embedded within the CIP-8 structure. So relying on signing keys being able to just sign a bare hash would not work for hardware wallets, which would make them unusable for DReps and committee members in combination with governance metadata signing.

EDIT: looks like CIP-108 has support for this witness type.

@gitmachtl
Contributor

gitmachtl commented Jun 21, 2024

@kderme @Sworzen1

About the anchor data-hash issue for Conway: can we settle on the fact that the hash on chain that is linked to a file should be the hash of the file, and not some content-hashing scheme? Checking the validity of the content of a file has nothing to do with the file hash itself; checking the content of a file for its validity/signatures/etc. can be done afterwards IMO. The constitution, for example, is a text file, so we would otherwise have different meanings of "hash" on chain. We have been hashing pool metadata since the beginning over its binary data, not its content.

The anchor hash should be the hash of the binary file, like we have with all file hashes on the internet. I have never seen a hash provided on a website that was built from some internal content of a file rather than being the hash of the downloadable binary file itself.

@Sworzen1

Yes, no problem. The important question is whether we want to check the correctness of the data or only the file. If in fact it is enough to check the file, I agree. However, if the correctness and order of the data are important, another mechanism would be useful.

@rphair
Collaborator

rphair commented Jun 25, 2024

We need to commit to a decision so that related projects can be clear about how to proceed, so this will be the top priority for final review at the CIP meeting tonight (https://hackmd.io/@cip-editors/91).

Thanks @kderme + @gitmachtl @Sworzen1 for your presentations so far... please come to the meeting to contribute to the discussion, with the intention of deciding there (cc @Quantumplation @Ryun1 @Thomas-Upfield).

cc @disassembler @Crypto2099

@rphair rphair added the State: Last Check Review favourable with disputes resolved; staged for merging. label Jun 25, 2024
Collaborator

@rphair rphair left a comment


The CIP meeting in the last hour has reviewed the rationale for making this change and agreed without dispute to go ahead with the changes exactly as proposed.

@rphair rphair requested a review from Crypto2099 June 25, 2024 16:38
Collaborator

@Crypto2099 Crypto2099 left a comment


The rationale of maintaining consistency and compatibility with downstream tooling such as db-sync makes this a logical modification that I'm happy to see us get implemented and standardized before we move into the Conway ledger era.

@Crypto2099 Crypto2099 merged commit 4640b74 into cardano-foundation:master Jun 26, 2024