CIP-0100, CIP-0108, CIP-0119 | Hash raw content directly #835

Merged
merged 1 commit into cardano-foundation:master on Jun 26, 2024

Conversation

kderme
Contributor

@kderme kderme commented Jun 3, 2024

This PR tries to simplify and improve CIP-100 and its follow-ups CIP-108 and CIP-119 by avoiding the canonicalisation step when hashing the contents of the metadata. It is the result of a discussion here and here.

Even though the canonicalisation algorithm was introduced to avoid ambiguities arising from different JSON parsers, it in fact tries to solve a problem that it itself creates: without the need to canonicalise the content, we don't even need to parse the content as JSON during validation.

Some other advantages of this change:

  • it reduces the steps needed to create and validate the hash
  • it follows the existing paradigm set by pool off-chain metadata
  • it makes CIP-100 a natural extension of CIP-1694: applications that only aim to be compatible with CIP-1694 and do a lightweight validation of the off-chain data are no longer incompatible with CIP-100
  • it avoids the issues of the JSON-LD canonicalisation algorithm mentioned here

Note that this dependency is not fully eliminated with this PR, since it still exists in the body signing algorithm. A separate PR will be created for that.
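For illustration, a minimal sketch of the proposed flow (assuming Node.js and the blakejs package, neither of which is prescribed by the CIPs):

```typescript
// Sketch only: hash the raw bytes of the metadata file, with no JSON parsing
// and no canonicalisation. The package choice (blakejs) is an assumption.
import { readFileSync } from "node:fs";
import { blake2bHex } from "blakejs";

const rawBytes = readFileSync("example.jsonld");         // the file exactly as published
const anchorHash = blake2bHex(rawBytes, undefined, 32);  // 32-byte BLAKE2b digest, hex-encoded

console.log(anchorHash);
```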

@rphair rphair added Update Adds content or significantly reworks an existing proposal Category: Metadata Proposals belonging to the 'Metadata' category. labels Jun 4, 2024
@rphair rphair changed the title Hash the raw content directly for CIP-100, CIP-108, CIP-119 CIP-0100, CIP-0108, CIP-0119 | Hash raw content directly Jun 4, 2024
@rphair
Collaborator

rphair commented Jun 4, 2024

@kderme I'm not qualified to review this but will make sure it gets introduced & initially discussed at the next CIP meeting (https://hackmd.io/@cip-editors/90).

@Sworzen1

Does hashing uncanonicalised JSON cause problems with comparing hashes and exchanging data between different systems?

@gitmachtl
Contributor

gitmachtl commented Jun 11, 2024

Does hashing uncanonicalised JSON cause problems with comparing hashes and exchanging data between different systems?

No, because it's just handled like a binary file for the hashing, like we do with all the other files that we hash to calculate a unique stamp/identifier. A hash should be calculated on the raw binary version of a file IMO, so I fully support this PR.

Changing the contents of files before calculating a hash can cause many kinds of problems. I have seen similar attempts before for "text" files like JSON, and you end up with the issue that the "information" within the file may be the same, but then you have other problems like line breaks, ASCII text that was saved as UTF-8, etc.

So IMO the best and most commonly used method is to handle the information in the file in its raw binary format, and compute the hash on that.

@Sworzen1

Sworzen1 commented Jun 11, 2024

Take a look at the example in this PR: we have example.json, and the hash of the raw file is 04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d.
If I rearrange the data in this file, for example by moving the body property "comment" (with value "This is a test vector for CIP-100") before the "references" property, the data is the same but the hash is totally different: 5e40a477b7fd2f195371eb440063e0823b53fdefc2189a85a887c2b49746f29d
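A hypothetical sketch of this effect (made-up JSON snippets, not the actual CIP-100 test vector; Node.js and blakejs assumed):

```typescript
// The two strings carry the same information, but their raw bytes differ,
// so hashing the raw content yields two different digests.
import { blake2bHex } from "blakejs";

const variantA = '{"comment":"This is a test vector for CIP-100","references":[]}';
const variantB = '{"references":[],"comment":"This is a test vector for CIP-100"}';

console.log(blake2bHex(variantA, undefined, 32)); // digest of variant A
console.log(blake2bHex(variantB, undefined, 32)); // different digest for variant B
```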

@gitmachtl
Contributor

Take a look at the example in this PR: we have example.json, and the hash of the raw file is 04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d. If I rearrange the data in this file, for example by moving the body property "comment" (with value "This is a test vector for CIP-100") before the "references" property, the data is the same but the hash is totally different: 5e40a477b7fd2f195371eb440063e0823b53fdefc2189a85a887c2b49746f29d

Yes, that's expected, it's different data. That's the reason we should only hash raw files, independent of the data.

@Sworzen1

Sworzen1 commented Jun 12, 2024

I don't agree with you. The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y;
y = x;
It's the same data, man.

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

@gitmachtl
Contributor

I don't agree with you. The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y; y = x; It's the same data, man.

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

Well, I agree to disagree with you. :-) It makes it unnecessarily complicated for tools to check data integrity. Is there a universal tool/lib available for all third-party applications? There is one that just does BLAKE2 hashing on binary data; that's an easy pick.

@Sworzen1

Sworzen1 commented Jun 12, 2024

Yes, BLAKE2 is great. We need the data in a defined order, which is what the JSON canonicalisation described so far provides. Otherwise the structure of the file matters, and as I wrote above, a file can be formatted in any number of ways, with the result that reordering the data affects hash validation. We should be able to get the same hash regardless of the arrangement of data in the file. The hash should validate the correctness of the data, not the order of the data.
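For reference, this is roughly the canonicalise-then-hash flow that CIP-100 currently describes (a sketch assuming Node.js with the jsonld and blakejs packages):

```typescript
// Canonicalise the JSON-LD document first, so key order in the source file no
// longer matters, then hash the canonical N-Quads instead of the raw bytes.
import { readFileSync } from "node:fs";
import jsonld from "jsonld";
import { blake2bHex } from "blakejs";

const doc = JSON.parse(readFileSync("example.jsonld", "utf8"));

const canonical = await jsonld.canonize(doc, {
  algorithm: "URDNA2015",
  format: "application/n-quads",
});

console.log(blake2bHex(canonical, undefined, 32));
```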

@gitmachtl
Contributor

Then why go with JSON-LD in the first place? We have a CDDL for every certificate, etc. on Cardano. Why not go directly with CBOR binary data?

@Sworzen1

Sworzen1 commented Jun 12, 2024

I think I didn't understand you completely. Please describe which problem a CBOR representation of the JSON would solve? It's the same data in another representation, optimised for size. In CBOR the order of the data still matters, regardless of the correctness of the data.

@gitmachtl
Contributor

gitmachtl commented Jun 12, 2024

With all the tools I built, like cardano-signer, I was coming from the CDDL/CBOR side, where you have a numeric map entry for a specific piece of data and that's it. These methods now try to take readable text, in any sorting order and charset, and convert it so that it again ends up like a CBOR in canonicalised format, like we use all the time on Cardano. But I am a bit late to the party for those CIPs, so I will not interfere with any progress here.

If it gets implemented within cardano-cli, including the signing/verification part, I can update the SPO scripts to reflect that change.

BTW, what method is used? URDNA2015?
If I find the time I will do some testing on the CLI with Node.js.

@Sworzen1

Sworzen1 commented Jun 12, 2024

Yes, we are using the jsonld lib in JavaScript for canonicalisation, with the URDNA2015 algorithm as the default. I'm using it in GovTool because of the CIP-100 requirement. As for "Then why go with JSON-LD in the first place?", I don't know why.

Naturally, if you have a better tool that will check the correctness of the data, I will be happy to check it.
I know what CBOR is and what parsing looks like, but without examples of your solution I don't really know what you are talking about in the previous comment. Do you mean that your library (cardano-signer) canonicalises CBOR?

The only things I can comment on are the changes made in this PR, which I don't like very much for the reasons described above.

@kderme
Contributor Author

kderme commented Jun 12, 2024

The data is the same; only the structure of the file changes. In my opinion, we should check the user data, because the user data is what matters most, not the file.

x = y; y = x; It's the same data, man.

The example you provide shows the effect of an insignificant change to a file. However, at #783 (comment) I've provided examples that are not that harmless. They show how the content of a file can be maliciously manipulated so that it has a different meaning while keeping the same hash. A good hashing scheme is collision resistant, which hashing via URDNA2015 canonicalisation is not.
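One commonly cited failure mode, sketched below with hypothetical documents (the exact behaviour depends on the jsonld library's default, non-strict handling of undeclared terms):

```typescript
// Properties not declared in the @context are typically dropped during JSON-LD
// expansion, so two documents differing only in such properties can canonicalise
// to the same N-Quads, and therefore to the same hash, despite different content.
import jsonld from "jsonld";

const context = { "@context": { comment: "https://example.org/comment" } };

const docA = { ...context, comment: "shared text", unlisted: "benign value" };
const docB = { ...context, comment: "shared text", unlisted: "malicious value" };

const options = { algorithm: "URDNA2015", format: "application/n-quads" };
const [canonA, canonB] = await Promise.all([
  jsonld.canonize(docA, options),
  jsonld.canonize(docB, options),
]);

console.log(canonA === canonB); // expected: true under default (non-safe) expansion
```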

Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how the file is read.

The only requirement is that the editor of the file has control over its content, which is a very reasonable requirement. Note that no JSON parser is involved in the validation, so there is nothing that can cause ambiguity.

If it gets implemented within cardano-cli, including the signing/verification part, I can update the SPO scripts to reflect that change.

AFAIK the CLI doesn't plan to support URDNA2015.

@disassembler
Contributor

I'm in agreement with @kderme and @gitmachtl. We don't want to canonicalize JSON because it can cause collisions. This was a major point @dcoutts made when we stopped using canonical JSON after Byron. It's a security risk to massage data before hashing.

@gitmachtl
Contributor

May I ask a question: if the goal is to have a signature proof, why can't we just hash the binary file as it is, sign this hash with the desired key (like the DRep signing key), and put this information in the transaction alongside the action itself? It can be added via metadata, for example, like we do for Catalyst registrations, etc. This can all be done via CIP-8/CIP-30 message signing now, and hardware wallets also support this with the upcoming firmware.

Using the COSE_Sign1 format again would also simplify it for third-party tools.
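A hedged sketch of that flow, using the CIP-30 signData API exposed by browser wallets (the wallet object and address value below are illustrative; the wallet takes care of wrapping the payload in a CIP-8 COSE_Sign1 structure):

```typescript
// Sketch only: hash the raw file, have the DRep's wallet sign the hash via
// CIP-30 signData, and attach the resulting signature/key as transaction metadata.
import { readFileSync } from "node:fs";
import { blake2bHex } from "blakejs";

declare const walletApi: {
  signData(addr: string, payloadHex: string): Promise<{ signature: string; key: string }>;
};

const fileHashHex = blake2bHex(readFileSync("example.jsonld"), undefined, 32);
const signerAddressHex = "...";  // address/credential of the DRep key (illustrative)

const { signature, key } = await walletApi.signData(signerAddressHex, fileHashHex);
// `signature` is a CBOR-encoded COSE_Sign1, `key` the corresponding COSE_Key (per CIP-30).
```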

@gitmachtl
Contributor

gitmachtl commented Jun 14, 2024

Hardware wallets can't just sign an arbitrary message or hash; it must be embedded within the CIP-8 structure. So relying on signing keys being able to just sign a bare hash would not work for hardware wallets, which would make them unusable for DReps and committee members in combination with governance metadata signing.

EDIT: looks like CIP-108 has support for this witness type.

@gitmachtl
Contributor

gitmachtl commented Jun 21, 2024

@kderme @Sworzen1

About the anchor data-hash issue for Conway: can we settle on the fact that the hash on chain that is linked to a file should be the hash of the file, and not some content-hashing scheme? Checking the validity of the content of a file has nothing to do with the file hash itself; checking the content of a file for its validity/signatures/etc. can be done afterwards IMO. The constitution, for example, is a text file, so we would otherwise have different meanings of "hash" on chain. We have been hashing pool metadata since the beginning over its binary data, not its content.

The anchor hash should be the hash of the binary file, like we have with all file hashes on the internet. I have never seen a hash provided on a website that was built from some internal content of a file rather than being the hash of the downloadable binary file itself.

@Sworzen1

Yes, no problem. The important question is whether we want to check the correctness of the data or only the file. If in fact it is enough to check the file, I agree. However, if the correctness and order of the data are important, another mechanism would be useful.

@rphair
Collaborator

rphair commented Jun 25, 2024

We need to commit to a decision so that related projects can be clear about how to proceed, so this will be the top priority for final review at the CIP meeting tonight (https://hackmd.io/@cip-editors/91).

Thanks @kderme + @gitmachtl @Sworzen1 for your presentations so far... please come to the meeting to contribute to the discussion, with the intention of deciding there (cc @Quantumplation @Ryun1 @Thomas-Upfield).

cc @disassembler @Crypto2099

@rphair rphair added the State: Last Check Review favourable with disputes resolved; staged for merging. label Jun 25, 2024
Collaborator

@rphair rphair left a comment


The CIP meeting in the last hour has reviewed the rationale for making this change and agreed without dispute to go ahead with the changes exactly as proposed.

@rphair rphair requested a review from Crypto2099 June 25, 2024 16:38
Collaborator

@Crypto2099 Crypto2099 left a comment


The rationale of maintaining consistency and compatibility with downstream tooling such as db-sync makes this a logical modification that I'm happy to see us get implemented and standardized before we move into the Conway ledger era.

@Crypto2099 Crypto2099 merged commit 4640b74 into cardano-foundation:master Jun 26, 2024