CIP-0100, CIP-0108, CIP-0119 | Hash raw content directly #835
Conversation
@kderme I'm not qualified to review this but will make sure it gets introduced & initially discussed at the next CIP meeting (https://hackmd.io/@cip-editors/90).
Does hashing uncanonicalized JSON cause problems when comparing hashes and exchanging data between different systems?
No, because it's just handled like a binary file for the hashing, like we do with all the other files that we hash to calculate a unique stamp/identifier. A hash should be calculated on the raw binary version of a file IMO, so I fully support this PR. Changing the contents of files before calculating a hash can cause many kinds of problems. I have seen similar attempts before for "text" files like JSON, and you end up with the issue that the "information" itself within the file may be the same, but then you have other problems like linebreaks, ASCII text that was saved as UTF, etc. So IMO the best and commonly used method is to handle the information in a file in its raw binary format and compute the hash on that.
Take a look at the example in this PR: we have example.json and the hash of the raw file -> 04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d.
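(For illustration only - a minimal sketch of hashing the raw file bytes, assuming the `blakejs` npm package and the `example.json` path from the thread; nothing here parses or canonicalises the JSON:)

```typescript
import { readFileSync } from "fs";
import { blake2bHex } from "blakejs";

// Read the file as raw bytes -- no JSON parsing, no canonicalisation.
const rawBytes: Uint8Array = readFileSync("example.json");

// blake2b with a 32-byte digest (blake2b-256), matching the 64-hex-char hashes above.
const hash = blake2bHex(rawBytes, undefined, 32);
console.log(hash);

// Any change to the bytes (whitespace, key order, line endings) changes the digest,
// even if a JSON parser would consider the data "the same".
```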
Yes, that's expected, it's different data. That's the reason we should only hash raw files, independent of the data.
I don't agree with you. The data is the same, only the structure of the file changes. In my opinion we should check the user data, because the user data is most important, not the file. x = y; Additionally, different languages, applications and systems may format files differently, but the data remains the same no matter how a file is read.
Well, I agree to disagree with you. :-) It makes it unnecessarily complicated for tools to check data integrity. Is there a universal tool/lib available for all 3rd-party applications? There is one that just does blake2 hashing on binary data; that's an easy pick.
Yes, blake2 is great. We need to have the data in a defined order, which is what the JSON canonicalisation described so far provides. Otherwise the structure of the files matters, and as I wrote above, the files can be formatted in many ways, with the result that reordering the data affects hash validation. We should be able to get the same hash regardless of the arrangement of the data in the file. The hash should validate the correctness of the data, not the order of the data.
Then why go with JSON-LD in the first place? We have a CDDL for every certificate, etc. on Cardano. Why not go directly with CBOR binary data?
I think I didn't understand you completely. Please describe which problem a CBOR representation of JSON solves? It's the same data, just in another representation for optimization. In CBOR the data order still matters regardless of the correctness of the data.
With all the tools I built, like cardano-signer, I was coming from the CDDL/CBOR side, where you have a numeric map entry for a specific piece of data and that's it. These methods now try to convert data from readable text, in any sorting order and charset, so that it ends up again like CBOR in a canonicalised format, like we use all the time on Cardano. But I am a bit late to the party for those CIPs, so I will not interfere with any progress here. If it gets implemented within cardano-cli, including the signing/verification part, I can update the SPO scripts to reflect that change. Btw, what method is used? URDNA2015?
Yes - we are using the jsonld lib in JavaScript for canonicalisation, with the URDNA2015 algorithm as the default. I'm using it in GovTool because of the CIP-100 requirement. As for "Then why go with JSON-LD in the first place?" - I don't know why. Naturally, if you have a better tool that will check the correctness of the data, I will be happy to check it. The only things I can comment on are the changes made in this PR, which I don't like very much for the reasons described above.
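(For context - a minimal sketch of the canonicalisation step under discussion, assuming the `jsonld` and `blakejs` npm packages; the `example.jsonld` path is illustrative. This is the step the PR proposes to drop from the hashing procedure:)

```typescript
import { readFileSync } from "fs";
import * as jsonld from "jsonld";
import { blake2bHex } from "blakejs";

async function hashCanonicalized(path: string): Promise<string> {
  // The file must first be parsed as JSON(-LD)...
  const doc = JSON.parse(readFileSync(path, "utf8"));

  // ...then normalised with URDNA2015 into a canonical N-Quads string...
  const canonized: string = await jsonld.canonize(doc, {
    algorithm: "URDNA2015",
    format: "application/n-quads",
  });

  // ...and the hash is taken over that canonical form rather than the raw bytes,
  // so a JSON-LD processor sits in the validation path.
  return blake2bHex(canonized, undefined, 32);
}

hashCanonicalized("example.jsonld").then(console.log);
```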
The example you provided shows the effect of an insignificant change to a file. However, at #783 (comment) I've provided examples that are not that harmless: they show how the content of a file can be maliciously manipulated to have a different meaning while maintaining the same hash. A good hashing scheme is collision resistant, which hashing over the URDNA2015 canonical form is not.
The only requirement is that the editor of the file has control over its content, which is a very reasonable requirement. Note that no JSON parser is involved in the validation, so there is nothing that can cause ambiguity.
AFAIK the cli doesn't plan to support URDNA2015.
I'm in agreement with @kderme and @gitmachtl. We don't want to canonicalize JSON because it can cause collisions. This was a major point @dcoutts made when we stopped using canonical JSON after Byron. It's a security risk to massage data before hashing.
May I ask a question: if the goal is to have a signature proof, why can't we just hash the binary file as it is, sign this hash with the wanted key - like the DRep signing key - and put this information within the transaction alongside the action itself? It can be added via metadata, for example, like we are doing for Catalyst registrations, etc. This can all be done via CIP-8/30 message signing now. Hardware wallets also support this with the upcoming firmware. Using the COSE_Sign1 format again would also simplify it for 3rd-party tools.
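(A simplified sketch of that flow - hash the raw file, then sign the hash with an Ed25519 key - assuming the `blakejs` and `tweetnacl` packages; the `metadata.jsonld` path and the throwaway keypair are illustrative, and the full CIP-8/CIP-30 COSE_Sign1 envelope plus the transaction metadata wrapping are omitted here:)

```typescript
import { readFileSync } from "fs";
import { blake2b } from "blakejs";
import nacl from "tweetnacl";

// 1. Hash the binary file as-is (blake2b-256), no content transformation.
const fileHash: Uint8Array = blake2b(readFileSync("metadata.jsonld"), undefined, 32);

// 2. Sign the hash with the wanted key (illustrative keypair; in practice this
//    would be e.g. a DRep key, and the payload would be wrapped per CIP-8).
const keyPair = nacl.sign.keyPair();
const signature = nacl.sign.detached(fileHash, keyPair.secretKey);

// 3. Hash + signature + public key could then travel with the transaction,
//    e.g. as auxiliary metadata, and be verified by anyone:
const ok = nacl.sign.detached.verify(fileHash, signature, keyPair.publicKey);
console.log(ok); // true
```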
Hardware wallets can't just sign a message or a hash; it must be embedded within the CIP-8 structure. So relying on signing keys simply signing a hash would not work for hardware wallets, which would make them unusable for DReps and committee members in combination with governance metadata signing. EDIT: looks like CIP-108 has support for this witness type.
About the anchor data-hash issue for Conway... can we settle on the fact that the hash on chain which is linked to a file should be the hash of the file, and not some content-hashing scheme? Checking the validity of the content of a file has nothing to do with the file hash itself; checking the content of a file for validity/signatures/etc. can be done afterwards, IMO. The constitution, for example, is a text file, so otherwise we would have different meanings on chain for what a "hash" is. We have been hashing pool metadata since the beginning over its binary data and not its content. The anchor hash should be the hash of the binary file, like we have with all file hashes on the internet. I have never seen a hash published on a website that was built from some internal content of a file rather than being the hash of the downloadable binary file itself.
Yes, no problem. The important thing is whether we want to check the correctness of the data or only the file. If in fact it is enough to check the file, I agree. However, if the correctness and order of the data is important, another mechanism would be useful.
We need to commit to a decision so related projects can be clear how to proceed, so this will be top priority for final review at the CIP meeting tonight (https://hackmd.io/@cip-editors/91). Thanks @kderme + @gitmachtl @Sworzen1 for your presentations so far... please come to the meeting to contribute to the discussion, with the intention of deciding there (cc @Quantumplation @Ryun1 @Thomas-Upfield).
The CIP meeting in the last hour reviewed the rationale for making this change & agreed without dispute to go ahead with the changes exactly as proposed.
The rationale of maintaining consistency and compatibility with downstream tooling such as db-sync makes this a logical modification that I'm happy to see implemented and standardized before we move into the Conway ledger era.
This PR tries to simplify and improve CIP-100 and its follow-ups CIP-108 and CIP-119 by avoiding the canonicalisation step when hashing the contents of the metadata. It is the result of a discussion here and here.
Even though the canonicalisation algorithm was introduced to avoid ambiguities related to different JSON parsing, in fact it tries to solve a problem that it itself creates: without the need to canonicalise the content, we don't even need to parse the content as JSON during validation.
Another advantage of this change is that it reduces the dependency of validating tools on JSON-LD canonicalisation libraries. Note that this dependency is not fully eliminated with this PR, since it still exists in the body signing algorithm; a separate PR will be created for that.
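With the proposed change, validating an on-chain anchor reduces to hashing the bytes as served and comparing digests, with no JSON parsing involved. A minimal sketch, assuming the `blakejs` package and Node 18+ (global `fetch`); the URL and expected hash are purely illustrative:

```typescript
import { blake2bHex } from "blakejs";

// Illustrative values -- in practice these come from the on-chain anchor.
const anchorUrl = "https://example.com/metadata.jsonld";
const anchorDataHash =
  "04af1b48bccbf7cf9b3e3b7952dfbdde0cc851ccb87ae6643521672cc381b00d";

async function validateAnchor(): Promise<boolean> {
  // Fetch the anchor content and hash the raw bytes exactly as served,
  // without ever parsing or canonicalising the JSON.
  const response = await fetch(anchorUrl);
  const rawBytes = new Uint8Array(await response.arrayBuffer());
  const actualHash = blake2bHex(rawBytes, undefined, 32);
  return actualHash === anchorDataHash;
}

validateAnchor().then((ok) =>
  console.log(ok ? "anchor hash matches" : "anchor hash mismatch")
);
```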