Cardano doesn't use a deterministic mechanism for CBOR encoding. The same block / tx data can be encoded in practically infinite variations, each one resulting in a different hash.
The process of decoding CBOR into a Rust struct involves losing the information about these slight variations in encoding (e.g. indefinite / definite arrays, int size, map order, etc). So far, Pallas has been dealing with this problem by enriching the structures with extra data that allows us to replicate the same variation when re-encoding a struct.
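To make the problem concrete, here is a minimal sketch, not Pallas code: `decode_uint_array` is a hypothetical toy decoder that only understands small arrays of small unsigned ints. It decodes three different byte sequences into the same value, which is exactly the information a plain Rust struct cannot carry back into the hash.

```rust
/// Toy decoder (hypothetical, for illustration only): handles definite arrays
/// with an inline length, indefinite arrays, and uints that are either inline
/// (0x00..=0x17) or carried in a 1-byte argument (0x18 nn).
fn decode_uint_array(bytes: &[u8]) -> Option<Vec<u64>> {
    let (&head, mut rest) = bytes.split_first()?;
    let expected = match head {
        0x80..=0x97 => Some((head - 0x80) as usize), // definite array, inline length
        0x9f => None,                                // indefinite array, ends with 0xff
        _ => return None,
    };
    let mut out = Vec::new();
    loop {
        match expected {
            Some(n) if out.len() == n => break,
            None if rest.first() == Some(&0xff) => break,
            _ => {}
        }
        let (&b, tail) = rest.split_first()?;
        match b {
            0x00..=0x17 => {
                out.push(b as u64);
                rest = tail;
            }
            0x18 => {
                let (&v, tail2) = tail.split_first()?;
                out.push(v as u64);
                rest = tail2;
            }
            _ => return None,
        }
    }
    Some(out)
}

fn main() {
    let definite = [0x82, 0x01, 0x02]; // [1, 2]
    let indefinite = [0x9f, 0x01, 0x02, 0xff]; // [_ 1, 2]
    let wide_ints = [0x82, 0x18, 0x01, 0x18, 0x02]; // [1, 2] with non-minimal ints

    // All three decode to the same value...
    assert_eq!(decode_uint_array(&definite), Some(vec![1, 2]));
    assert_eq!(decode_uint_array(&indefinite), Some(vec![1, 2]));
    assert_eq!(decode_uint_array(&wide_ints), Some(vec![1, 2]));

    // ...but the bytes differ, so any hash computed over them differs too.
    assert_ne!(definite[..], indefinite[..]);
    assert_ne!(definite[..], wide_ints[..]);
}
```

Once the value is a `Vec<u64>`, nothing in it says which of the three encodings it came from.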
After dealing with several edge cases I thought I had won the battle. We actually managed to process the whole mainnet history without a mismatch. I felt a nice sense of accomplishment: I fought against the odds and survived...
... but I was wrong, CBOR had more tricks up its sleeve. A testnet Tx was found that had a mismatch in the resulting hash: txpipe/oura#307
This time, the problem was that a Tx input tuple (hash, index) was encoded as a CBOR indefinite array. Why on earth would someone encode a fixed tuple as an indefinite array? I don't know, but it was clear that this is not a war that can be won.
It is now 100% clear to me that the only way to maintain consistency in the hash generation process is to retain the original CBOR data, something that @NicolasDP has been telling me for quite a while now.
The question is how to accomplish this without an impact on memory.
In this PR I introduce a generic structure called `KeepRaw<T>` that wraps an inner CBOR-encodable structure while tracking the start / end positions that represent the segment of the original bytestream relevant to that particular inner struct.
In this way, we can start wrapping ledger primitives, allowing us to retain the original CBOR and hash accordingly.
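The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `Decoder`, the `Decode` trait and `TxInput` are hypothetical std-only stand-ins (a real Tx input carries a 32-byte hash, reduced here to a small int), and only the shape of `KeepRaw` follows the description above. The wrapper records the decoder position before and after decoding the inner value and borrows that slice of the original buffer, so no bytes are copied.

```rust
/// Hypothetical toy cursor over a byte slice, standing in for a CBOR decoder.
struct Decoder<'b> {
    buf: &'b [u8],
    pos: usize,
}

impl<'b> Decoder<'b> {
    fn new(buf: &'b [u8]) -> Self {
        Decoder { buf, pos: 0 }
    }

    fn byte(&mut self) -> Option<u8> {
        let b = *self.buf.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }
}

/// Hypothetical decoding trait for this sketch.
trait Decode<'b>: Sized {
    fn decode(d: &mut Decoder<'b>) -> Option<Self>;
}

/// Simplified Tx input: both fields are small inline uints (0..=23).
#[derive(Debug, PartialEq)]
struct TxInput {
    id: u8,
    index: u8,
}

impl<'b> Decode<'b> for TxInput {
    fn decode(d: &mut Decoder<'b>) -> Option<Self> {
        let head = d.byte()?;
        let id = d.byte()?;
        let index = d.byte()?;
        if id > 0x17 || index > 0x17 {
            return None;
        }
        match head {
            0x82 => {} // definite array of 2 items
            0x9f => {
                // indefinite array: must be closed by a break byte
                if d.byte()? != 0xff {
                    return None;
                }
            }
            _ => return None,
        }
        Some(TxInput { id, index })
    }
}

/// Wraps a decoded value together with the exact bytes it was decoded from.
struct KeepRaw<'b, T> {
    raw: &'b [u8],
    inner: T,
}

impl<'b, T: Decode<'b>> Decode<'b> for KeepRaw<'b, T> {
    fn decode(d: &mut Decoder<'b>) -> Option<Self> {
        let buf: &'b [u8] = d.buf;
        let start = d.pos;
        let inner = T::decode(d)?;
        let end = d.pos;
        Some(KeepRaw { raw: &buf[start..end], inner })
    }
}

fn main() {
    // The same input (id 1, index 0) encoded twice: definite, then indefinite.
    let stream = [0x82, 0x01, 0x00, 0x9f, 0x01, 0x00, 0xff];
    let mut d = Decoder::new(&stream);
    let a: KeepRaw<TxInput> = KeepRaw::decode(&mut d).unwrap();
    let b: KeepRaw<TxInput> = KeepRaw::decode(&mut d).unwrap();

    assert_eq!(a.inner, b.inner); // same logical value
    assert_eq!(a.raw, &stream[0..3]); // original definite encoding kept
    assert_eq!(b.raw, &stream[3..7]); // original indefinite encoding kept
}
```

Hashing would then run over `raw` instead of a re-encoded copy of `inner`. Because `raw` is a borrowed slice of the input buffer, the extra cost per wrapped value is one pointer and one length.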
I'm not ashamed to say that CBOR has won, it was an honourable fight.