LIP: 0032
Title: Replace payload hash with Merkle tree root in block header
Author: Alessandro Ricottone <alessandro.ricottone@lightcurve.io>
Discussions-To: https://research.lisk.com/t/replace-payload-hash-with-merkle-tree-root-in-block-header/214
Status: Active
Type: Standards Track
Created: 2020-02-19
Updated: 2021-12-01
Requires: 0019, 0031
This LIP proposes to remove the payloadHash
value stored in each block header and replace it with transactionRoot
, the root of a Merkle tree built from the transactions included in the payload.
This LIP is licensed under the Creative Commons Zero 1.0 Universal.
A Merkle tree is an authenticated data structure that allows for efficient proof-of-inclusion, where a Verifier can verify the presence of a certain data block in a dataset by quiring only Log(N) hashes from a Prover, where N is the length of the dataset. To check that the data block is part of the dataset, the Verifier recursively calculates the hash of the original data block with the hashes provided by the Prover, and checks that the final result is equal to the root of the Merkle tree.
In the current protocol, the payloadHash
is calculated by taking the SHA-256 hash of the block payload, given as the array of concatenated serialized transactions. Thus, to verify that a transaction has been included in a block, it is necessary to download the full block payload. With the new proposal to change to byte-based limit for block size, this amounts to downloading up to 15kb of data. By storing the root of the transaction Merkle tree in the block header as transactionRoot
instead of the payloadHash
, it will be possible, using a proof-of-inclusion, to check whether a transaction is included in a block downloading at most 7 different 32-bytes-long hashes, with a corresponding proof of size 232 bytes (see "Proof serialization" in LIP 0031 "Introduce Merkle trees and inclusion proofs").
In this LIP, we define how the transactionRoot
of a block is calculated as the root of the Merkle tree built from the transactions included in it.
The general specifications to build a Merkle tree in the Lisk protocol are given in LIP 0031. In this LIP we specify which changes are required to store the root of the transaction Merkle tree as the transactionRoot
in a block header and to verify the validity of new blocks after this change.
To forge a new block, a delegate selects a set of transactions to be included in the block payload. Currently, these transactions are sorted using the sortTransactions
function and then serialized. The payloadHash
is calculated as the SHA-256 of the concatenated serialized transactions.
Once LIP 0026 "Establish block validity by applying transactions sequentially" is implemented, transactions will be sorted by the block forger in the order in which they are applied to the blockchain, and included in the same order in the block payload. The serialized transaction IDs in the transaction application order are used to build a Merkle tree according to the specifications defined in LIP 0031 "Introduce Merkle trees and inclusion proofs"). The transactionRoot
is set to the resulting Merkle root and included in the block header instead of payloadHash
. Notice that transactionRoot
has the same type and size as payloadHash
.
To verify the validity of a block, first the Merkle root is calculated from the transactions included in the block payload and checked against the transactionRoot
stored in the block header. The same set of transactions sorted in a different way would result in a different Merkle root, and consequently an invalid block. In this sense, the transactionRoot
contains information about the order in which the transactions were applied to the blockchain. Then, the transactions are verified and applied in the same order in which they have been inserted in the payload.
To reproduce the same Merkle root consistently, the transactions have to be sorted in a consistent way. We consider two viable options:
- Sort by transaction ID: transactions are sorted according to their transaction ID in ascending order. Using this sorting method, it is easy to give a proof-of-absence for a certain transaction by giving a proof-of-inclusion for two transactions next to each other in the tree and whose IDs bound the target transaction ID.
- Sort by application order: after LIP 0026 comes into effect, transactions will be applied to the blockchain state in a specific order, chosen by the forger. Therefore, transactions will be already sorted in this way in the block payload. Using this sorting method to build the Merkle tree and consequently calculate the
transactionRoot
allows to implicitly make the application order an immutable part of the block header. This approach does not allow for an efficient proof-of-absence, however this is only a marginal downside as proofs-of-absence do not have relevant use-case scenarios in the current protocol.
As mentioned above, we choose option 2.
We assume LIP 0019 "Use full SHA-256 hash of transaction as transactionID" is already effective, and transaction IDs are 32 bytes long.
Given a Merkle tree and corresponding Merkle root merkleRoot
built from the array transactionIDsArray
according to the specifications defined in LIP 0031, the transactionRoot
is set as transactionRoot = merkleRoot
. Here transactionIDsArray
is an array of bytes
containing the serialized transaction IDs. The transactionRoot
is then stored in the block header instead of payloadHash
.
To verify the validity of the block, transactionIDsArray
is again built from the serialized transactions stored in the payload hash, keeping the same order, and used to calculate the Merkle root merkleRoot
. The transactionRoot
is valid if transactionRoot == merkleRoot
.
This proposal introduces a hard fork. Blocks forged with the new transactionRoot
are not valid in the current protocol and vice-versa.