Namespaced Merkle Tree (NMT) is one of the core components of the Celestia blockchain. Transactions in Celestia are associated with a namespace which signifies the application they belong to. Nodes interested in a specific application only need to download transactions of a certain namespace. The Namespaced Merkle Tree (NMT) was introduced in the LazyLedger article to organize transactions in Celestia blocks based on their namespaces. The NMT allows for efficient and verifiable queries of application-specific transactions by accessing based on the block header which contains the NMT root. This specification explains the NMT data structure and provides an overview of its current implementation in this repository. The NMT library, which is the implementaion of NMT data structure, is explained in the NMT Library document.
Namespaced Merkle Tree, at its core, is a normal Merkle tree that employs a modified hash function, namely a namespaced hash to ensure each node in the tree encompasses the range of namespaces of its descendants' messages.
Messages stored in the NMT leaves are all namespace-prefixed with the format <NsID>||<Message Data>
and arranged in ascending order based on their namespaces.
All namespace identifiers have a fixed and known size.
NMT utilizes a namespaced hash function i.e., NsH()
, which in addition to the normal digest calculation, it returns the range of namespaces covered by a node's children. The hash output is formatted as minNs||maxNs||h(.)
, where minNs
is the lowest namespace among all the node's descendants, maxNs
is the highest, and h(.)
represents the hash digest of .
(e.g., SHA256(.)
).
Leaf Nodes: Each leaf in the tree represents the namespaced hash of a namespaced message d = <NsID>||<Message Data>
.
The hash is computed as follows:
NsH(d) = d.NsID || d.NsID || h(0x00, d)
The inclusion of the 0x00
value in the hash calculation serves as a leaf prefix and is done to conform to RFC 6962.
Non-leaf Nodes: For an intermediary node n
of the NMT with children
l
= l.minNs || l.maxNs || l.hash
and
r
= r.minNs || r.maxNs || r.hash
the namespaced hash is calculated as
NsH(n) = min(l.minNs, r.minNs) || max(l.maxNs, r.maxNs) || h(0x01, l, r)
The inclusion of the 0x01
value in the hash calculation serves as a non-leaf prefix and is done to conform to RFC 6962.
In an NMT data structure, the minNs
and maxNs
values of the root node denote the minimum and maximum namespaces, respectively, of all messages within the tree.
An example of an NMT is shown in the figure below which utilizes SHA256 as the underlying hash function and namespace size of 1
byte.
The code snippets necessary to create this tree are provided in NMT Library documentation, and the data items and tree nodes are represented as hex strings.
For the sake of brevity, we have only included the first 7 hex digits of SHA256 for each namespace hash.
00 03 b1c2cc5 Tree Root
/ \
/ \
NsH() NsH()
/ \
/ \
00 00 ead8d25 01 03 52c7c03 Non-Leaf Nodes
/ \ / \
NsH() NsH() NsH() NsH()
/ \ / \
00 00 5fa0c9c 00 00 52385a0 01 01 71ca46a 03 03 b4a2792 Leaf Hashes
| | | |
NsH() NsH() NsH() NsH()
| | | |
00 6c6561665f30 00 6c6561665f31 01 6c6561665f32 03 6c6561665f33 Leaves with namespaces
0 1 2 3 Leaf Indices
Figure 1.
NMT supports standard Merkle tree functionalities, including inclusion and range proofs, and offers namespace querying and proof generation. The following enumerated cases explain potential outcomes when querying an NMT for a namespace.
When the queried namespace NS
has corresponding items in the tree with root T
, the query is resolved by a
namespace inclusion proof which consists of:
- The starting index
start
and the ending indexend
of the leaves that matchNS
. - Nodes of the tree that are necessary for the regular Merkle range proof of
[start, end)
toT
. In specific, the nodes include 1) the namespaced hash of the left siblings for the Merkle inclusion proof of thestart
leaf and 2) the namespaced hash of the right siblings of the Merkle inclusion proof of theend-1
leaf. Nodes are sorted according to the in-order traversal of the tree.
For example, the NMT proof of NS = 0
in Figure 1 would be [start = 0, end = 2)
and the Merkle inclusion proof embodies one single tree node i.e., 01 03 52c7c03
.
If the namespace being queried falls within the range of namespaces in the tree root, but there is no corresponding message, an absence proof will be generated.
An absence proof asserts that no message in the tree matches the queried namespace NS
.
The absence proof consists of:
- The index of a leaf of the tree that
- its namespace is the smallest namespace larger than
NS
and - the namespace of the leaf to the left of it is smaller than
NS
.
- its namespace is the smallest namespace larger than
- A regular Merkle inclusion proof for the said leaf to the tree root
T
. - The hash of the said leaf denoted by
LeafHash
.
Note that the proof only requires the hash of the leaf, not its underlying message.
This is because the aim of the proof is to demonstrate the absence of NS
.
In Figure 1, if we query for NS = 2
, we will receive an absence proof since there is no matching item for it.
The index of the leaf included in the proof will be 3
, which corresponds to the node 03 03 b4a2792
.
The Merkle inclusion proof for this leaf consists of the following nodes, in the order they appear in an in-order traversal of the tree: 00 00 ead8d25
and 01 01 71ca46a
.
If the requested namespace falls outside the namespace range represented by the tree root T
, the query will be resolved with an empty namespace proof.
In Figure 1, a query for NS=6
would be responded by an empty proof as it falls outside the root's namsespace range.
An NMT inclusion proof is deemed valid if it meets the following:
- The namespace of the leaves in the returned range
[start, end)
match the queriedNS
. - Inclusion: The supplied Merkle proof for the range
[start, end)
is valid for the given tree rootT
. - Completeness: There are no other leaves matching
NS
that do not belong to the returned range[start, end)
.
Proof inclusion can be verified via a regular Merkle range proof verification.
However, completeness of the proof requires additional checks.
Specifically, 1) the maximum namespace of the nodes in the proof that are on the left side of the branch connecting the start
leaf to the root must be less than the provided namespace (NS
), and 2) the minimum namespace of the nodes in the proof that are on the right side of the branch connecting the
end-1
leaf to the root must be greater than the provided namespace (NS
).
As an example, the namespace proof for NS = 0
for the NMT of Figure 1 (which consists of one single node i.e., 01 03 52c7c03
) is complete. This is because the node 01 03 52c7c03
is located on the right side of the branch connecting the leaf at index end = 1
to the root and its minNs
value is 01
, which is greater than NS = 0
.
An NMT absence proof is deemed valid if it meets the following:
- The minimum namespace of the leaf in the proof is greater than the queried
NS
. - The verification of Merkle inclusion proof of the returned leaf is valid.
- It satisfies the proof completeness as explained in the Verification of NMT Inclusion Proof. Note that hash of the leaf does not have to be verified against the underlying message.
If the queried NS
falls outside the namespace range of the tree root, or the namespace tree is empty, then an empty NMT proof is valid by definition.
NMTs also support regular index-based Merkle Proof that allows for both Merkle inclusion proof for a single leaf index and Merkle range proof for a range of leaf indices [start, end)
,
where end
is greater than start
.
The Merkle inclusion proof for a single leaf is actually a special case of the Merkle range proof where the range represents a single leaf index, or in other words, when end
is equal to start+1
.
As such, we only focus on the Merkle range proof in this section.
The start
and end
are zero-based indices of the leaves in the tree, where start
is inclusive and end
is exclusive.
The start
ranges from 0
to n-1
, where n
is the number of leaves in the tree, while end ranges from 1
to n
.
A range query for a range of [start, end)
is answered by a proof
consisting of two pieces of data: nodes
and start
/end
.
The proof generation and verification logic explained here is based on the assumption that all the leaves within the queried range have the same namespace.
The assumption is to conform to the current implementation of the NMT library, however, it is not a limitation imposed by the NMT data structure.
-
The
nodes
data is a set of tree nodes that constitutes the Merkle range proof of[start, end)
to the tree with root T. This includes the left siblings of the nodes along the branch connecting thestart
leaf to the NMT root and the right siblings of the nodes along the branch connecting theend-1
leaf to the root. The nodes are sorted based on the in-order traversal of the tree. -
The
start
andend
data represent the starting and ending index of the retrieved leaves, respectively, whereend
is non-inclusive. This is the same as the queried range. -
In the event that the queried range is outside the range of the NMT leaves indices, an empty
proof
is returned. This emptyproof
consists of an empty set ofnodes
and an empty range, wherestart
is0
andend
is also0
.
Let leaves
refer to the namespaced messages of the leaves in the queried range [start, end)
, while T
represents the root of the tree against which the proof
is being verified.
The proof
can be verified by taking the following steps:
- Verify that all the leaves are namespaced messages with the same namespace.
- Compute the tree root
T'
using the leaves and theproof.nodes
. If the computed rootT'
is equal to the expected rootT
of the tree, then theproof
is valid. To compute the tree rootT'
, the namespaced hash function should be utilized.
The short namespace absence proof is a more efficient variant of the regular namespace absence proof.
It differs from the original namespace absence proof definition where instead of providing the inclusion proof of the LeafHash
to the root (as proof.nodes
),
a short absence proof supplies the Merkle inclusion proof of one of the predecessors of the LeafHash
.
This predecessor is located along the branch connecting the LeafHash
to the root.
Importantly, the namespace range of this predecessor does not overlap with the queried namespace i.e., the absent namespace.
As this predecessor is located closer to the root compared to the LeafHash
, it will have shorter Merkle inclusion proof i.e., lower number of elements in the proof.nodes
.
At present, the NMT library does not support the generation of short namespace absence proofs. However, it is capable of correctly verifying such proofs.
More formally, the short namespace absence proof consists of the following components:
SubtreeHash
: To compute theSubtreeHash
, the following steps should be followed:- Find the index of a leaf in the tree that meets two conditions:
- Its namespace is the smallest namespace greater than
NS
. - The namespace of the leaf to its left is smaller than
NS
.
- Its namespace is the smallest namespace greater than
- Traverse up the branch connecting that leaf to the root and locate one of the parents/grandparents of that leaf whose namespace range does not overlap with the queried namespace.
The
SubtreeHash
is the hash of that node.
- Find the index of a leaf in the tree that meets two conditions:
start
andend
range: These represent the indices of theSubtreeHash
within its respective level. Nodes at each level are indexed from left to right starting at index0
.nodes
: This set comprises the index-based Merkle inclusion proof of theSubtreeHash
to the tree rootT
.
Below, we illustrate the short namespace absence proof for namespace NS = 02
in an 8-leaf tree:
The namespace 03
is the smallest namespace larger than 02
.
By traversing the branch from the leaf with namespace 03
to the root, we find a node with hash 03 04 52c7c03
whose namespace range doesn't overlap with 02
.
This node is the highest such node along the branch.
The SubtreeHash
is the hash of that node, which is 03 04 52c7c03
.
The start
and end
indices indicate its position in the respective level.
In this case, start = 1
and end = 2
.
Note that node indices start at 0
from left to right at each level.
The nodes
form the index-based Merkle inclusion proof of the SubtreeHash
to the tree root T
.
The nodes
set includes 00 00 ead8d25
, the left sibling of 03 04 52c7c03
.
In summary, the short namespace absence proof for NS = 02
in this tree consists of SubtreeHash = 03 04 52c7c03
, start = 1
, end = 2
, and the nodes
set containing 00 00 ead8d25
.
00 04 b1c2cc5 Tree Root
/ \
/ \
NsH() NsH()
/ \
/ \
/ -------------
00 00 ead8d25 |03 04 52c7c03| Non-Leaf Nodes
/ \ -------------
/ \ / \
NsH() NsH() NsH() NsH()
/ \ / \
00 00 5fa0c9c 00 00 52385a0 03 03 71ca46a 04 04 b4a2792 Leaf Hashes
| | | |
NsH() NsH() NsH() NsH()
| | | |
00 6c6561665f30 00 6c6561665f31 03 6c6561665f32 04 6c6561665f33 Leaves with namespaces
0 1 2 3 Leaf Indices
A short namespace absence proof is deemed valid if it meets the following:
- The minimum namespace of the
SubtreeHash
in theproof
is greater than the queriedNS
. It is important to note that there can be multiple candidates, each belonging to a different level of the tree, that satisfy this requirement for theSubtreeHash
. All such candidates are deemed valid from the verification perspective. - The verification of Merkle inclusion proof of the returned
SubtreeHash
is valid. - The
proof
satisfies the proof completeness as explained in the Verification of NMT Inclusion Proof. That is, all nodes on the left side of the branch connectingSubtreeHash
to the rootT
have a maximum namespace less than the queriedNS
. Conversely, all nodes on the right side are expected to have a minimum namespace larger thanNS
.
- Al-Bassam, Mustafa. "Lazyledger: A distributed data availability ledger with client-side smart contracts." arXiv preprint arXiv:1905.09274 (2019).
- The outdated specification of NMT https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/data_structures.md#namespace-merkle-tree
- NMT library https://github.com/celestiaorg/nmt