Implement an alternative to TSMT #243

bobbinth · 2024-01-05T23:21:12Z

As mentioned in 0xPolygonMiden/miden-vm#1136, our TieredSmt turned out to be quite complex, and in the end, the performance benefits we get from it don't justify this complexity (in my opinion, at least). In my mind, there are two alternatives worth considering:

Something along the lines of a fully compacted SMT (e.g., Jellyfish Merkle Tree). This will likely to be as complex and performance within the VM would probably be worse (at lest in terms of cycle count), but at least we get full compaction and near-optimal efficiency outside of the VM.
Something much simpler and more efficient in terms of cycle count within the VM, though requiring more hashing outside of the VM. This approach could be very close to our SimpleSmt.

I think for now it makes sense to go for the simpler approach, even if it is somewhat less efficient outside of the VM. So, I'll focus on the second approach.

The end goal is to provide a data structure which emulates a key-value map with 4-element keys and 4-element values (we could consider values of arbitrary size, but for the first implementation, I'd probably stick with 4-element values). So, the interface could probably remain pretty close to what we have for TieredSmt.

The internal implementation can be just a simple SMT of depth 64 which behaves as follows:

When a key-value pair is inserted into the tree, a new leaf is created at depth 64. The index of the leaf is defined by the most significant element of the key. The pre-image of this leaf is hash(key || value).
If another key-value pair is inserted such that the key has the same most significant element as the previous key, the leaf pre-image becomes a hash of the sorted list of key-value pairs contained in that leaf (the list is sorted by key). For example, it could look like hash(key_0 || value_0 || key_1 || value_1).

A few notes:

I'm not sure if we need to use domain separation between the case of single-key leaf vs. multi-key leaf. Seems like we do not, but some more thinking is needed here.
We should optimize the implementation for the single-key case as it would be by far the most dominant.
For now, we could keep the implementation as simple as possible (probably can reuse most of SimpleSmt), but in the future we can optimize this quite a bit to reduce storage requirements. One example of potentially relevant optimizations is described here.

The text was updated successfully, but these errors were encountered:

plafer · 2024-01-09T18:59:25Z

Since the implementation is going to be close to SimpleSmt, I looked for ways to abstract the commonalities into a trait. I call NewSmt the new smt that we're designing here.

Note that this design wouldn't directly support future optimizations such as compressing Merkle proofs, but is flexible enough to allow for it in the future.

Theoretical perspective

The current SimpleSmt data structure can be generalized if we abstract away a few things. We see the SimpleSmt as implementing a function Key -> Value. We also use LeafIndex to mean "the index of a leaf as a u64" (an assumption widely used in our codebase due to NodeIndex).

Data types

Key: the domain of the function
- SimpleSmt: $\{ 0, ..., 2^{depth}-1 \}$
- NewSmt: Word
Value: The codomain of the function
- SimpleSmt and NewSmt: Word
Leaf: the type of a leaf
- SimpleSmt: Word
- NewSmt: enum { (LeafIndex, Word), Vec<(LeafIndex, Word)> }

Functions

In words, the abstract implementation is a map LeafIndex -> Leaf, where each concrete implementation defines how to derive a LeafIndex from a Key.

KeyToLeafIndex: Key -> LeafIndex: the function that converts a key into a leaf index infallibly
- SimpleSmt: this is the identity function $\{ 0, ..., 2^{depth}-1 \} \rightarrow LeafIndex$
- NewSmt: this function Word -> LeafIndex is implemented as returning key[0] (i.e. the most significant u64 of the key)

In words, the SimpleSmt (as implemented today) restricts its domain to be $\{ 0, ..., 2^{depth}-1 \}$, and maps keys directly to leaf indices (e.g. key 5 maps to leaf index 5). In contrast, the NewSmt takes the full domain Word, and maps to the smaller LeafIndex by looking at the first u64 in the Word, ultimately mapping many keys to the same LeafIndex. This then explains the chosen values for the Leaf type. SimpleSmt can store a Word directly, because the KeyToLeafIndex function is injective. In contrast, NewSmt's KeyToLeafIndex is not injective, so the leaf must store potentially multiple values.

Next, we'll then need to define how a Leaf hashes, how to insert a Value into a Leaf, etc. But these are relatively straightforward, so we omit them from this section.

Implementation perspective

In Rust, we would model things roughly as follows. Note that I used dummy names. Also, when the implementation is omitted, assume that today's SimpleSmt's implementation is to be used (with small modifications to fit the new abstractions).

type LeafIndex = u64;

trait SimpleSmtTrait {
    type Key: Into<LeafIndex>;
    type Value;
    type Leaf;

    // ABSTRACT METHODS
    // ---------------------------------------------------------------------------------------------

    /// The depth of the tree
    fn depth(&self) -> u8;

    /// The root of the tree
    fn root(&self) -> RpoDigest;

    /// Retrieves a leaf a the given index
    /// NOTE: Requires a change in current `SimpleSmt` (will need to use `unwrap_or_default()`)
    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf>;

    /// Retrieves an inner node at the given index
    fn get_inner_node(&self, index: NodeIndex) -> BranchNode;

    /// Inserts a leaf node, and returns the value at the key if already exists
    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value>;

    /// Returns the hash of a leaf
    fn hash_leaf(v: Self::Leaf) -> RpoDigest;

    // PROVIDED METHODS
    // ---------------------------------------------------------------------------------------------

    fn get_node(
        &self,
        index: NodeIndex,
    ) -> Result<RpoDigest, MerkleError> {
        todo!()
    }

    fn insert(
        &mut self,
        key: Self::Key,
        value: Self::Value,
    ) -> Result<Option<Self::Value>, MerkleError> {
        todo!()
    }
}

// SIMPLE SMT
// =================================================================================================

// Assumption: we always know the depth of our trees at compile time
struct SimpleSmt<const DEPTH: u8> {
    root: RpoDigest,
    leaves: BTreeMap<u64, Word>,
    branches: BTreeMap<NodeIndex, BranchNode>,
}

impl<const DEPTH: u8> SimpleSmtTrait for SimpleSmt<DEPTH> {
    type Key = SimpleSmtKey<DEPTH>;
    type Value = Word;
    type Leaf = Word;

    fn depth(&self) -> u8 {
        DEPTH
    }

    fn root(&self) -> RpoDigest {
        self.root
    }

    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf> {
        self.leaves.get(&leaf_index)
    }

    fn get_inner_node(&self, index: NodeIndex) -> BranchNode {
        // Same impl as in current `SimpleSmt`
        todo!()
    }

    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value> {
        todo!()
    }

    fn hash_value(v: Self::Value) -> RpoDigest {
        // hash(v)
        todo!()
    }

}

/// Encodes a value with constraints 0 < value < DEPTH,
/// where DEPTH is the depth of the `SimpleSmt`
struct SimpleSmtKey<const DEPTH: u8> {
    value: u64
}

impl<const DEPTH: u8> SimpleSmtKey<DEPTH> {
    pub fn new(value: u64) -> Result<Self, MerkleError> {
        if value < 2_u64.pow(DEPTH.into()) {
            Ok(Self { value })
        } else {
            Err(MerkleError)
        }
    }
}

// We can convert infaillibly to u64 because we checked that value is consistent with DEPTH when creating the `SimpleSmtKey`
impl<const DEPTH: u8> From<SimpleSmtKey<DEPTH>> for u64 {
    fn from(value: SimpleSmtKey<DEPTH>) -> Self {
        value.value
    }
}

// NEW SMT
// =================================================================================================

struct NewSmtKey(Word);

impl From<NewSmtKey> for LeafIndex {
    fn from(key: NewSmtKey) -> Self {
        // return key[0] (i.e. most significant felt)
        todo!()
    }
}

enum NewSmtLeaf {
    Single((u64, Word)),
    Multiple(Vec<(u64, Word)>)
}

// Assumption: we always know the depth of our trees at compile time
struct NewSmt<const DEPTH: u8> {
    root: RpoDigest,
    leaves: BTreeMap<u64, NewSmtLeaf>,
    branches: BTreeMap<NodeIndex, BranchNode>,
}

impl<const DEPTH: u8> SimpleSmtTrait for NewSmt<DEPTH> {
    type Key = NewSmtKey;
    type Value = Word;
    type Leaf = NewSmtLeaf;

    fn depth(&self) -> u8 {
        DEPTH
    }

    fn root(&self) -> RpoDigest {
        self.root
    }

    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf> {
        self.leaves.get(&leaf_index)
    }

    fn get_inner_node(&self, index: NodeIndex) -> BranchNode {
        // Same impl as in current `SimpleSmt`
        todo!()
    }

    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value> {
        // If value already exists, convert to `NewSmtLeaf::Multiple`, etc.
        todo!()
    }

    fn hash_value(v: Self::Value) -> RpoDigest {
        // hash(key0 || value0 || key1 || value1 || ...)
        todo!()
    }

}

plafer · 2024-01-22T20:18:34Z

Closed by #254

bobbinth assigned plafer Jan 5, 2024

bobbinth added this to the v0.8 milestone Jan 5, 2024

bobbinth mentioned this issue Jan 5, 2024

Tracking PR for v0.8.0 release #194

Merged

12 tasks

plafer mentioned this issue Jan 10, 2024

Introduce SparseMerkleTree trait #245

Merged

7 tasks

bobbinth mentioned this issue Jan 11, 2024

Alternatives to TSMT 0xPolygonMiden/miden-vm#1136

Closed

Dominik1999 mentioned this issue Jan 15, 2024

Tiered SMT implementation for 64-bit keys #186

Closed

plafer closed this as completed Jan 22, 2024

plafer mentioned this issue Jan 22, 2024

Remove TieredSmt #261

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement an alternative to TSMT #243

Implement an alternative to TSMT #243

bobbinth commented Jan 5, 2024

plafer commented Jan 9, 2024 •

edited

Loading

plafer commented Jan 22, 2024

Implement an alternative to TSMT #243

Implement an alternative to TSMT #243

Comments

bobbinth commented Jan 5, 2024

plafer commented Jan 9, 2024 • edited Loading

Theoretical perspective

Data types

Functions

Implementation perspective

plafer commented Jan 22, 2024

plafer commented Jan 9, 2024 •

edited

Loading