Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an alternative to TSMT #243

Closed
Tracked by #194
bobbinth opened this issue Jan 5, 2024 · 2 comments
Closed
Tracked by #194

Implement an alternative to TSMT #243

bobbinth opened this issue Jan 5, 2024 · 2 comments
Assignees
Milestone

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Jan 5, 2024

As mentioned in 0xPolygonMiden/miden-vm#1136, our TieredSmt turned out to be quite complex, and in the end, the performance benefits we get from it don't justify this complexity (in my opinion, at least). In my mind, there are two alternatives worth considering:

  1. Something along the lines of a fully compacted SMT (e.g., Jellyfish Merkle Tree). This will likely to be as complex and performance within the VM would probably be worse (at lest in terms of cycle count), but at least we get full compaction and near-optimal efficiency outside of the VM.
  2. Something much simpler and more efficient in terms of cycle count within the VM, though requiring more hashing outside of the VM. This approach could be very close to our SimpleSmt.

I think for now it makes sense to go for the simpler approach, even if it is somewhat less efficient outside of the VM. So, I'll focus on the second approach.

The end goal is to provide a data structure which emulates a key-value map with 4-element keys and 4-element values (we could consider values of arbitrary size, but for the first implementation, I'd probably stick with 4-element values). So, the interface could probably remain pretty close to what we have for TieredSmt.

The internal implementation can be just a simple SMT of depth 64 which behaves as follows:

  • When a key-value pair is inserted into the tree, a new leaf is created at depth 64. The index of the leaf is defined by the most significant element of the key. The pre-image of this leaf is hash(key || value).
  • If another key-value pair is inserted such that the key has the same most significant element as the previous key, the leaf pre-image becomes a hash of the sorted list of key-value pairs contained in that leaf (the list is sorted by key). For example, it could look like hash(key_0 || value_0 || key_1 || value_1).

A few notes:

  • I'm not sure if we need to use domain separation between the case of single-key leaf vs. multi-key leaf. Seems like we do not, but some more thinking is needed here.
  • We should optimize the implementation for the single-key case as it would be by far the most dominant.
  • For now, we could keep the implementation as simple as possible (probably can reuse most of SimpleSmt), but in the future we can optimize this quite a bit to reduce storage requirements. One example of potentially relevant optimizations is described here.
@bobbinth bobbinth added this to the v0.8 milestone Jan 5, 2024
@plafer
Copy link
Contributor

plafer commented Jan 9, 2024

Since the implementation is going to be close to SimpleSmt, I looked for ways to abstract the commonalities into a trait. I call NewSmt the new smt that we're designing here.

Note that this design wouldn't directly support future optimizations such as compressing Merkle proofs, but is flexible enough to allow for it in the future.

Theoretical perspective

The current SimpleSmt data structure can be generalized if we abstract away a few things. We see the SimpleSmt as implementing a function Key -> Value. We also use LeafIndex to mean "the index of a leaf as a u64" (an assumption widely used in our codebase due to NodeIndex).

Data types

  • Key: the domain of the function
    • SimpleSmt: $\{ 0, ..., 2^{depth}-1 \}$
    • NewSmt: Word
  • Value: The codomain of the function
    • SimpleSmt and NewSmt: Word
  • Leaf: the type of a leaf
    • SimpleSmt: Word
    • NewSmt: enum { (LeafIndex, Word), Vec<(LeafIndex, Word)> }

Functions

In words, the abstract implementation is a map LeafIndex -> Leaf, where each concrete implementation defines how to derive a LeafIndex from a Key.

  • KeyToLeafIndex: Key -> LeafIndex: the function that converts a key into a leaf index infallibly
    • SimpleSmt: this is the identity function $\{ 0, ..., 2^{depth}-1 \} \rightarrow LeafIndex$
    • NewSmt: this function Word -> LeafIndex is implemented as returning key[0] (i.e. the most significant u64 of the key)

In words, the SimpleSmt (as implemented today) restricts its domain to be $\{ 0, ..., 2^{depth}-1 \}$, and maps keys directly to leaf indices (e.g. key 5 maps to leaf index 5). In contrast, the NewSmt takes the full domain Word, and maps to the smaller LeafIndex by looking at the first u64 in the Word, ultimately mapping many keys to the same LeafIndex. This then explains the chosen values for the Leaf type. SimpleSmt can store a Word directly, because the KeyToLeafIndex function is injective. In contrast, NewSmt's KeyToLeafIndex is not injective, so the leaf must store potentially multiple values.

Next, we'll then need to define how a Leaf hashes, how to insert a Value into a Leaf, etc. But these are relatively straightforward, so we omit them from this section.

Implementation perspective

In Rust, we would model things roughly as follows. Note that I used dummy names. Also, when the implementation is omitted, assume that today's SimpleSmt's implementation is to be used (with small modifications to fit the new abstractions).

type LeafIndex = u64;

trait SimpleSmtTrait {
    type Key: Into<LeafIndex>;
    type Value;
    type Leaf;

    // ABSTRACT METHODS
    // ---------------------------------------------------------------------------------------------

    /// The depth of the tree
    fn depth(&self) -> u8;

    /// The root of the tree
    fn root(&self) -> RpoDigest;

    /// Retrieves a leaf a the given index
    /// NOTE: Requires a change in current `SimpleSmt` (will need to use `unwrap_or_default()`)
    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf>;

    /// Retrieves an inner node at the given index
    fn get_inner_node(&self, index: NodeIndex) -> BranchNode;

    /// Inserts a leaf node, and returns the value at the key if already exists
    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value>;

    /// Returns the hash of a leaf
    fn hash_leaf(v: Self::Leaf) -> RpoDigest;

    // PROVIDED METHODS
    // ---------------------------------------------------------------------------------------------

    fn get_node(
        &self,
        index: NodeIndex,
    ) -> Result<RpoDigest, MerkleError> {
        todo!()
    }

    fn insert(
        &mut self,
        key: Self::Key,
        value: Self::Value,
    ) -> Result<Option<Self::Value>, MerkleError> {
        todo!()
    }
}

// SIMPLE SMT
// =================================================================================================

// Assumption: we always know the depth of our trees at compile time
struct SimpleSmt<const DEPTH: u8> {
    root: RpoDigest,
    leaves: BTreeMap<u64, Word>,
    branches: BTreeMap<NodeIndex, BranchNode>,
}

impl<const DEPTH: u8> SimpleSmtTrait for SimpleSmt<DEPTH> {
    type Key = SimpleSmtKey<DEPTH>;
    type Value = Word;
    type Leaf = Word;

    fn depth(&self) -> u8 {
        DEPTH
    }

    fn root(&self) -> RpoDigest {
        self.root
    }

    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf> {
        self.leaves.get(&leaf_index)
    }

    fn get_inner_node(&self, index: NodeIndex) -> BranchNode {
        // Same impl as in current `SimpleSmt`
        todo!()
    }

    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value> {
        todo!()
    }

    fn hash_value(v: Self::Value) -> RpoDigest {
        // hash(v)
        todo!()
    }

}

/// Encodes a value with constraints 0 < value < DEPTH,
/// where DEPTH is the depth of the `SimpleSmt`
struct SimpleSmtKey<const DEPTH: u8> {
    value: u64
}

impl<const DEPTH: u8> SimpleSmtKey<DEPTH> {
    pub fn new(value: u64) -> Result<Self, MerkleError> {
        if value < 2_u64.pow(DEPTH.into()) {
            Ok(Self { value })
        } else {
            Err(MerkleError)
        }
    }
}

// We can convert infaillibly to u64 because we checked that value is consistent with DEPTH when creating the `SimpleSmtKey`
impl<const DEPTH: u8> From<SimpleSmtKey<DEPTH>> for u64 {
    fn from(value: SimpleSmtKey<DEPTH>) -> Self {
        value.value
    }
}

// NEW SMT
// =================================================================================================

struct NewSmtKey(Word);

impl From<NewSmtKey> for LeafIndex {
    fn from(key: NewSmtKey) -> Self {
        // return key[0] (i.e. most significant felt)
        todo!()
    }
}

enum NewSmtLeaf {
    Single((u64, Word)),
    Multiple(Vec<(u64, Word)>)
}

// Assumption: we always know the depth of our trees at compile time
struct NewSmt<const DEPTH: u8> {
    root: RpoDigest,
    leaves: BTreeMap<u64, NewSmtLeaf>,
    branches: BTreeMap<NodeIndex, BranchNode>,
}

impl<const DEPTH: u8> SimpleSmtTrait for NewSmt<DEPTH> {
    type Key = NewSmtKey;
    type Value = Word;
    type Leaf = NewSmtLeaf;

    fn depth(&self) -> u8 {
        DEPTH
    }

    fn root(&self) -> RpoDigest {
        self.root
    }

    fn get_leaf(&self, leaf_index: LeafIndex) -> Option<&Self::Leaf> {
        self.leaves.get(&leaf_index)
    }

    fn get_inner_node(&self, index: NodeIndex) -> BranchNode {
        // Same impl as in current `SimpleSmt`
        todo!()
    }

    fn insert_leaf_node(&self, key: Self::Key, value: Self::Value) -> Option<Self::Value> {
        // If value already exists, convert to `NewSmtLeaf::Multiple`, etc.
        todo!()
    }

    fn hash_value(v: Self::Value) -> RpoDigest {
        // hash(key0 || value0 || key1 || value1 || ...)
        todo!()
    }

}

@plafer
Copy link
Contributor

plafer commented Jan 22, 2024

Closed by #254

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants