Compact Sparse Merkle Trees


#1

A novel approach to SMTs which provides efficient non-membership proofs, allowing for the creation of new and faster blockchains similar to Plasma Cash.

You can find the full paper here https://osf.io/8mcnh/.

To achieve this C-SMT, we need to augment the tree nodes to contain a parameter called
max-key.
So for every non-leaf node, it’s two children will have max-key values representing the maximum key in their respective subtree. An incoming key to be inserted in this SMT would get put in the subtree for which it’s binary distance is closest. And it will recursively go down the tree following this approach until it reaches a leaf node.
The key gets inserted at this level and the hashes and max-key values are adjusted as we recurse back.
The path which the key follows down the tree is called minimum distance path and we call this approach for inserting a key as Minimum distance path algorithm.
You can read more about the approach in the paper link provided above.


#2

I’m not sure I fully understand the construction. In a normal sparse Merkle tree, if you’re deciding whether to include a value in the left or right subtree, you simply check the first bit of the key, and if it’s 0 go left and if it’s 1 go right. Here, it seems like you’re doing something different.

In this code:

left = root.left
right = root.right
l_dist = distance(k, left.key)
r_dist = distance(k, right.key)

It seems like each subtree has a parameter called a “key”. Is this the same as the maxkey? What happens if the right subtree is empty; what’s the maxkey then?

Also, what is the argument that it’s nearly balanced? What happens if I introduce, in order, 0xfeeee…ee, 0xffeee…ee, 0xfffee…ee, 0xffffe…ee … 0xfffff…e, 0xfffff…ff? Wouldn’t that create a tree of depth ~256 with ~256 elements?


#3

Hi @vbuterin, thanks for going through the paper.

The construction here indeed is different from a normal sparse Merkle tree.
The idea is to place every key in it’s correct subtree. To achieve that I calculate a parameter called distance which is defined as the binary separation between two keys. This parameter will tell us whether incoming key will lie in either of the subtrees or will lie in it’s own subtree. For e.g, we have values 1 and 3 in the left and right node so our tree looks some thing like this.

        h(h(1)+h(3))
          /        \
         /          \
     h(1)          h(3)

Now consider an incoming key 2, for left subtree distance(1,2) is 2 while for right subtree distance(3,2) is 1. So the key 2 lies in right subtree.

An incoming key 4 will have distance(1,4) and distance(3,4) equal to 3. So in this case the key 4 does not lie in either of the subtree. And as 4 is greater than max-key 3, the root will become the left child and 4 will become the right child. For a key 0, the reverse will happen.

The parameter key for leaf node is the actual key. For inner nodes, the key parameter is max of it’s children’s keys. I have used the same parameter to make it more succinct.

No case occurs where one of the subtrees is empty, it’s either a leaf node, or a completer inner node so either 2 children or 0.

The tree gets nearly balanced as the hash function SHA256 behaves as an ideal hash function. The hash function SHA256 will take an input and output a value in the range of 0 to
2^256. This assumption is called random oracle model in cryptography.

The following explaination is taken from here.

The random oracle model is a model where all parties (e.g. algorithms, adversaries) have oracle access to a (uniformly) random function
RO : \{0, 1\}^∗ → \{0, 1\}^n

(we can be flexible about the output length, but for now let’s just insist on n bits.)
We can think of this as whenever a fresh value x is queried, the oracle chooses a random
output y. The next time that x is queried, the oracle gives back the same y as previously.

Hope this answers your questions. Please let me know if you have any other comments, I’ll appreciate that.


#4

What does correct subtree mean in that regards is it just the condition that the Leaves are Ordered by key (in respect to a in-order Traversal), or is their more that defines the correct subtree?

Will this construction return the Same root hash when the tree contains the same elements or is the root hash dependent on the insertion order of the elements?

If seen that the construction supports Membership and Non-Membership Proofs is it possible to Proof a correct insertion or correct deletion to someone only having the root hash?


#5

The correct subtree for a key to be inserted is the closest subtree in which the key lies.
Yes if you do an in-order traversal and exclude all non-leaf nodes, you’ll get an ordered list of keys.

Consider this tree, inner-nodes specify the max-key values and the leaf nodes contain the actual key.

           7
       /       \
     3           7
    / \         / \
  1     3     5     7
             / \   / \
            4   5 6   7

So if we want to insert 2, the correct subtree lies with 3

      x
    /  \
   2    3

So the algorithm will recursively reach till 3, and pair up 2 with 3, so the tree will look like this now

           7
       /       \
     3           7
    / \         /  \ 
   1    3      5    7
       / \    / \   / \
      2   3  4   5 6   7

It will return the same root hash regardless of the order of insertion of elements. Because all element in both the case will end up at the same places.

Yes it is possible to prove an element is in the tree, the membership does exactly that. To prove an element does not exist in the tree, the non-membership proof gives you proofs of two closest key which bound the key for which it is being queried.


#6

@vbuterin @tawarien

I have released a framework which includes a module for Compact Sparse Merkle trees.