Optimizing sparse Merkle trees

But then that doesn’t distinguish between x existing at different positions…

I was speaking in terms of an SMT that commits to unordered sets, in which case it never makes sense to place the same value at different paths. Here is one way to do it for a key-value mapping.

Let the key-value map be \{(k_i, v_i) \mid i \in I\}, where i \ne j \implies k_i \ne k_j. Let H be keccak-256. Create a complete binary Merkle tree of depth 256 and store (k_i, v_i) at the H(k_i)-th leaf from the left. Replace each subtree that contains exactly one non-empty leaf with that leaf itself; this results in a binary tree that is not complete. Replace each empty leaf with 0 and each non-empty leaf x with (1, x). Finally, replace every non-leaf node by the hash of its two children, forming a Merkle tree.
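For concreteness, here is a minimal Python sketch of that construction (not any particular implementation from this thread): sha3_256 stands in for keccak-256, an empty leaf is encoded as 32 zero bytes, and the byte encoding of the (1, k, v) leaf is an ad-hoc assumption.

```python
from hashlib import sha3_256  # stand-in for keccak-256

DEPTH = 256

def H(data: bytes) -> bytes:
    return sha3_256(data).digest()

# Roots of completely empty subtrees of every height: an empty leaf is 0
# (encoded here as 32 zero bytes) and each non-leaf node is the hash of
# its two children.
EMPTY = [b"\x00" * 32]
for _ in range(DEPTH):
    EMPTY.append(H(EMPTY[-1] + EMPTY[-1]))

def bit(digest: bytes, i: int) -> int:
    """i-th bit of a 32-byte digest, counted from the most significant bit."""
    return (digest[i // 8] >> (7 - (i % 8))) & 1

def subtree_root(items, level=0):
    """items: list of (H(k), k, v) triples whose digests agree on the first
    `level` bits. Returns the commitment to the corresponding subtree,
    collapsing any subtree that contains exactly one non-empty leaf."""
    if not items:
        return EMPTY[DEPTH - level]
    if len(items) == 1:
        _, k, v = items[0]
        return b"\x01" + k + v      # the (1, k, v) leaf itself, encoding assumed
    left = [it for it in items if bit(it[0], level) == 0]
    right = [it for it in items if bit(it[0], level) == 1]
    return H(subtree_root(left, level + 1) + subtree_root(right, level + 1))

def tree_root(kv_map):
    return subtree_root([(H(k), k, v) for k, v in kv_map.items()])
```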

An inclusion proof that k maps to v is a Merkle inclusion proof of (1, k, v) whose path is some prefix of H(k), i.e., an integer t \le 256, a path p \in \{0,1\}^{t} and t intermediate nodes of type bytes32. The checker verifies the proof by starting with the value (1, k, v), hashing it together with each provided intermediate node on the left or right as directed by p, and comparing the result to the root. An exclusion proof is either a Merkle inclusion proof of 0 or an inclusion proof of (1, k', v') whose path is some prefix of H(k) and where k' \ne k.
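A sketch of the corresponding checker, under the same encoding assumptions as the snippet above; the "empty" branch reads the inclusion proof of 0 as the commitment to an empty subtree at the given depth.

```python
from hashlib import sha3_256

DEPTH = 256

def H(data: bytes) -> bytes:
    return sha3_256(data).digest()      # stand-in for keccak-256

def bit(digest: bytes, i: int) -> int:
    return (digest[i // 8] >> (7 - (i % 8))) & 1

EMPTY = [b"\x00" * 32]                  # empty-subtree commitments, as above
for _ in range(DEPTH):
    EMPTY.append(H(EMPTY[-1] + EMPTY[-1]))

def fold(start, path_bits, siblings):
    """Hash a starting value up to a candidate root. path_bits are the first
    t bits of the leaf position (root-down); siblings are the t sibling
    nodes, ordered from the leaf up."""
    current = start
    for i, sib in enumerate(siblings):
        if path_bits[len(siblings) - 1 - i]:    # node is the right child here
            current = H(sib + current)
        else:                                   # node is the left child here
            current = H(current + sib)
    return current

def verify_inclusion(root, k, v, siblings):
    """Check that k maps to v: start from the (1, k, v) leaf and hash up
    along the first len(siblings) bits of H(k)."""
    path_bits = [bit(H(k), i) for i in range(len(siblings))]
    return fold(b"\x01" + k + v, path_bits, siblings) == root

def verify_exclusion(root, k, proof):
    """proof is either ('empty', siblings), showing an empty subtree on k's
    path, or ('other', k2, v2, siblings), showing another key's collapsed
    leaf sitting on k's path."""
    siblings = proof[-1]
    t = len(siblings)
    path_bits = [bit(H(k), i) for i in range(t)]
    if proof[0] == 'empty':
        start = EMPTY[DEPTH - t]
    else:
        _, k2, v2, _ = proof
        if k2 == k or [bit(H(k2), i) for i in range(t)] != path_bits:
            return False
        start = b"\x01" + k2 + v2
    return fold(start, path_bits, siblings) == root
```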

1 Like

Hi everyone,
I’d like to share two implementations that I hope you might find useful, as they seem similar to ideas expressed above: a standard SMT and a modified SMT of height log(N).

https://github.com/aergoio/SMT
This is a standard binary SMT implementation of height 256 with the following characteristics:

  • Node batching (one DB transaction loads 30 nodes, i.e. a subtree of height 4 with 16 leaves). Batch nodes are loaded into an array and can be updated concurrently.
  • Reduced data storage (if a subtree contains only one key, the branch is not stored and the subtree root points to the key-value pair directly).
  • Parallel update of multiple sorted keys

https://github.com/aergoio/aergo/tree/master/pkg/trie
This implementation modifies the SMT in the following way: the value of a key is not stored at height 0 but at the root of the highest subtree that contains only that key. A key’s leaf node is [key, value, height].
The benefit is that the effective tree height is log(N), so updating a key requires log(N) hashing operations (256 hashing operations per update becomes too slow if the tree has to handle thousands of key updates per second).
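A rough sketch of what such a shortcut leaf might look like, assuming the tree’s hash function H; the exact serialization in the aergo trie (see its README) may differ.

```python
from hashlib import sha3_256

def H(data: bytes) -> bytes:
    return sha3_256(data).digest()

def shortcut_leaf(key: bytes, value: bytes, height: int) -> bytes:
    # The leaf commits to the key, the value and the height at which the
    # single-key subtree was collapsed, so the same (key, value) pair
    # collapsed at different heights yields different nodes.
    return H(key + value + height.to_bytes(2, "big"))
```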

It also has node batching and parallel updates like the standard SMT implementation.

The size of a proof of inclusion or non-inclusion is log(N).
A proof of non-inclusion can be of two types:

  • A proof that another key’s leaf node is already on the path of the non-included key
  • Or a proof that an empty subtree is on the path of the non-included key

Optimization to come: using H(0, 0) = 0

I actually think that this is equivalent to a simple sparse Merkle tree using the following hash function:

H(0, 0) = 0
H(0, (k, v)) = (k + "1", v)
H((k, v), 0) = (k + "0", v)
H(x \ne 0, y \ne 0) = sha3(x, y)

When putting values into the tree, a value v is replaced by ("", v). This hash function is collision-resistant, which you can prove case by case and then finish by showing domain independence: cases 2 and 3 clearly cannot collide with each other; cases 2 and 3 cannot collide with case 4 because they give outputs longer than 32 bytes; and case 1 cannot collide with anything else, because cases 2 and 3 never output 0 and, by preimage resistance, finding a case-4 input that hashes to 0 is infeasible.
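A minimal Python sketch of that hash function, with sha3_256 standing in for keccak and a hypothetical encode() supplying the arbitrary-bit-length-string encoding mentioned below; the last case is read here as the catch-all for any pair of children not covered by the first three.

```python
from hashlib import sha3_256  # stand-in for the sha3/keccak above

def encode(node):
    """Hypothetical unambiguous encoding of a node; the post leaves the
    exact encoding of arbitrary-bit-length strings open."""
    if node == 0:
        return b"\x00"
    if isinstance(node, tuple):                       # a (path-bits, value) pair
        bits, value = node
        return b"\x01" + len(bits).to_bytes(2, "big") + bits.encode() + value
    return b"\x02" + node                             # a 32-byte digest

def H(left, right):
    if left == 0 and right == 0:                      # case 1: H(0, 0) = 0
        return 0
    if left == 0 and isinstance(right, tuple):        # case 2: single value on the right
        bits, value = right
        return (bits + "1", value)
    if right == 0 and isinstance(left, tuple):        # case 3: single value on the left
        bits, value = left
        return (bits + "0", value)
    # Case 4, treated as the catch-all (including an empty child next to an
    # already-hashed subtree): hash the two encoded children.
    return sha3_256(encode(left) + encode(right)).digest()

# A present value v enters the tree as ("", v); an absent leaf is 0.
```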

The only argument I have against it is that it’s somewhat uglier, because the values aren’t cleanly 32 bytes anymore (instead they go up to 64 bytes), and because we need to deal with encodings for arbitrary-bit-length strings. I guess it depends on just how expensive hashes are.

3 Likes

Agreed, although the update algorithm is different: when adding a key in the aergo trie implementation, if an empty subtree is reached, a Leaf = Hash(key, value, height) is created and there is no need to iterate down the branch.
The key and value are stored in place of the Leaf node’s imaginary left and right subtree roots for easy serialization.
The readme has diagrams :slight_smile:

Is it different? The result of iterating the branch is simple: it’s just (binarystring(key), value). So I suppose you just get an additional shortcut over doing ~256-log(N) loops.

Would this tweaked hashing still work for non-membership proofs?

Yep! Don’t see why not.

@vbuterin I understand that this would allow the Merkle proof to have a variable number of nodes.
So if I use a Bulletproofs circuit to prove knowledge of a leaf in the tree, the proof size can give an approximate idea of where the leaf is in the tree. Am I wrong about this?

Actually, this makes it so that a Merkle proof always has exactly the same number of nodes (256), with compression at a higher layer bringing the scheme back to O(log(N)) efficiency.
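For illustration, a sketch of that kind of higher-layer compression, along the lines of the compress_proof/decompress_proof functions mentioned below (names and layout here are illustrative, not the exact code in the repo): the 256 proof nodes are replaced by a bitmap marking which siblings differ from the precomputed empty-subtree hashes, plus only those siblings.

```python
def compress_proof(proof, zero_hashes):
    """proof: 256 sibling nodes, leaf to root; zero_hashes[i]: root of an
    empty subtree of height i. Keep only the non-default siblings."""
    bitmap, nonzero = 0, []
    for i, sibling in enumerate(proof):
        if sibling != zero_hashes[i]:
            bitmap |= 1 << i
            nonzero.append(sibling)
    return bitmap, nonzero

def decompress_proof(bitmap, nonzero, zero_hashes):
    """Rebuild the full 256-node proof for the plain SMT verifier."""
    it = iter(nonzero)
    return [next(it) if (bitmap >> i) & 1 else zero_hashes[i]
            for i in range(len(zero_hashes))]
```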

@vbuterin I think you are referring to using compress_proof and decompress_proof on a naive sparse merkle tree. I was referring to implementations in new_bintrie_optimized.py and new_bintrie_hex.py.

Those implementations are just changes to how the data is stored in the database; they don’t actually change the verification that gets executed. new_bintrie_optimized.py, new_bintrie_hex.py and the simple SMT are all different algorithms that implement exactly the same thing. So in either case you’d be doing a bulletproof over 256 hashes.

The number of proof nodes is quite different between the simple SMT and new_bintrie_optimized.py. Since the number of hash calls depends only on the number of proof nodes, there are far fewer hash calls in new_bintrie_optimized.py. I did some basic benchmarking on the forked code: for the same number of leaves, the simple SMT always has 256 proof nodes as expected, whereas the optimized one has 10-15. You can find the benchmark code here.

That’s because each proof node in the more complex implementations is a “virtual proof node” that often stands in for many “actual proof nodes”.

1 Like

There is an implementation of the optimized tree in JS and Solidity here:


We used it to store quadratic vote balances in ERC-1948 tokens at a pop-up democracy.

Looks like Facebook Libra has a similar optimization:

Together with our partner Deora, we at LeapDAO took a deep dive into Quadratic Voting over the last year.
Today we want to present our findings from analyzing the outcomes of two very different events: the first convention of Volt Germany, and ETHTurin.
You will also find a short overview of our technical implementation, including sparse Merkle trees, in our blog post here: https://ipfs.leapdao.org/blog/quadratic-voting/

1 Like

In case someone is looking at this in 2021: the bitmap for indicating where non-zero proof nodes are needed might not be necessary if your root-generation function ingests the leaf index to be written anyway and the tree is filled from left to right.

Because I didn’t know that, I first ended up implementing the proof-compression optimization outlined in the original post of this thread. My commit: Optimizing sparse merkle tree · rugpullindex/indexed-sparse-merkle-tree@dac73bf · GitHub

But then, after looking at the ETH2 deposit contract implementation, I realized that there the “size”, i.e. the highest leaf index in the tree, is simply used to determine whether a partial proof node is a “non-zero” leaf or not: consensus-specs/deposit_contract.sol at 34fc0a5d09fae6649e0c6ac7a0cb09ff5a999957 · ethereum/consensus-specs · GitHub

So I ended up imitating that strategy in my SMT library and went with using the index parameter, primarily to save gas.
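For reference, a small Python sketch of that trick (the real deposit contract uses sha256, a fixed depth of 32 and mixes the deposit count into the final root; the names here are illustrative): in a tree filled strictly left to right, the current size alone determines which siblings are empty-subtree hashes, so no proofBits-style bitmap is needed when recomputing the root from the retained frontier.

```python
from hashlib import sha256

DEPTH = 32

def H(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

# zero_hashes[h] = root of an empty subtree of height h.
zero_hashes = [b"\x00" * 32]
for _ in range(DEPTH):
    zero_hashes.append(H(zero_hashes[-1], zero_hashes[-1]))

def root_from_frontier(branch, size):
    """branch[h]: the completed left sibling retained at height h (only
    meaningful where the corresponding bit of size is 1)."""
    node = b"\x00" * 32
    for h in range(DEPTH):
        if (size >> h) & 1:              # a full left subtree exists at this height
            node = H(branch[h], node)
        else:                            # the right-hand side is still empty
            node = H(node, zero_hashes[h])
    return node
```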

You can find all of the results here: GitHub - rugpullindex/indexed-sparse-merkle-tree

@jbaylina

We now also ship a gas benchmark file: indexed-sparse-merkle-tree/.gas-benchmark at 36ae144f78fb8ffc9111fd45bd7f024b01ad7aa7 · rugpullindex/indexed-sparse-merkle-tree · GitHub

One thing that hasn’t been discussed yet is that both our index and proofBits variables actually consume quite a bit of gas, particularly when they have to be stored to and loaded from memory many times.

To reduce gas costs further, I was discussing with @pinkiebell the idea of sorting the proofs array, but right now it’s tough to wrap my head around. If anyone knows of a Solidity implementation of it, that’d be helpful.

I found your implementation quite interesting, particularly the updateMany function. I’d like to implement it as well once I have time.

IOTA has a good Merkle tree structure.

It is so efficient that they’re on the way to zero-fee transactions.