Optimizing sparse Merkle trees

paouvrard · October 21, 2018, 8:46am

Hi everyone,
I’d like to share 2 implementations that I hope you might find useful as they seem similar to ideas expressed above : a standard SMT and a modified SMT of height log(N)

https://github.com/aergoio/SMT
This is a standard binary SMT implementation of height 256 with the following characteristics :

Node batching (1 db transaction loads 30 nodes : ie a subtree of height 4 with 16 leaves). Batch nodes are loaded into an array and can be concurrently updated.
Reduced data storage (if a subtree contains only 1 key then the branch is not stored and the root points to the KV pair directly
Parallel update of multiple sorted keys

https://github.com/aergoio/aergo/tree/master/pkg/trie
This implementation modifies the SMT in the following way : the value of a key is not stored at height 0 but instead at the highest subtree containing only that key. The leaf node of keys is [key, value, height].
The benefit here is that the tree height is log(N) and updating a key requires log(N) hashing operations (256 hashing operations becomes too slow if the tree is used to update thousands of keys/sec).

It also has node batching and parallel updates like the standard SMT implementation.

The size of a proof of inclusion and non-inclusion is log(N).
A proof of non-inclusion can be of 2 types :

A proof that another key’s leaf node is already on the path of the non-included key
Or a proof that an empty subtree is on the path of the non-included key

Optimization to come : using H(0,0) = 0