# An optimized compacted sparse merkle tree without pre-calculated hashes

A week ago, I posted an SMT construction on A nearly-trivial-on-zero-inputs 32-bytes-long collision-resistant hash function

The advantage of this construction encourage me to post it again:

• this construction does not lose any security compare to others
• no pre-calculated hash set, which is perfect for contracts.

After a week of work, I optimized the storage, and now it takes `2n + log(n)` size to store an SMT; `log(n)` times hashing and inserting for key-value updating.

The code is implemented in Rust https://github.com/jjyr/sparse-merkle-tree

Pros:

• no pre-calculated hash set

• support both exist proof and non-exists proof

• efficient storage and key updating

This is just another construction of SMT, no magic improvement compare to other constructions, see this comment:

Trick 1: Optimized hash function for 0 value.

We define the hash merge function as below:

1. if L == 0, return R

2. if R == 0, return L

3. otherwise sha256(L, R)

Follow this rule; an empty SMT is constructed by zeros nodes; it brings an advantage: we do not need a pre-calculated hash set.

This function has one issue, `merge(L, 0) == merge(0, L)` , which can easily construct a conflicted merkle root from a different set of leaves.

To fix this, instead of use `hash(value)` , we compute a `leaf_hash = hash(key, value)` as leafâ€™s value to merge with its sibling.

(we store `leaf_hash -> value` in a map to indexed to the original value).

Security proof:

• Since the key is hashing into the leaf_hash, no matter what the `value` is, the leavesâ€™ hashes are unique.

• Since all leaves have a unique hash, nodes at each height will either merged by two different hashes or merged by a hash with a zero(for a non-zero parent, since the child hash is unique, the parent is surely unique at the height).

• For the root, if the tree is empty, we got zero, or if the tree is not empty, the root must merge from two hashes or a hash with a zero, itâ€™s still unique.

So from construction, this SMT is security because we canâ€™t get a collision root hash.

Trick 2: A node structure to compress the tree.

The classical node structure is `Node {left, right}`, it works fine if we insert every node from root of the tree to bottom, but mostly node is duplicated, we want our tree only store unique nodes.

The idea is simple: for a one leaf SMT, we only store the leaf itself, when inserting new leaves, we magically extract location info from internal tree node, and decide how to merge nodes.

The key to this problem is the â€śkeyâ€ť. Each key in the SMT can be seen as a path from the root of the tree to leaves, so the idea is we store the key in node, and when we need to merge two nodes, we extract the location info from the keys:

We can calculate the common height of two keys, which is exactly the height the two nodes be merged.

``````
fn common_height(key1, key2) {

for i in 255..0 {

if key1.get_bit(i) != key2.get_bit(i) {

// common height

return i;

}

}

return 0;

}

``````

The node structure `BranchNode { fork_height, key, node, sibling}`, can use one unique node to expression all duplicated nodes on the â€śkeyâ€ť path between height `[node.fork_height, 255]`.

• `fork_height` is the height when the node created; for a leaf, it is 0.

• `key` is the key of a childâ€™s key when constructing the node. for a leaf node, the key is leafâ€™s key.

• `node` and `sibling` is like the `left` and `right` in the classical structure; the only difference is their position is not fixed.

To get a left child of a node in height `H`:

1. check `H`-th bit of key

2. if it is `1` means the `node` is on the right at height `H`, so `sibling` is the left child

3. if the `0` means the `node` is on the left

``````
// get children at height

// return value is (left, right)

fn children(branch_node, height) {

let is_rhs = branch_node.key.get_bit(height);

if is_rhs {

return (branch_node.sibling, branch_node.node)

} else {

return (branch_node.node, branch_node.sibling)

}

}

``````

To get a key from SMT, we walk down the tree from root to bottom:

``````
fn get(key) {

let node = root;

// path order by height

let path = BTreeMap::new();

loop {

let branch_node = match map.get(node) {

Some(b) => b,

None => break,

}

// common height may be lower than node.fork_height

let height = max(common_height(key, node.key), node.fork_height);

if height > node.fork_height {

// node is sibling, end search

path.push(heignt, node);

break;

}

// node is parent

// extract children position from branch

let (left, right) = children(branch_node, height);

// extract key positon

let is_right = key.get_bit(height);

if is_right {

path.push(height, left);

node = right;

} else {

path.push(height, right);

node = left;

}

}

return self.leaves[node];

}

``````

There is nothing special for other operations: updating, merkle proof. It just works as expected.

Your scheme doesnâ€™t seem to offer meaningful changes over the one discussed here.

1. This construction saves 256 * 32 = 8192 bytes pre-calculated merkle branches, it is not trivial for a contract.
2. The geting and updating operation is log(n) vs constant 256.

At least it is not completely meaningless IMO.

1. The scheme I linked to also doesnâ€™t need to do non-trivial hash pre-computation (see this comment specifically).
2. The scheme I linked to only needs O(\lg n) disk reads (but still requires 256 hashes). In this comment further down however, an optimization is presented that reduces the number of hashes to O(\lg n) (hint: itâ€™s an isomorphism of the optimization you propose here, which youâ€™d know if you had bothered to read the prior art).
1 Like

Apologize for my ignorance!