Binary trie format

poemm · July 4, 2020, 11:03pm

It seems that everyone agrees to use binary radix trees. We just haven’t decided on a merkleization rule.

Here is @gballet 's merkleization rule (loosely), ignoring RLP which I oppose keeping.

M1.
internal_node_hash = hash( prefix_length || prefix_length%8 || prefix_bits || left_child_hash || right_child_hash )
leaf_hash = hash( prefix_length || prefix_length%8 || prefix_bits || leaf_value )

And here is @vbuterin 's merkleization rule (loosely).

M2.
if prefix is empty:
internal_node_hash = hash( left_child_hash || right_child_hash )
leaf_hash = hash( 0 || leaf_value )
if prefix is non-empty:
internal_node_hash = hash( prefix_bits || hash( left_child_hash || right_child_hash ))
leaf_hash = hash( prefix_bits || hash( 0 || leaf_value ) )

(For M2, distinguishing between empty prefixes, nonempty prefixes, and hash outputs remains a problem.)

I propose that we explore dropping prefix bits from merkleization. They are awkward, require bit-twiddling, and add witness overhead. Here is a third proposed merkleization rule.

M3.
internal_node_hash = hash( left_child_hash || right_child_hash || prefix_length)
leaf_hash = hash( leaf_key || hash( leaf_value ) || prefix_length )

Note that we dropped prefix_bits, but added leaf_key which is cryptographically related to prefix_bits. And because prefix_length is usually 0x00, this merkleization rule usually meets @vbuterin 's goal of “two 32-byte chunks -> one 32-byte chunk” format. We also hash leaf_value because it is greater than 32-bytes for account data. Rule M3 makes proofs-of-exclusion include a neighboring leaf, but this overhead may be smaller than the overhead of including prefix_bits in the witness, not to speak of the awkwardness of including prefix_bits.

I am not sure which merkleization rule is best for statelessness.