Binary trie format

For clarity, the rule is this:
node_hash = hash(prefix_length%8 || prefix_bits || left_child_hash || right_child_hash || value)
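As a concrete sketch of that rule in Python (sha256 stands in for keccak256 since hashlib has no Keccak, and packing prefix_bits into bytes is my assumption):

    import hashlib

    def node_hash(prefix_len: int, prefix_bits: bytes, left_hash: bytes,
                  right_hash: bytes, value: bytes) -> bytes:
        # hash(prefix_length % 8 || prefix_bits || left_child_hash || right_child_hash || value)
        # sha256 is a stand-in; the thread assumes keccak256.
        preimage = bytes([prefix_len % 8]) + prefix_bits + left_hash + right_hash + value
        return hashlib.sha256(preimage).digest()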

I like this because it is pretty much the multiproof approach, and one can use a flat DB model. Keys can then be stored sequentially, and the key length ultimately doesn’t matter.

Practically, though, things get a bit more complicated: for example, when doing an insert, one needs to “guess” what the neighboring values are, because the bits along the path to a leaf aren’t stored.
It can be done, and I can think of a few hacks in geth’s case. I am not yet convinced it can be done by every client, though.

I am currently running tests on the “M2” approach to see how much the performance and size are impacted, so that I can then measure how much of a gain M3 brings. I’m curious.

Your current implementation should work for M1, M2, and M3; you just hash differently. For example, M3 omits prefix_bits when hashing. You are correct that M3 hashing includes leaf_key at each leaf, but witnesses must have access to that anyway.

I’m not sure which merkleization rule is best.

Is there anything wrong with banning keys from being prefixes of other keys? I don’t think we need that possibility for a tree model; e.g. I was thinking (see the sketch after the list):

  • address + 0x00 for balance
  • address + 0x01 for nonce
  • address + 0x02 + chunk_id for code
  • address + 0x03 + storage_key for storage.
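A literal sketch of that layout (the fixed 4-byte width for chunk_id is my assumption, not part of the proposal):

    def balance_key(address: bytes) -> bytes:
        return address + b"\x00"

    def nonce_key(address: bytes) -> bytes:
        return address + b"\x01"

    def code_chunk_key(address: bytes, chunk_id: int) -> bytes:
        # 4-byte big-endian chunk_id is an assumption for illustration.
        return address + b"\x02" + chunk_id.to_bytes(4, "big")

    def storage_key(address: bytes, slot: bytes) -> bytes:
        return address + b"\x03" + slot

Note that with fixed-width chunk_id and storage slots, no key is a prefix of another, which is the point of banning prefixes in the first place.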

jinx – I’m still thinking through the storage size implications, but it’s cool that we could get rid of encoding the account with RLP. You can see the ghost of the idea in how I skipped 0x01 here:

Ah, I meant if code merkelization gets internalized into the account trie before/during this proposal. Otherwise, it would be something like:

Full bytecode at key = hash(address) + 0x03

Well, I think you don’t need to hash the storage slot in this case. In any case, I discourage nesting the storage slot after the account key: storage is better treated as a separate concern, and keeping it separate even allows different accounts to share the same keys and values between their storages. Nesting the storage slot inside the account address key means that updating n storage slots after block execution creates more intermediate trie nodes to update than in the current design, where the storage trie has fewer levels.

BTW, I support allowing keys to be prefixes of other keys. E.g. in a storage trie, without hashing keys, we could save values under keys 0x01, 0x0101, 0x010101 and so on, removing the initial zeroes in many cases. The only remaining use for hashed storage keys would be the mappings in Solidity.

Quick update: I have run M1, M2 and M3 on mainnet (account trie only).

Type  Time    Size
M1    43 min  13 GB
M2    42 min  12 GB
M3    45 min  13 GB

I’m waiting for one PR to be merged before doing the same thing with the storage tries.

Prototype code:

Sorry, I am a bit late to the party; it took me a long time to find some fragments of time to understand what is going on. I find something like the M3 variant proposed by @poemm close to ideal. I do not really understand the concern with “guessing” what the neighbouring values are. As the implementer, you are free to choose which data structure you need for this. For example, we would be implementing this without needing any trees or inserts (we use an approach based on “prefix groups” and streaming computation of the state root). But if you are using trees, then why can’t the tree give you the information about the neighbouring values when you need it?

During our last conversation, @AlexeyAkhunov pointed out that removing the concept of extension nodes addressed all the issues I had with M2/M3. The structure becomes much simpler:

M4

node_hash = hash(left_child_hash || right_child_hash)
leaf_hash = hash(0 || leaf_value)

Pointing out the most radical changes from M2/M3:

  • There are no extension nodes in the merkelization rule
  • The key value is encoded through the tree structure
  • There is no more need for extracting key bits

Clients and witnesses are still welcome to use extension nodes and explicit keys to store their data.

For example, and in order to save space, a client can store the data using extension nodes. Single-child internal nodes are generated on the fly while merkelizing the stored extension node.
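A sketch of what that could look like: the M4 rule itself plus the on-the-fly expansion of a stored extension into single-child internal nodes. sha256 stands in for the real hash, and the all-zero encoding of an empty subtree’s hash is my assumption:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()  # stand-in for the real hash

    EMPTY = b"\x00" * 32  # assumed encoding for an empty subtree's hash

    def leaf_hash(value: bytes) -> bytes:
        # leaf_hash = hash(0 || leaf_value)
        return h(b"\x00" + value)

    def merkleize_extension(prefix_bits: str, child_hash: bytes) -> bytes:
        # Walk the stored extension bottom-up, emitting one single-child
        # internal node per bit: node_hash = hash(left || right).
        acc = child_hash
        for bit in reversed(prefix_bits):
            acc = h(acc + EMPTY) if bit == "0" else h(EMPTY + acc)
        return acc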


Probably this is a dumb question, but since binarification requires all this recalculation, how about considering a switch from keccak256 to sha256 for the state trie?

The reasoning of Eth2 for using sha256 can be read here, including a claim about security.

I can think of two potential, but questionable, benefits:

  1. Apparently one of the main benefits of sha256 is interoperability with other chains – could this potentially enable other chains to more easily verify proofs of Eth1 blocks? I don’t think it should make a huge difference anyhow.

  2. Making Eth1 and Eth2 closer in terms of cryptographic primitives. Again, this is very slim, given there are more substantial differences between the two, starting with bls12-381 vs secp256k1.

Would there be any (other) benefit? Probably no significant ones, but maybe it is worth briefly entertaining the idea and discussing it.


Would there be any (other) benefit? Probably no significant ones, but maybe it is worth briefly entertaining the idea and discussing it.

I can see three other potential benefits:

  1. sha256 is available in many more languages and platforms, and has better coverage from security audits
  2. sha256 is faster than keccak256
  3. keccak256 has a security flaw that sha256 doesn’t

1) is a given, so I don’t think there is a big consensus risk in switching. Still, unless either 2) or 3) is true, I would say that it’s better not to fix something that isn’t broken.

I don’t expect 2) to hold in practice, and indeed the performance was exactly the same when I tried using sha256 in M4.

I was thinking: instead of adding an extra byte in order to access the account’s properties, and thus ending up with 33-byte and 65-byte keys, how about dropping the last two bits of the hashed address (as in the “secure trie” address) and using them as an item selector?

account_balance_addr = hash(account_addr)[0..253] ++ 0b00
account_nonce_addr = hash(account_addr)[0..253] ++ 0b01
account_code_chunk_addr = hash(account_addr)[0..253] ++ 0b10 ++ chunk_addr
storage_slot_addr = hash(account_addr)[0..253] ++ 0b11 ++ hash(storage_key)
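A sketch of the bit manipulation this implies (sha256 standing in for the secure-trie hash):

    import hashlib

    def _stem(account_addr: bytes) -> int:
        # hash(account_addr)[0..253]: the top 254 bits of the hashed address.
        digest = hashlib.sha256(account_addr).digest()  # keccak256 in practice
        return int.from_bytes(digest, "big") >> 2

    def account_balance_addr(account_addr: bytes) -> bytes:
        return ((_stem(account_addr) << 2) | 0b00).to_bytes(32, "big")

    def account_nonce_addr(account_addr: bytes) -> bytes:
        return ((_stem(account_addr) << 2) | 0b01).to_bytes(32, "big")

    def account_code_chunk_addr(account_addr: bytes, chunk_addr: bytes) -> bytes:
        prefix = ((_stem(account_addr) << 2) | 0b10).to_bytes(32, "big")
        return prefix + chunk_addr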

2 bits are lost here, which doesn’t matter for this use case: hashing is intended to make sure that values are evenly spread out in order to keep the trie balanced, and the further right a bit is, the less it helps with balancing the trie. Furthermore, all these values will still be stored together in the DB, so the change won’t have any impact on the balance.

A quick check as of block ~8M doesn’t show any collisions if the last 2 bits are chopped off. A collision is quite unlikely in any case (about a one in 2×10²⁸ chance).

It’s nice to be able to verify the state trie from within the EVM, which is currently cheaper to do with keccak256. Although maybe a shift to sha256 can come with the appropriate repricing measures to make it feasible to verify state roots on chain.

I suggest you use the same mapping as RSK uses, which has many benefits.

account_record = hash(account_addr)[0..79] ++ account_addr

This key->value mapping has the huge benefit that it stores the key inside the trie, and yet it randomizes the key’s position.
It is also faster, because you can generate 80 bits of Keccak hash digest faster than 256 bits.
Finally, it requires only 30 bytes per key, 2 bytes less than the 32 bytes proposed here.
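For illustration, the RSK-style mapping (sha256 standing in for the Keccak hash RSK actually uses):

    import hashlib

    def account_record_key(account_addr: bytes) -> bytes:
        # The 10-byte (80-bit) hash prefix randomizes the key's position in
        # the trie, while appending the full 20-byte address keeps the key
        # recoverable from the trie itself: 10 + 20 = 30 bytes total.
        digest = hashlib.sha256(account_addr).digest()
        return digest[:10] + account_addr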

First, two desired properties for the binary tree structure and merkleization rule. Then, claims that none of M1, M2, M3, or M4 satisfies both of these desired properties.

Property: Access list-witness correspondence. For a given state tree, each access list uniquely determines a minimal witness. In particular, minimal witnesses are independent of block execution.

Property: Efficiently computable. The cost to merkleize a tree is less than 50n hashes in practical (i.e. average tree depth < 50) cases, where n is the number of leaves to be merkleized.

Claim. Binary radix trees like M1, M2, and M3 don’t satisfy the access list-witness correspondence property.
Proof sketch: Consider an access list from a binary radix state tree. When a leaf in the access list is deleted from state after block execution, if the deleted leaf’s parent node has an extension (“prefix”, “edge label of length >1”) as its remaining child, then the extension is extended to preserve the radix property. Information about whether this remaining child is an extension is needed in a witness. Any leaves can be removed, and the access list does not specify which leaves are removed, so a witness must include this extra extension info for all hash nodes which may be extensions and may be extended due to leaf deletions. This extra data makes witnesses non-minimal. So M1, M2, and M3 don’t satisfy the access list-witness correspondence property.

Claim. Sparse trees like M4 don’t satisfy the efficiently computable property.
Proof sketch: In a sparse tree, a leaf requires 256 hashes to merkleize. And n leaves require ~256n hashes to merkleize. And ~256n > 50n. So sparse trees are not efficiently computable.

Are there any proposals for a tree structure and merkleization rule which satisfy both of these desired properties?

I definitely agree that M4 does not have the efficiency property. In order to be efficient and secure, there will have to be some kind of cryptographic encoding of the tree structure within the tree itself.

I don’t quite understand why you are saying that M3, say, does not have the “access list determines minimal witness” property.

In your definition of M3, you say that prefix_length is usually 0x00. This is a bit confusing to me. Are you defining prefix_length of an internal node A to be

  • The length of the prefix shared by all descendants of A which is not shared by all descendants of A’s parent, or
  • The length of the largest sequence of bits which is a prefix of all leaf keys that descend from A?

It seems like your definition of M3 uses the former, but I think the latter definition solves this minimal witness issue.

As a diagram, consider the following

 A
/ \
   \
    \
     B
    /  \
   C    D

In both versions, when leaf D is deleted, its parent B is deleted and its grandparent A replaces B with C as its (right side) direct child. But in the former version, C’s prefix_length must be updated to what was previously the sum of B’s prefix_length and C’s prefix_length, which requires not just the hash of C but the data of C itself.

In the second version, C still has the same set of leaf descendants, and A still has the same descendants (its left subtree plus the descendants of C on the right), so the “largest prefix of all descendants” is unchanged for both. Since C is unaffected, we can simply slot its hash in where the hash of B used to be in A.
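One way to see the difference between the two definitions, with keys written as bit strings (a toy sketch, not either author’s code):

    def common_prefix(leaf_keys: list[str]) -> str:
        # Definition 2: the longest bit string that is a prefix of every
        # leaf key descending from a node -- independent of the parent.
        lo, hi = min(leaf_keys), max(leaf_keys)
        i = 0
        while i < len(lo) and lo[i] == hi[i]:
            i += 1
        return lo[:i]

    # Toy keys: C's two leaves, and leaf D under their common parent B.
    c_leaves = ["110010", "110011"]
    d_leaf = "1101"

    print(common_prefix(c_leaves))             # "11001": C's definition-2 prefix
    print(common_prefix(c_leaves + [d_leaf]))  # "110":   B's prefix while D exists
    # After D is deleted, C's leaf set -- and hence its definition-2 prefix
    # and its hash -- is unchanged, so A can slot C's hash in where B's was.
    # Under definition 1 (edge label relative to the parent), C's prefix
    # would need recomputing, which requires C's data and not just its hash.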


The word prefix is an abuse of notation. My use of prefix in M3 is consistent with the posts above it which use prefix to describe an extension – so your first definition, I think.

Your definition of prefix_length gives nicer properties for node deletion. But your M3-like rule is bad for proofs-of-exclusion, where it breaks the access list-witness correspondence property. And your M3-like rule always breaks the “two 32-byte chunks -> one 32-byte chunk” property.

A proposal by Vitalik may satisfy both properties above, but we would have to define the word “minimal” carefully.

A proposal for satisfying both properties above.

Aumasson’s Too Much Crypto is already making an impact in practical cryptography by motivating BLAKE3 to reduce the number of hashing rounds from ten to seven. Aumasson also concludes that SHA3 could use 10 rounds instead of 24 without losing safety.

I propose that we consider using M4, but with Keccak reduced to 10 rounds. While this does not quite achieve the efficiently computable property, it may be fast enough for our purposes.

Furthermore, I propose that we evaluate reducing the number of rounds even further. Aumasson assumes a block-chaining[1] mode of operation, but we only need one 136-byte block. Also, we use a tree mode of operation, with a sparse tree of depth 256 and likely ≥ 10 rounds of hashing before we get to the first branch node. So I suspect that we can reduce the number of rounds even further. Conjecture: we can use Keccak with one round in our 256-sparse-tree mode of operation without losing safety.

More speedups can come from (i) choosing a faster hash function and (ii) parallelization.

[1] The term “block chaining” is from cipher block chaining, which I suspect is the origin of the word “blockchain”.

Interesting! It does seem like standard hash functions do not fare very well in the “64 -> 32 bytes” use case, which binary trees depend on so heavily. It certainly could be worthwhile to take the lead and propose a candidate.

Conjecture: we can use Keccak with one round in our 256-sparse-tree mode of operation without losing safety.

I can see how this preserves preimage resistance, but it would break collision resistance, no? Need to define “safety” carefully here, as Merkle trees do require collision resistance!
