Binary trie format

Probably this is a dumb question, but since binarification requires all this recalculation, how about considering a switch from keccak256 to sha256 for the state trie?

The reasoning of Eth2 for using sha256 can be read here, including a claim about security.

I can think of two potential, but questionable, benefits:

  1. Apparently one of the main benefits of sha256 is interoperability with other chains – could this potentially enable other chains to more easily verify proofs of Eth1 blocks? I don’t think it should make a huge difference anyhow.

  2. Making Eth1 and Eth2 closer in terms of cryptographic primitives. Again, this is very slim, given there are more substantial differences between the two, starting with bls12-381 vs secp256k1.

Would there be any (other) benefit? Probably no significant ones, but maybe it is worth briefly entertaining the idea and discussing it.


Would there be any (other) benefit? Probably no significant ones, but maybe it is worth briefly entertaining the idea and discussing it.

I can see three other potential benefits:

  1. sha256 is available in many more languages and on many more platforms, and has better coverage from security audits
  2. sha256 is faster than Keccak256
  3. keccak256 has a security flaw that sha256 doesn’t

Point 1) is a given, so I don’t think there is a big consensus risk in switching. Still, unless either 2) or 3) is true, I would say that it’s better not to fix something that isn’t broken.

I don’t expect 2) to make an obvious difference, and indeed the performance was exactly the same when I tried using sha256 in M4.

I was thinking: instead of adding an extra byte in order to access the account’s properties, and thus ending with 33-byte and 65-byte keys, how about dropping the last two bits of the hashed address (as in “secure trie” address) and using them as an item selector?

account_balance_addr = hash(account_addr)[0..253] ++ 0b00
account_nonce_addr = hash(account_addr)[0..253] ++ 0b01
account_code_chunk_addr = hash(account_addr)[0..253] ++ 0b10 ++ chunk_addr
storage_slot_addr = hash(account_addr)[0..253] ++ 0b11 ++ hash(storage_key)

2 bits are lost here, which doesn’t matter for this use case: hashing is intended to make sure that values are evenly spread out in order to keep the trie balanced, and the further right a bit is, the less it contributes to that balancing. Furthermore, all these values will still be stored together in the DB, so it won’t have any impact on the balance.
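The key layout above can be sketched in Python (helper names are hypothetical, and sha256 stands in for keccak256, which is not in the stdlib):

```python
import hashlib

def _hash_bits(data: bytes) -> str:
    # sha256 stands in for keccak256, which is not in the Python stdlib.
    digest = hashlib.sha256(data).digest()
    return format(int.from_bytes(digest, "big"), "0256b")

def account_key(account_addr: bytes, selector: str, suffix: str = "") -> str:
    # Drop the last 2 bits of the hashed address and append the 2-bit
    # item selector (plus an optional suffix for code chunks / storage).
    return _hash_bits(account_addr)[:254] + selector + suffix

addr = bytes(20)  # hypothetical account address
balance_key = account_key(addr, "00")
nonce_key = account_key(addr, "01")
slot_key = account_key(addr, "11", _hash_bits(b"some-storage-key"))

assert len(balance_key) == 256
assert balance_key[:254] == nonce_key[:254]  # shared 254-bit prefix
```

All four item kinds for one account share the same 254-bit prefix, so they sit next to each other in the trie and in the DB.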

A quick check as of block ~8M doesn’t show any collisions when the last 2 bits are chopped off. A collision is quite unlikely (roughly a one-in-2×10²⁸ chance).

It’s nice to be able to verify the state trie from within the EVM, which is currently cheaper to do with keccak256. Although maybe a shift to sha256 can come with the appropriate repricing measures to make it feasible to verify state roots on chain.

I suggest you use the same mapping as RSK uses, which has many benefits.

account_record = hash(account_addr)[0..79] ++ account_addr

This key->value mapping has the huge benefit that it stores the key inside the trie, yet it still randomizes the key position.
It’s also faster, because you can generate 80 bits of Keccak hash digest faster than 256 bits.
Finally, it requires only 30 bytes for keys, 2 bytes less than the 32 bytes proposed here.
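A minimal sketch of this RSK-style mapping (sha256 stands in here for the 80-bit Keccak prefix):

```python
import hashlib

def rsk_style_key(account_addr: bytes) -> bytes:
    # The 80-bit hash prefix randomizes the key's position in the trie,
    # while the raw 20-byte address is kept inside the key itself.
    prefix = hashlib.sha256(account_addr).digest()[:10]  # 80 bits
    return prefix + account_addr  # 10 + 20 = 30 bytes

key = rsk_style_key(bytes(20))
assert len(key) == 30
```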

First, two desired properties for the binary tree structure and merkleization rule. Then, claims that none of M1, M2, M3, or M4 satisfies both of these desired properties.

Property: Access list-witness correspondence. For a given state tree, each access list uniquely determines a minimal witness. In particular, minimal witnesses are independent of block execution.

Property: Efficiently computable. The cost to merkleize a tree is less than 50n hashes in practical cases (i.e. average tree depth < 50), where n is the number of leaves to be merkleized.

Claim. Binary radix trees like M1, M2, and M3 don’t satisfy the access list-witness correspondence property.
Proof sketch: Consider an access list from a binary radix state tree. When a leaf in the access list is deleted from state after block execution, if the deleted leaf’s parent node has an extension (“prefix”, “edge label of length >1”) as its remaining child, then the extension is extended to preserve the radix property. Information about whether this remaining child is an extension is needed in a witness. Any leaves can be removed, and the access list does not specify which leaves are removed, so a witness must include this extra extension info for all hash nodes which may be extensions and may be extended due to leaf deletions. This extra data makes witnesses non-minimal. So M1, M2, and M3 don’t satisfy the access list-witness correspondence property.

Claim. Sparse trees like M4 don’t satisfy the efficiently computable property.
Proof sketch: In a sparse tree, a leaf requires 256 hashes to merkleize. And n leaves require ~256n hashes to merkleize. And ~256n>50n. So sparse trees are not efficiently computable.
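The ~256 hashes per leaf can be seen in a minimal single-leaf sketch (sha256 stands in for the tree hash; the zero-leaf default is an assumption):

```python
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

DEPTH = 256

# Precompute the default (empty-subtree) hash for each level.
EMPTY = [b"\x00" * 32]
for _ in range(DEPTH):
    EMPTY.append(h(EMPTY[-1], EMPTY[-1]))

hash_count = 0

def sparse_root(key_bits: str, leaf: bytes) -> bytes:
    # Merkleize one leaf in a depth-256 sparse tree: one hash per level.
    global hash_count
    node = leaf
    for i in range(DEPTH):
        sibling = EMPTY[i]
        if key_bits[DEPTH - 1 - i] == "0":
            node = h(node, sibling)
        else:
            node = h(sibling, node)
        hash_count += 1
    return node

root = sparse_root("0" * 256, b"\x11" * 32)
assert hash_count == 256  # ~256 hashes per leaf, vs < 50 in a radix tree
```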

Are there any proposals for a tree structure and merkleization rule which satisfy both of these desired properties?

I definitely agree that M4 does not have the efficiency property. In order to be efficient and secure, there will have to be some kind of cryptographic encoding of the tree structure within the tree itself.

I don’t quite understand why you are saying that M3, say, does not have the property that an access list determines a minimal witness.

In your definition of M3, you say that prefix_length is usually 0x00. This is a bit confusing to me. Are you defining prefix_length of an internal node A to be

  • The length of the prefix shared by all descendants of A which is not shared by all descendants of A’s parent, or
  • The length of the largest sequence of bits which is a prefix of all leaf keys that descend from A?

It seems like your definition of M3 uses the former, but I think the latter definition solves this minimal witness issue.

As a diagram, consider the following

  A
 / \
…   B
   / \
  C   D

In both versions, when leaf D is deleted, its parent B is deleted and its grandparent A replaces B with C as its (right side) direct child. But in the former version, C’s prefix_length must be updated to what was previously the sum of A’s prefix_length and C’s prefix_length, which requires not just the hash of C but the data of C itself.

In the second version, C still has the same set of leaf descendants, and A has descendants on the left, and descendants of C on the right, so the “largest prefix of all descendants” is the same. Since C is unaffected, we can simply slot its hash in where the hash of B used to be in A.


The word prefix is an abuse of notation. My use of prefix in M3 is consistent with the posts above it which use prefix to describe an extension – so your first definition, I think.

Your definition of prefix_length gives nicer properties for node deletion. But your M3-like rule is bad for proofs-of-exclusion, where it breaks the access list-witness correspondence property. And your M3-like rule always breaks the “two 32-byte chunks -> one 32-byte chunk” property.

A proposal by Vitalik may satisfy both properties above, but we would have to define the word “minimal” carefully.

A proposal for satisfying both properties above.

Aumasson’s Too Much Crypto is already making an impact in practical cryptography by motivating BLAKE3 to reduce the number of hashing rounds from ten to seven. Aumasson also concludes that SHA3 could use 10 rounds instead of 24 without losing safety.

I propose that we consider using M4, but with Keccak reduced to 10 rounds. While this does not quite achieve the efficiently computable property, it may be fast enough for our purposes.

Furthermore, I propose that we evaluate reducing the number of rounds even further. Aumasson assumes a block chaining[1] mode of operation, but we only need one 136-byte block. Also, we use a tree mode of operation, with a sparse tree of depth 256 and likely ≥ 10 rounds before we get to the first branch node. So I suspect that we can reduce the number of rounds even further. Conjecture: we can use Keccak with one round in our 256-sparse-tree mode of operation without losing safety.

More speedups can come from (i) choosing a faster hash function and (ii) parallelization.

[1] The term “block chaining” is from cipher block chaining, which I suspect is the origin of the word “blockchain”.

Interesting! It does seem like standard hash functions do not fare very well in the “64 -> 32 bytes” use case, which binary trees depend on so heavily. Certainly could be worthwhile to lead and propose a candidate.

Conjecture: we can use Keccak with one round in our 256-sparse-tree mode of operation without losing safety.

I can see how this preserves preimage resistance, but it would break collision resistance, no? Need to define “safety” carefully here, as Merkle trees do require collision resistance!


It is not clear to me why this property is desirable enough to justify the outright discarding of M3/M5. Could you please clarify?

How is it bad? And how bad is it? :wink:

A proof of exclusion need only contain the hash of the sibling nodes, which have to be present in the access list anyway in M4.

I implemented it last week and it works just fine.

@holiman suggested two weeks ago to use BLAKE2, I agree it’s a good idea.

Parallelism, however, is going to greatly increase the complexity both in memory usage and in locking. I’m pretty sure this complexity is going to offset all the simplicity gains from M4, and then some.

EDIT: I’m really no fan of reducing the number of rounds: this means maintaining our own version of a crypto library. Even assuming Vitalik’s concerns about collision resistance are addressed, does saving M4 justify this?

BLAKE2s seems to be the only popular hash function which is “64->32 bytes”. I am curious whether parallelized BLAKE2s with seven rounds would be fast enough for M4.
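The “64 -> 32 bytes” fit can be checked directly with the stdlib’s BLAKE2s:

```python
import hashlib

# blake2s natively produces a 32-byte digest from arbitrary input, so the
# "two 32-byte children -> one 32-byte parent" step needs no truncation.
left = b"\x01" * 32
right = b"\x02" * 32
parent = hashlib.blake2s(left + right).digest()
assert len(parent) == 32
```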

Agreed. Under statelessness, collision resistance breaks because a witness can provide arbitrary witness hashes, so an attacker can craft a witness hash which produces a collision in one round, and this bad hash could result in a bad post-state root, breaking the system. So the conjecture is false.

I guess you are saying that proofs-of-exclusion under the M3-like rule do not satisfy the access-list-witness property because, to prove exclusion, one needs to descend the tree to a leaf and see the key there to know the prefixes.

Another way to fix this: you could just use M2, but switch to the “longest prefix of all descendant leaf keys” definition of prefix.

One would have to make clear exactly how the prefix was represented. For example, you could do

internal_node_hash = hash( hash(prefix_followed_by_0s || length_of_prefix) || hash( left_child_hash || right_child_hash ))
leaf_hash = hash( prefix_bits || hash( 0 || leaf_value ) )
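As a sketch, the rule above could be implemented like this (sha256 stands in for the tree hash, and the encoding of the prefix chunk — bits left-aligned in 31 zero-padded bytes plus a 1-byte length — is an assumption):

```python
import hashlib

def h(data: bytes) -> bytes:
    # sha256 stands in for the tree hash function.
    return hashlib.sha256(data).digest()

def _prefix_chunk(prefix_bits: str) -> bytes:
    # Assumed encoding: prefix bits left-aligned in 31 zero-padded bytes,
    # followed by a 1-byte length (supports prefixes up to 248 bits).
    value = int(prefix_bits, 2) if prefix_bits else 0
    value <<= 248 - len(prefix_bits)
    return value.to_bytes(31, "big") + bytes([len(prefix_bits)])

def internal_node_hash(prefix_bits: str, left: bytes, right: bytes) -> bytes:
    # hash( hash(prefix_followed_by_0s || length) || hash(left || right) )
    return h(h(_prefix_chunk(prefix_bits)) + h(left + right))

def leaf_hash(prefix_bits: str, leaf_value: bytes) -> bytes:
    # hash( prefix_bits || hash(0 || leaf_value) )
    return h(_prefix_chunk(prefix_bits) + h(b"\x00" + leaf_value))
```

Note that both rules keep hash inputs to two 32-byte chunks, which matters for the chunk-oriented property discussed earlier in the thread.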

Here, to prove exclusion of a key, one only has to descend the tree to the lowest node that would contain that key in its descendants if it were present, and verify that both children of that node do not contain the key.

I’m not sure that I entirely understand what you mean by the access-list-witness property for proofs of exclusion though.