A nearly-trivial-on-zero-inputs 32-bytes-long collision-resistant hash function

The Libra whitepaper released yesterday also suggests a strategy for avoiding the need to compute 256 hashes per Merkle proof in a sparse Merkle tree: subtrees consisting of exactly one leaf are replaced with a single node.

[diagram: a single-leaf subtree collapsed into a single node]

This arguably does increase the complexity of the sparse Merkle tree interface; their implementation is 460 lines of code.

This approach is also discussed in Compact Sparse Merkle Tree.

The proposed approach has two suboptimalities:

  1. It still computes the hash every 16 steps.
  2. The hash technically only uses 30 bytes, not 32.

How about the following construction instead:

For each node we compute a tuple (H, A); at the root (assuming the root always has both children) A is discarded. Below, << means cyclic shift. (A sketch in Python follows the list.)

  1. If l \ne 0 \land r \ne 0 then (H, A) = (sha256(H(l) << A(l), H(r) << A(r)), 1)
  2. If l = r = 0 then (H, A) = (0, 1)
  3. If l = 0 then (H, A) = (H(r), 2A(r))
  4. If r = 0 then (H, A) = (H(l), 2A(l) + 1)
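
A minimal sketch of these four rules, assuming hashes are handled as 256-bit integers and that the cyclic shift by A is taken mod 256 (a cyclic shift of a 256-bit word only matters modulo the word size); the names merge and rotl256 are illustrative, not part of the proposal:

```python
import hashlib

MASK256 = (1 << 256) - 1

def rotl256(x: int, n: int) -> int:
    """Cyclically left-shift a 256-bit integer by n bits (n taken mod 256)."""
    n %= 256
    return ((x << n) | (x >> (256 - n))) & MASK256

def merge(l, r):
    """(H, A) of a node from its children's (H, A) tuples.
    Leaves enter as (leaf_hash, 1); an empty subtree has H == 0."""
    (hl, al), (hr, ar) = l, r
    if hl == 0 and hr == 0:                        # rule 2: both children empty
        return (0, 1)
    if hl == 0:                                    # rule 3: chain continues on the right
        return (hr, 2 * ar)
    if hr == 0:                                    # rule 4: chain continues on the left
        return (hl, 2 * al + 1)
    data = (rotl256(hl, al).to_bytes(32, "big")    # rule 1: the only "real" hash
            + rotl256(hr, ar).to_bytes(32, "big"))
    return (int.from_bytes(hashlib.sha256(data).digest(), "big"), 1)
```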

It only produces shareable hashes (A = 1, and thus A can be discarded) for nodes that have both children, which is generally the case for the root, I assume; and we usually don't share hashes that are not the root of the tree?

A is effectively a bitmask of the left and right turns taken going up a chain of single children. Such a mask naturally cannot exceed 256 bits (we take at most 255 steps as we go up to the root, plus the leading 1), so shifting by A is always possible.

Each hash that we feed into sha256 in rule (1) is the last computed sha256 output, cyclically shifted by such a bitmask. For a collision to occur here, two sha256 outputs would need to be cyclic shifts of each other, which appears unlikely.

That has the issue that your hash is effectively 64 bytes, so it doesn't fit neatly into the format of a 32-byte Merkle tree. That's roughly the direction I was going for before I figured this newer approach out :laughing:

I don't think reducing hash security to 2**120 (or even 2**112 if we want to cut the hash overhead further from 16 to 8) is a particularly big deal. The main thing I dislike about this scheme is just the fact that it requires using a "different hash function", so it doesn't compose cleanly with e.g. other SSZ structures.

Let's look at matrix operations as the basis of a 'hash function', then.

You want to verify that H(x,0) != H(0,x), or… do you want that to be true? Suppose the hash function retains the associative and commutative properties of some underlying field.

Using matrix operations we can construct a function which returns the same result regardless of which side the input is on: let Ax be a matrix-vector product, where A is some uniformly randomly chosen matrix over a field.

Then you have x and y, which represent x=(a,0) and y=(0,a), and you want Ax = Ay… This means that the two inputs may be independent, but the sum of the two inputs equals the output.

Let's simplify this first and provide a clearer example:

a(0) + b(x) = a(x) + b(0)

Which implies a and b are identical, where the sum evaluated at the two points is also equal for either. Where a(0) = b(0) = 0, right, and a(x) = b(x) = x?

So… let's take this back to matrix operations, where we want to directly translate the above statements into a problem which is 'hard to solve', and hopefully cryptographically secure…

Let's say x and y are our two inputs, where x is left and y is right. We need a matrix where the vector-product relation exists such that 0 || x = y || 0 = x || 0 = 0 || y.

So, A\cdot({\vec{x}+\vec{0}}) = A\cdot({\vec{0}+\vec{x}})

This preserves the commutative relationship, albeit over a matrix and a vector, which is kinda what we need… Whether the matrix A is a Cauchy matrix, an MDS matrix, or pretty much any other kind of matrix, we retain the commutative and associative laws when operating over a finite field.

Take, for example, the following Sage code:

```python
from random import randint

q = 1244142437461793964053
n = m = 123  # for example
G = GF(q)
A = MatrixSpace(G, n, m).random_element()  # uniformly random matrix over G
RandomBinaryVector = lambda length: vector(G, [randint(0, 1) for _ in range(length)])
x, y = RandomBinaryVector(n), RandomBinaryVector(n)
zero = vector(G, [0] * n)

# Adding the zero vector on either side leaves the product unchanged:
assert A*(zero + x) == A*x
assert A*(y + zero) == A*y
```

However, the problem is… how do we make this not only cryptographically secure, but efficient…

At the moment it is neither…

With a hash function you would normally want to preserve the order of the inputs, so H(a,b) != H(b,a). But with a function which is cryptographically secure and yet preserves the commutative and/or associative relationships, you're seeking a primitive operation which isn't widely used. Not only is it not widely used: my personal analogy is that you might as well be juggling chainsaws when you're dealing with polynomials and any hope of 'security', especially when all commutative and associative relationships are preserved.

You can truncate a matrix result down to a single field element, which gives you an n\times m \to n \to 1 reduction, where the last step takes the first element of the result vector, and the previous step is a sum of some kind…
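
Continuing the Sage snippet above, a toy sketch of that reduction (toy_compress is a hypothetical name, and taking the first coordinate is one arbitrary choice of truncation, not a vetted construction):

```python
def toy_compress(left, right):
    # n x m matrix times a length-n vector -> length-n vector -> 1 field element
    return (A * (left + right))[0]

h = toy_compress(x, y)
# Order-independence falls straight out of commutativity of vector addition,
# which is exactly what we do NOT want from a Merkle hash:
assert toy_compress(x, y) == toy_compress(y, x)
```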

But… surely the whole 'security argument' about Poseidon and newer polynomial/matrix functions is that they disrupt the associative and commutative relationships, in the name of security.

I definitely want H(a, b) != H(b, a), with it being computationally infeasible to find exceptions.

Another simple method to optimize SMT without security loss.

The hash merge function:

  1. if L == 0, return R
  2. if R == 0, return L
  3. otherwise sha256(L, R)

This function has one problem: merge(L, 0) == merge(0, L), which makes it easy to construct the same Merkle root from different sets of leaves.

To fix this, instead of using hash(value), we compute leaf_hash = hash(key, value) as the leaf's value to merge with its sibling (see the sketch below).

(We can store leaf_hash -> value in a map to index back to the original value.)
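
A minimal sketch of this scheme, assuming 32-byte sha256 digests and an all-zero marker for empty subtrees; the function names follow the description above, but see the linked repository for the real implementation:

```python
import hashlib

ZERO = b"\x00" * 32  # marker for an empty subtree

def merge(left: bytes, right: bytes) -> bytes:
    if left == ZERO:          # step 1: skip the empty side
        return right
    if right == ZERO:         # step 2: skip the empty side
        return left
    return hashlib.sha256(left + right).digest()  # step 3: a real hash

def leaf_hash(key: bytes, value: bytes) -> bytes:
    # Hashing the key in makes every non-empty leaf unique, so
    # merge(L, 0) == merge(0, L) can no longer collide distinct trees.
    return hashlib.sha256(key + value).digest()
```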

Proof:

Since the key is hashed into the leaf_hash, the leaves' hashes are unique no matter what the values are. Since all leaves have unique hashes, a node at each height is either merged from two different hashes or from a hash and a zero (for a non-zero parent, since the child hash is unique, the parent is surely unique at its height). Continuing up to the root: if the tree is empty we get zero; if the tree is not empty, the root is merged from two hashes or from a hash and a zero, so it is still unique.

more details: https://github.com/jjyr/sparse-merkle-tree

This looks like it would greatly increase the size of the pre-computed table of hashes of empty subtrees. Not great for clients, a deal-breaker for on-chain proof verification.

> This looks like it would greatly increase the size of the pre-computed table of hashes of empty subtrees. Not great for clients, a deal-breaker for on-chain proof verification.

Which proposal are you referring to by "this"? AFAIK my own proposal above does not require any pre-computed tables.

Oh, weird, I may not have replied to the right comment. I meant the proposal immediately above my reply.

The probability you go into step 3 is 1/2 + 1/(2**16). Why do you claim that this method only requires about 1 + d/16 "real" hashes?

It is 1/2 + 1/(2**16) because there's a 1/2 probability for r or l to be larger than 2**255, and another 1/(2**16) probability for r or l to be less than 2**240.

> there's a 1/2 probability for r or l to be larger than 2**255

Why would this be the case? Node values are not randomly distributed in the case where you have a single-leaf branch going up, as the node values going up the branch follow the pattern of "start at some value under 2^{241}, add 1 bit per round for 15 rounds, then hash and mod back under 2^{241}".
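
A toy illustration of why that pattern is far from uniform (this is only my reading of the quoted pattern, with hypothetical names, not the exact scheme): each step up the single-leaf branch appends one turn bit, so after 15 rounds the high bits of the value are still determined by the starting value rather than by chance.

```python
import hashlib

CAP = 1 << 241  # per the quoted pattern, hashed values are reduced below 2**241

def next_value(v: int, turn: int) -> int:
    # One step up the single-leaf branch: the value grows by one bit.
    return (v << 1) | turn

v = 12345  # some starting value under 2**241
for turn in [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]:  # 15 rounds
    v = next_value(v, turn)
# v < 2**256, but its top bits come from the starting value, so
# Pr[v > 2**255] is not 1/2. Only now do we hash and reduce:
v = int.from_bytes(hashlib.sha256(v.to_bytes(32, "big")).digest(), "big") % CAP
```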

Uh! I understand your idea! It's interesting! Thank you!