Hash-based VDFs, MIMC and STARKs


#1

Edit: turns out this idea is already out there, on page 20 of https://eprint.iacr.org/2018/601.pdf. Congrats to Dan Boneh, Benedikt Bünz, Joseph Bonneau and Ben Fisch.

MIMC is a hash function “core” that Zcash is evaluating as a possible mid-term replacement, pending much more security analysis: https://github.com/zcash/zcash/issues/2233

The goal of MIMC is to be maximally SNARK- and STARK-friendly, and it achieves this by building its core primitive entirely out of simple operations within some finite field. Specifically, MIMC’s permutation function looks as follows:

                         k_1                     k_2
                          |                       |
                          v                       v
input -----(x->x³)------(xor)------(x->x³)------(xor)---- ..... ----> output

The core is made up of a few hundred rounds of this, and each round only adds a very small number of constraints or arithmetic steps into the SNARK/STARK process because it literally is just a single xor/addition and cubing. In fact, one should be able to literally take the techniques in my STARK tutorial, as written (see the Fibonacci example), make some minor modifications, and make a STARK out of it fairly easily.
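In Python, the forward direction of the round structure described above can be sketched in a few lines (the modulus here is the one used in the mimc_stark tutorial code; the round constants are arbitrary placeholders, not a vetted set):

```python
# Illustrative MiMC forward rounds over a prime field.
# p is the modulus from the mimc_stark tutorial code; p % 3 == 2,
# which makes x -> x**3 a permutation of the field.
# The round constants below are placeholders, not a vetted set.
p = 2**256 - 2**32 * 351 + 1

def mimc_forward(inp, steps, round_constants):
    x = inp % p
    for i in range(steps - 1):
        # one round: cube, then add a round constant
        # (field addition plays the role of xor in a prime field)
        x = (x**3 + round_constants[i % len(round_constants)]) % p
    return x
```

Each round contributes only a cubing and an addition, which is why the per-round constraint count in a SNARK/STARK stays so small.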

Furthermore, one might notice that this primitive can be calculated in the reverse direction, due to Fermat’s Little Theorem or its prime-power-field analogue, but this would take up to a few hundred times longer. However, making a STARK out of the reversed output is still just as easy (you just calculate the STARK of the reverse computation trace). It seems plausible that with some engineering (and perhaps GPU parallelism), the time taken to compute the STARK could be lower than the time taken to run the computation in reverse order, and a possible holy grail would be doing the two in parallel.

What this gives us is a verifiable delay function (ie. a function f(x)=y where y takes some time to calculate that cannot be parallelized away, but (x, y) can be easily verified to be an input/output pair of f) that happens to exactly align with a primitive that is being evaluated for use in hashes, and which hence fulfills the goal of a VDF that only assumes properties of a single cryptographic primitive.
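As a toy illustration of that asymmetry (function names and parameters here are illustrative, not a concrete VDF proposal): evaluating the delay function walks the chain of per-round cube roots, each a full modular exponentiation, while checking the result only needs one cheap cubing per round:

```python
# Toy sketch of the MiMC-based delay function's asymmetry.
# p is the prime from the mimc_stark tutorial code; since p % 3 == 2,
# cubing is a permutation and its inverse is exponentiation by
# (2p-1)/3, via Fermat's little theorem. Round constants are placeholders.
p = 2**256 - 2**32 * 351 + 1
CUBE_ROOT_EXP = (2 * p - 1) // 3  # inverse of x -> x**3 mod p

def vdf_eval(inp, steps, ks):
    # "slow" direction: each round is a ~256-bit modular exponentiation
    # (hundreds of multiplications), and the rounds are inherently sequential
    x = inp % p
    for i in reversed(range(steps)):
        x = pow(x - ks[i % len(ks)], CUBE_ROOT_EXP, p)
    return x

def vdf_verify(inp, out, steps, ks):
    # "fast" direction: each round is just one cubing and one addition
    x = out
    for i in range(steps):
        x = (x**3 + ks[i % len(ks)]) % p
    return x == inp % p
```

The gap between one cubing and one full exponentiation per round is where the few-hundred-fold slowdown of the delay direction comes from.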


Special thanks to researchers from Stanford, and Zooko Wilcox from Zcash, for pointers and conversations that led to this post.


#2

Thank you for your super accessible STARKs, Part 3: Into the Weeds tutorial and 100% working python implementation.

I did a port of your mimc_stark to Go and posted it here
https://github.com/wolkdb/deepblockchains/tree/master/stark
for more of us to really get in the weeds and incorporate STARKs into deployable blockchain designs.


#3

Great job!

Also good to see that the STARK overhead is ~300x in another language implementation :slight_smile:

I noticed that the verifier is not significantly faster in go vs python. I guess that’s because STARK verification is mostly just hash calculation?


#4

Thank you – there are tons of parallelizations of the verify branch that are easy to add in Go; will explore that this month and report back –


#5

You can parallelize the prover very heavily as well; FFTs, Merkle tree generation and FRI are all highly parallelizable.
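For instance, the bottom-up Merkle construction is embarrassingly parallel within each level, since every parent hash depends only on its own two children. A Python sketch of the structure (blake2b and the pool are stand-ins, not the tutorial’s exact code; in CPython the GIL limits thread speedups, so real wins come from processes, native code, or goroutines as discussed in this thread):

```python
# Sketch: within each level of a Merkle tree, all parent hashes are
# independent, so a level can be hashed in parallel. Illustrative only.
import hashlib
from concurrent.futures import ThreadPoolExecutor

def hash_pair(pair):
    return hashlib.blake2b(pair[0] + pair[1], digest_size=32).digest()

def merkle_root(leaves, pool):
    # leaves: list of 32-byte values, length a power of two
    level = leaves
    while len(level) > 1:
        pairs = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        level = list(pool.map(hash_pair, pairs))  # one level, hashed in parallel
    return level[0]
```

The same per-level fan-out/fan-in shape applies to the FFT butterflies and to the per-column work in FRI.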


#6

I streamlined the Go STARK implementation with goroutines, channels and big.Int optimizations, and on an ordinary laptop (2014 MacBook Pro, 2.2 GHz Intel Core i7 with 8 “logical” cores) got to this basic level of performance:

| NUM_CORES | STARK Proof Generation | STARK Verification |
|---|---|---|
| 1 | 3.12s | 46.11ms |
| 2 | 2.01s | 24.70ms |
| 4 | 1.49s | 15.64ms |
| 8 | 1.42s | 16.58ms |
| 16 | 1.49s | 18.30ms |
| 32 | 1.48s | 20.18ms |
| ethereum/research python | 3.78s | 52.10ms |

#7

Wow! So Python actually is not all that suboptimal for this stuff.

Can you also give the numbers for MiMC calculation, and reverse MiMC calculation (replace `x := x**3 + k` with `x := (x-k)**((2*p-1)//3) mod p`)?

Also, any idea why adding more cores doesn’t improve things beyond 1.4s? Is there some part of the computation you’re not parallelizing? Merkle tree calculation, FFTs, and FRI computation should all be quite parallelizable.


#8

Yes - will report back on this shortly.

The program flow has to actually use the cores available to it to get improvements, with goroutines/channels/waitgroups/… structured in a pipeline that matches the cores nicely – if you have a machine with 64 cores but only 3 goroutines running computations (1), (2) and (3) in parallel, you aren’t going to do much better than with just 3 cores. On the other hand, if you have a machine with 8 cores, having 128 goroutines running through 128 pieces of a for loop will perform worse than just 8 goroutines. To minimize STARK proof generation time, it’s essential to take the computational dependency graph, break it into stages, and focus on the key bottlenecks along the longest route. This is what the pipelined version of your STARK looks like in my Go re-implementation, with sample times attached:

The longest route is currently (2a+b+c) => (4a+b) => (7) => (8) => (9) => (11) => (12) (shown in red). Inside the longest route are the time-consuming FFTs + inverse FFTs (in steps (2b) and (2c)) and multi_interp_4 (in step (11)) – I did make a first pass at parallelizing these internal operations, but I’m certain we can do a lot more.

Here is the verification part, much easier to reason about:

Most of the time, breaking up a for loop into N goroutines [as in (v4)] is totally worth it (for some N, though less so for 2N), but sometimes it’s not (because the overhead of the goroutines is just too high) – so it makes for a lot of empirical engineering. With gradient descent in the parameter space, we’re confident there is another 25-40% speedup to be squeezed out of the proof timing. We’ll try out beefy machines with 24-32 cores for comparison this month and report back.


#9

Why can’t the FFT be massively parallelized? There’s no reason for parallelization to be between functions and not within one function.


#10

No debate that FFTs should be parallelized. Here are the basic stats on Forward + Reverse MiMC in C, Go, Python, JS (code here):

| Implementation | Forward MiMC | Reverse MiMC |
|---|---|---|
| C - mimc.c | 1.85ms | 122.7ms |
| Go - mimc_test.go | 6.11ms | 278.8ms |
| Python - mimc.py | 13.40ms | 1,291.4ms |
| Node.js - mimc.js | 108.29ms | 13,910.1ms |

[Stats taken from a 2015 iMac, 3.2 GHz Intel Core i5]


#11

Is the STARK proof generation time given above (1.5-3s) for the same parameters as this? If so, that seems very weird. Why is Go <2x faster than Python for STARK generation but 4x faster for MiMC computation?


#12

The Forward MiMC timings [1.85ms in C, 6.11ms in Go, 13.40ms in Python] are just the computations of mimc(3, 2**LOGSTEPS, constants) you have in your proof generation [just step (2a) in the diagram] and your verification [step (v1) in the diagram], with the same parameters (same 8192 steps, same 64 constants, etc.) … just in different languages, with their various bigint libraries. More cores don’t speed up Forward MiMC or Reverse MiMC, of course, since each is inherently sequential.

If you replace step (2a)'s Forward MiMC in Go here with Reverse MiMC then the STARK proof generation will take ~272.69ms (278.8ms-6.11ms) longer [1.42s will become 1.69s], because (2a) is on the longest path.

If you replace step (2a)'s Forward MiMC in Python here with Reverse MiMC then the STARK proof generation will take ~1278ms (1,291.4ms-13.40ms) longer [that is, 3.78s becomes 5.05s].

Does that make sense?

For MiMC computation, the idiosyncrasies of each language’s bigInt implementation (memory setup, etc.) obviously dominate, because that’s pretty much the only thing going on. For STARK generation, the computations on the longest path use bigInts everywhere in the pipeline, but because you can parallelize almost all the steps in the pipeline, there is no heuristic like “it’s X times faster when you use language L” – where the level of improvement ends up depends on the structure of the pipeline and the amount of resources allocated to it.

Succinct prototyping in Python (because it handles bigInts and vectors so well) and putting things into production in a C library with Go/Rust/… wrappers is the sane thing to do for VDFs + STARKs.


#13

You probably don’t want to use Reverse MiMC; it’s very slow. Instead use MiMC in Feistel mode, so that decryption is as fast as encryption.
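A toy sketch of that Feistel construction (the modulus and key schedule are illustrative placeholders, not a vetted MiMC-Feistel instantiation) shows why the two directions cost the same: both use only cubings, never cube roots:

```python
# Toy MiMC in Feistel mode: the state is a pair (l, r), and the round
# function is cubing, used identically in both directions -- so the
# inverse is as cheap as the forward direction, unlike reverse MiMC.
p = 2**256 - 2**32 * 351 + 1

def feistel_encrypt(l, r, ks):
    for k in ks:
        # one Feistel round: (l, r) -> (r, l + (r + k)^3)
        l, r = r, (l + pow(r + k, 3, p)) % p
    return l, r

def feistel_decrypt(l, r, ks):
    for k in reversed(ks):
        # undo one round: still just a cubing, no cube root needed
        l, r = (r - pow(l + k, 3, p)) % p, l
    return l, r
```

This symmetry is exactly why Feistel mode is the right choice for a hash/cipher, and exactly why it is the wrong choice for the VDF use case discussed above, where the asymmetry is the point.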


#14

The goal here is to be slow in the forward direction as that’s the direction the VDF is computed, and fast in the backward direction for proving/verification. So symmetric execution time is not desired.