Using GKR inside a SNARK to reduce the cost of hash verification down to 3 constraints

Edit to the previous comment:

This attack does not work against the vector Pedersen commitment method (the second approach). If we could produce a collisions in (essentially) (x,y)\mapsto \sum_i^N (x_i [s^i G] + y_i [s^{i+N} G]) we could extract s as one of the roots of the polynomial \sum_i^N x_i T^i + y_i T^{i+N}.

That’s a neat idea, but we’d also need to check that the output is a distinct point (if the addends are nonzero) right? Might still be pretty efficient depending on how edge cases are handled.

I made poc for proof aggregation using GKR
refer to Recursion: Internal vrs External proofs - HackMD

Background

General

In this protocol, GKR used function f_{r_i}^{(i)}(b, c) = \widetilde{add}_i(r_i, b, c)(\widetilde{W}_{i+1}(b) + \widetilde{W}_{i+1}(c)) + \widetilde{mult}_i(r_i, b, c)(\widetilde{W}_{i+1}(b) \cdot \widetilde{W}_{i+1}(c)) .

\widetilde{W}(b, c) is function that returns node’s calculation result in the arithmetic circuit.

Re-construct arithmetic circuit

r1cs file has constraints that has A * B = C form inside. For reconstruction of the arithmetic circuit, make two nodes that represent A * B and C * (-1). And make addition node for these two nodes. That node’s value will be 0. Naively, put a new node from one constraint repeatedly.

Structure

For each step in recursion, aggregator makes a new circuit from circuit that can verify previous proof and original circuit. And then generate proof from that circom file (exactly, from .r1cs and .wtns file)

Initial round

Parse r1cs file and convert it to GKRCircuit. (Let’s call this C_0)
Get input from input.json.
Calculate all nodes value \widetilde{W}(b, c) by circuit and input.
Make proof \pi_0 from Input and GKRCircuit.

Iterative round (0 < i < n)

There are two circuit C_i and C_{v_{i - 1}}. C_{v_{i - 1}} is circuit that can verify C_{i - 1}.
C_{v_i} can be different form for each circuit C_i. If use same circuit for all C_i, then C_v will be same circuit.
To make aggregated proof for previous proof and current round’s proof, we need

  • input (for C_i)
  • proof \pi_{i - 1}

Make integrated circuit C'_i.

Use those inputs, make proof \pi_i.
To be specific, input and proof \pi_{i - 1}

Last round

Also there are two circuit C_n and C_{v_{n - 1}}. To send aggregated proof to on-chain verifier, we can use groth16 prover in snarkjs.
Integrated circuit C'_{n} can be proved with snarkjs also.
So final proof \pi_n is groth16 or plonk proof

why I made this
I implemented gkr verifier in circom. but it can verify only one proof and cannot make recursion.
gkr verifier in circom (verifying circuit) can verify gkr proof but it can generate only groth16 and plonk proof by snarkjs.(It is so-called proof composition) Internal gkr proof cannot be made from that circom circuit.
So there were two options.
(1) Implement new circuit representation and re-implement gkr verifier in that representation
But it was too hard to implement verifying circuit from arithmetic circuit.
(2) Implement conversion from circom circuit to gkr circuit
If this is done, gkr verifier in circom can be converted to gkr circuit. And also we can reuse implemented circuit in circom. So I chose (2).

Benchmark

I verified aggregated external groth16 proof that is generated from 3 input sets.
Intermediate proofs(gkr proof) are verified in the aggregated (newly generated) circuit.

First intermediate proof was 17KB. Second intermediate proof was 1.6MB.
It takes 55s to prove two input sets and generate proof (release build) + groth16 proving time is 2s. (Macbook pro M1 8 cores)
(takes 16s in Ryzen 5900 12 cores)

But tried to aggregate one more, it took a very long time (couldn’t get result) because intermediate proof size was too big.

Much more optimizations should be applied to.

Further works

I think it is enough for poc, and other optimizations will be large works (applying Doubly Efficient Interactive Proofs for General Arithmetic Circuits with Linear Prover Time, circom circuit optimization, optimization during conversion from r1cs to gkr circuit, applying FFT). + It has security issues like initial randomness and others.
If you have any ideas for optimization or any others, please let me know. It would be great help.

2 Likes

It is great to see interest in those kind of techniques.

We had a face quite a bit of challenges when implementing our version of GKR. These include mostly implementation details to optimize the memory access, the parallelization, memory allocation and also some work on the GKR function. But they are mostly specific to the fact that we optimized for our MiMC use-case. Your use-case seems to be very different.

I have a few question about your POC. As I understand, the idea you are working on consists in building a Circom circuit that verifies a GKR proof alongside another sub-circuit to aggregate, and you prove that using GKR. By plugging several of these together in a chain as you did in the diagram, you obtain an aggregator.

Now, what is the topology of your GKR circuit exactly? From your description, it seems like you build the GKR circuit from an R1CS and the obtained GKR circuit takes all the assignment of C_i in the input layer. Thereafter, you employ GKR to evaluate the constraints of C_i (possibly using numerous layers because of the additions).

If so, my take is that the size of C_{v_i} will always be at least larger than the size of the previous circuit C_i. That is because the GKR has O(n) verifier time in the size of the statement (with constant larger than \geq 1). This is the reason the size of the intermediate proofs grow over time.

To make this an effective aggregation system, I recommend you to introduce a mechanism to ensure that |C_{v_i}| < |C_i|, or at least asymptotically. Otherwise, it cannot be beneficial. This mechanism could be the usage of a polynomial commitment with multilinear evaluation capability to “shrink” the initial and final evaluations for instance (such as https://eprint.iacr.org/2011/587.pdf, from the top of my head). There may be other possibilities or strategies as well.

2 Likes

For conversion from r1cs to the arithmetic circuit, I put them together one by one. Each constraint will be one complete arithmetic circuit. But I enumerated it and padded it with the addition of 0 (For making one big layered arithmetic circuit). A complete circuit built from each constraint has no connectivity to the rest of the circuit. It’s possible that they’ve created costs that aren’t necessary.

I totally agree with you. Thank you for your suggestion.

2 Likes