Using GKR inside a SNARK to reduce the cost of hash verification down to 3 constraints

Using GKR inside a SNARK to reduce the cost of hash verification down to 3 constraints (1/2)

Alexandre Belling, Olivier Bégassat

Link to HackMD document with proper math equation rendering

Large arithmetic circuits C (e.g. for rollup root hash updates) have their costs mostly driven by the following primitives:

  • Hashing (e.g. Merkle proofs, Fiat-Shamir needs)
  • Binary operations (e.g. RSA, rangeproofs)
  • Elliptic curve group operations (e.g. signature verification)
  • Pairings (e.g. BLS, recursive SNARKs)
  • RSA group operations (e.g. RSA accumulators)

Recent efforts achieved important speed-ups for some of those primitives. Bünz et al. discovered an inner-product pairing argument which provide a way to check a very large number of pairing equation in logarithmic time. It can be used as an intermediate argument to run BLS signature and pairing-based arguments (PLONK, Groth16, Marlin) verifiers for very cheap although it requires a curve enabling recursion and an updatable trusted setup. Ozdemir et al, and Plookup made breakthrough progresses to make RSA operation practical inside an arithmetic circuit, and more generally arbitrary-size big-integer arithmetic.

In order to make cryptographic accumulators practical, recents works suggests the use of RSA accumulators.

We propose an approach to speed-up proving for hashes based on the line of work of GKR/Giraffe/Hyrax/Libra. It yields an asymptotic speed-up of \times700/3 (more than \times200) for proving hashes: 3 constraints for hashing 2 elements with gMIMC compared to ~700 without.

We start by describing the classic sumcheck and GKR protocols. Readers familiar with these may skip the “background section” altogether. We then introduce some modifications to tailor it for hash functions. Our current main use-case is making merkle-proof verifications cheap in rollups. Similar constructions could be achieved for other use-cases.


This section gives a high-level description of the GKR protocol: a multi-round interactive protocol whereby a prover can convince a verifier of a computation y=C(x). GKR builds on top of the sumcheck protocol so we describe that protocol as well.

GKR makes no cryptographic assumptions. Moreover, it has the interesting feature that it’s prover-time is several orders of magnitude more efficient than pairing-based SNARK. More precisely, we describe Gir++ (cf. Hyrax paper), a variant on GKR specialized for data-parallel computation, e.g. for circuits computing multiple parallel executions of a given subcircuit.

Arithmetic circuits

Consider first a small base circuit C_0 : a layered arithmetic circuit where

  • each gate is either an addition gate or a multiplication gate,
  • each gate has 2 inputs (i.e. fan-in 2 and unrestricted fan-out)
  • the outputs of layer i become the inputs of layer i-1.

The geometry of such a circuit is described by its width G, its depth d and the wiring (which layer i-1 outputs get funneled into which layer i gates). Thus the base circuit takes inputs x\in\Bbb{F}^G (at layer 0) and produces outputs y\in\Bbb{F}^G (at layer d): y=C_0(x). The base circuit C_0 represents a computation for which one wishes to provide batch proofs.

In what follows we will concern ourselves with arithmetic circuits C comprised of the side-by-side juxtaposition of N identical copies of the base circuit C_0. Thus C represents N parallel executions of the base circuit C_0.The circuit thus takes a vector x \in \mathbb{F}^{N \times G} as input and produces an output y \in \mathbb{F}^{N \times G}.

We’ll make some simplifying (but not strictly necessary) assumptions:

  • the width of the base circuit is the same at every layer,
  • in particular, inputs (x) and outputs (y) have the same size,
  • both G = 2^{b_G} and N = 2^{b_N} are powers of 2.


The Gir++ protocol provides and argument for the relation: R_L = \Big\{(x, y) \in (\mathbb{F}^{N \times G})^2~\Big|~ y = C(x)\Big\}

Describing the protocol requires one to define a certain number of functions. These functions capture either the geometry of the circuit or the values flowing through it. To make matters precise, we consistently use the following notations for input variables

  • q'\in\{0, 1\}^{b_N} is the index of one of the N identical copies of the base circuit C_0 within C,
  • q\in \{0, 1\}^{b_G} is a “horizontal index” within the base circuit,
  • i\in [d]=\{0,1,\dots,d\} is a layer (“vertical index”) within the base circuit.

This makes most sense if one thinks of the subcircuit as a “matrix of arithmetic gates”. Thus a pair (q,i) describes the position of a particular arithmetic gate in the base circuit, and a triple (q',q,i) describes the arithmetic gate at position (q,i) in the q'-th copy of the base circuit C_0 within C.

  • h'\in\{0, 1\}^{b_N} represents the index of a copy of the base circuit within C
  • h_L,h_R\in\{0, 1\}^{b_G} represent two “horizontal indices” in the base circuit.

With that notation out of the way, we describe various functions. The first lot describes the circuit geometry. Recall that all gates are fan-in 2:

  • add_i(q, h_L, h_R) = 1 iff the gate q at layer i takes both h_L and h_R as inputs from layer i - 1 and is an addition gate. Otherwise add_i(q, h_L, h_R) = 0.
  • mul_i(q, h_L, h_R) = 1 if the gate q at layer i takes both h_L and h_R as inputs from layer i - 1 and is an multiplication gate. Otherwise mult_i(q, h_L, h_R) = 0.
  • eq(a, b) returns a == b

Next we introduce functions that captures the values that flow through the circuit:

  • V_i(q', q) is the output value of the gate at position (q,i) in the q'-th copy of the base circuit C_0 within C.

The \Bbb{F}-valued function below captures transition from one layer to the next:

  • P_{q', q, i}(h', h_L, h_R) = eq(h',q') \cdot \left[ \begin{array}{r} add_i(q, h_L, h_R)\cdot(V_{i-1}(q', h_L) + V_{i-1}(q', h_R))\\+\;mul_i(q, h_L, h_R)\cdot V_{i-1}(q', h_L)\cdot V_{i-1}(q', h_R)\end{array}\right].

It holds that V_i(q',q) = \sum_{h' \in \{0, 1\}^{b_N}}\sum_{h_L< h_R \in \{0, 1\}^{b_G}}P_{q', q, i}(h', h_L, h_R) This is not complicated to see: the RHS of this equation contains at most one nonzero term.

Key observation. V_i is expressed as an exponentially large sum of values of the function of V_{i-1}. The sumcheck protocol described in the next section provides polynomial time verifiable proofs for assertions of the form

\displaystyle\qquad\qquad\texttt{(some claimed value)} =\sum_{x\in\left\{\begin{array}{c}\text{exponentially}\\ \text{large domain}\end{array}\right\}}f(x)

for domains of a “specific shape” and “low degree” functions f:\Bbb{F}^n\to\Bbb{F}. This gives rise to a proof system for R_L.

The sumcheck protocol: setting up some notation

This section describes the sumcheck protocol. It is a central piece of how Gir++ works.

Multilinear polynomials are multivariate polynomials P(X_1,\dots,X_n)\in\Bbb{F}[X_1,\dots,X_n] that are of degree at most 1 in each variable. e.g.: P(U,V,W,X,Y,Z)=3 + XY + Z -12 UVXZ +7 YUZ (it has degree 1 relative to U,V,X,Y,Z and degree 0 relative to W). In general, a multilinear polynomial is a multivariate polynomial of the form P(X_1, X_2\dots, X_{d}) = \sum_{i_1 \in \{0, 1\}}\sum_{i_2 \in \{0, 1\}}\cdots\sum_{i_{d} \in \{0, 1\}} a_{i_1, i_2, \dots, i_{d}} \prod_{k=1}^{d} X_k^{i_k} with field coefficients a_{i_1, i_2, \dots, i_{d}}\in\Bbb{F}.

Interpolation. Let f: \{0,1\}^d \rightarrow \mathbb{F} be any function. There is a unique multilinear polynomial P_f=P_f(X_1,\dots,X_d) that interpolates f on \{0_\mathbb{F}, 1_\mathbb{F}\}^d. Call it the arithmetization of f. For instance P(X,Y,Z)=X+Y+Z-YZ-ZX-XY+XYZ is the arithmetization of f(x,y,z)=x\vee y\vee z (the inclusive OR).

In the following we will sometimes use the same notation for a boolean function f and its arithmetization P_f. This is very convenient as it allows us to evaluate a boolean function \{0,1\}^d\to\Bbb{F} on the much larger domain \Bbb{F}^d. This is key to the sumcheck protocol described below.

Partial sum polynomials. If P\in\Bbb{F}[X_1,\dots,X_d] is a multivariate polynomial, we set, for any i=1,\dots,d, P_i \in\Bbb{F}[X_1,\dots,X_i] defined by P_i(X_1, ..., X_i) = \sum_{b_{i+1}, \dots, b_d\in\{0,1\}} P(X_1,\dots,X_i,b_{i+1},\dots,b_d) the partial sums of P on the hypercube \{0, 1\}^{d-i}. We have P_d = P. Note that each P_i is of degree \leq \mu in each variable.

The sumcheck protocol: what it does

The (barebones) sumcheck protocol allows a prover to produce polynomial time checkable probabilistic proofs of the language R_L = \Big\{(f,a) ~\Big|~ f:\{0,1\}^d \rightarrow \mathbb{F} \text{ and }a \in \mathbb{F} \text{ satisfy } \sum_{x \in \{0, 1\}^d} f(x) = a\Big\} given that the verifier can compute the multilinear extension of f (directly or by oracle access). Verifier time is O(d + t) where t is the evaluation cost of P. The protocol transfers the cost of evaluating the P=P_f on the whole (exponential size) boolean domain to evaluating it at a single random point \in\Bbb{F}^d.

Note: the sumcheck protocol works more broadly for any previously agreed upon, uniformly “low-degree in all variables” multivariate polynomial P which interpolates f on its domain. The only important requirement is that P be low degree in each variable, say \deg_{X_i}(P)\leq \mu for all i=1,\dots,d and for a “small” constant \mu. In the case of multilinear interpolation of f one has \mu=1, but it can be useful (and more natural) to allow larger \mu depending on the context (e.g. GKR).

We actually need a variant of this protocol for the relation R_L = \left\{(f_1, f_2, f_3,a) ~\left|~ \begin{array}{l}f_1,f_2,f_3:\{0,1\}^d \rightarrow \mathbb{F} \text{, } a \in \Bbb{F}\\\text{ satisfy } \displaystyle\sum_{x \in \{0, 1\}^d} f_1(x)f_2(x)f_3(x) = a\end{array}\right.\right\}. This changes things slightly as we will need to work with multivariate polynomials of degree \leq 3 in each variable. To keep things uniform, we suppose we are summing the values of a single function f (or a low-degree interpolation P thereof) over the hypercube \{0,1\}^d which is of degree \leq \mu in all d variables. e.g. for us f=f_1f_2f_3, P=P_{f_1}P_{f_2}P_{f_3} and \mu=3. The sumcheck protocol provides polynomial time checkable probabilistic proofs with runtime O(\mu d + t).

The sumcheck protocol: the protocol

The sumcheck protocol is a multi-round, interactive protocol. It consists of d rounds doing ostensibly the same thing (from the verifier’s point of view) and a special final round.

At the beginning, the verifier sets a_0 := a, where a is the claimed value of the sum of the values of f (i.e. of a uniformly low-degree extension P of f) over the hypercube \{0,1\}^d.

Round 1

  • The prover sends P_1(X_1) (univariate, of degree at most \mu), i.e. a list of \mu+1 coefficients in \Bbb{F}.
  • The verifier checks that P_1(0) + P_1(1) = a_0.
  • The verifier randomly samples \eta_1\in\Bbb{F}, sends it to the prover and computes a_1 = P_1(\eta_1).

The next rounds follow the same structure.

Round i\geq 2

  • The prover sends P_i(\eta_1,\dots,\eta_{i-1},X_i) (univariate, of degree at most \mu), i.e. a list of \mu+1 coefficients in \Bbb{F}.
  • The verifier checks that P_i(0) + P_i(1) = a_{i-1}.
  • The verifier randomly samples \eta_i\in\Bbb{F}, sends it to the prover and computes a_i = P_i(\eta_i).

Giraffe describes a refinement to compute the evaluations of p_i in such a way the prover time remains linear in the number of gates: link to the paper. Libra also proposes an approach in the general case (meaning even if we don’t have data-parallel computation)

Special final round

The verifier checks (either direct computation or through oracle access) that P_{d}(\eta_{d})\overset?=P(\eta_1, \dots, \eta_{d}). In some applications it is feasible for the verifier to evaluate P directly. In most cases this is too expensive.

The sumcheck protocol reduces a claim about an exponential size sum of values of P to a claim on a single evaluation of P at a random point.

The GKR protocol

The GKR protocol produces polynomial time verifiable proofs for the execution of a layered arithmetic circuit C. It does this by implementing a sequence of sumcheck protocols. Each iteration of the sumcheck protocol inside GKR establishes “consistency” between two successive layers of the computation (starting with the output layer, layer 0, and working backwards to the input layer, layer d). Every run of the sumcheck protocol a priori requires the verifier to perform a costly polynomial interpolation (the special final round). GKR bysteps this completely: the prover-provides the (supposed) value of that interpolation, and another instance of the sumcheck protocol is invoked to prove the correctness of this value. In GKR, the only time the final round of the sumcheck protocol is performed by the verifier is at the final layer (layer d: input layer).

The key to applying the sumcheck protocol in the context of the arithmetization described earlier for a layered, parallel circuit C is the relation linking the maps V_i, P_{q', q, i} and V_{i-1} (here these maps are identified with the appropriate low-degree extensions). Establishing consistency between two successive layers of the GKR circuit C takes up a complete sumcheck protocol run. Thus GKR proofs for a depth d circuit are made up of d successive sumcheck protocol runs.

The main computational cost for the prover is computing intermediate polynomials P_i. This cost can be made linear if P can be written as a product of multilinear polynomials as decribed in Libra by means of a bookkeeping table.

First round

  • The verifier evaluates v_{d} = V_{d}(r_{d}', r_{d}) where (r_{d}', r_{d}) \in \mathbb{F}^{b_N + b_G} are random challenges. Then he sends r' and r to the prover. The evaluation of V_{d} can be done by interpolating the claimed output y.
  • The prover and the verifier engages in a sumcheck protocol to prove that v_{d} is consistent with the values of the layer d-1. To do so, they use the relation we saw previously. V_i(q',q) = \sum_{h' \in \{0, 1\}^{b_N}}\sum_{h_L,h_R \in \{0, 1\}^{b_G}}P_{q', q, i}(h', h_L, h_R)

Important note. This is an equality of functions \{0,1\}^{b_N}\times\{0,1\}^{b_G}\to\Bbb{F}:

  • on the LHS the map V_i(\bullet',\bullet)
  • on the RHS the map \sum_{h' \in \{0, 1\}^{b_N}}\sum_{h_L,h_R \in \{0, 1\}^{b_G}}P_{\bullet', \bullet, i}(h', h_L, h_R)

Since we are going to try and apply the sumcheck protocol to this equality, we first need to extract from this equality of functions an equality between a field element on the LHS and an exponentially large sum of field elements on the RHS. This is done in two steps:

  • multilinear interpolation of both the LHS and RHS to maps \Bbb{F}^{b_N}\times\Bbb{F}^{b_G}\to\Bbb{F},
  • evaluation at some random point (Q',Q)\in\Bbb{F}^{b_N}\times\Bbb{F}^{b_G}
  • sumcheck protocol invocation to establish V_i(Q',Q) = \sum_{h' \in \{0, 1\}^{b_N}}\sum_{h_L,h_R \in \{0, 1\}^{b_G}}P_{Q', Q, i}(h', h_L, h_R), which requires low degree interpolation of the map P_{Q', Q, i}(\bullet', \bullet_L, \bullet_R):\{0,1\}^{b_{N}}\times\{0,1\}^{b_{G}}\times \{0,1\}^{b_{G}}\to\Bbb{F}. This particular low degree-interpolation is systematically constructed as a product of multilinear interpolations of functions that together make up the map P_{\bullet', \bullet, i}(\bullet', \bullet_L, \bullet_R):\{0,1\}^{b_{N}}\times\{0,1\}^{b_{G}}\times\{0,1\}^{b_{N}}\times\{0,1\}^{b_{G}}\times \{0,1\}^{b_{G}}\to\Bbb{F}

We won’t bother further with the distinction (q,q')\in\{0,1\}^{b_N}\times\{0,1\}^{b_G} and (Q,Q')\in\Bbb{F}^{b_N}\times\Bbb{F}^{b_G}.

As a result, at the final round of the sumcheck, the verifier is left with a claim of the form P_{r_{d}', r_{d}, i}(r'_{d-1}, r_{L, d-1}, r_{R, d-1}) == a, where r'_{d-1}, r_{L, d-1} and r_{R, d-1} are the randomness generated from the sumcheck. Instead of directly running the evaluation (which requires knowledge of V_{d-2}), the verifier asks the prover to send evaluation claims of V_{d-1}(r'_{d-1}, r_{L, d-1}) and V_{d-1}(r'_{d-1}, r_{L, d-1}). The verifier can then check that those claims are consistent with the claimed value of P_{r_{d}', r_{d}, i} using its definition.

To sum it up, we reduced a claim on V_{d} to two claims on V_{d-1}.

Intermediate rounds

We could keep going like this down the first layer. However, this would mean evaluating V_0 at 2^d points. This exponential blow up would make the protocol impractical. Thankfully, there are two standard tricks we can use in order to avoid this problem. Those are largely discussed and explained in the Hyrax and Libra paper.

  1. The original GKR paper asks the prover to send the univariate restriction of V_i to the line going through v_{0, i} and v_{1, i} and use a random point on this line as the next evaluation point.
  2. An alternative approach due to Chiesa et al. is to run a modified sumcheck over a random linear combination \mu_0 P_{q', q_0, i} + \mu_1 P_{q', q_1, i}. By doing this, we reduce the problem of evaluating V_i at two points to evaluating V_{i-1} at two point. It is modified in the sense that the “circuit geometry functions” add_i and mul_i are modified from round to round:
    \begin{array}{l}(\mu_0 P_{q', q_0, i} + \mu_1 P_{q', q_1, i})(h', h_L, h_R) \\ \quad= eq(q', h')\cdot \left[\begin{array}{c}(\mu_0add_i(q_0, h_L, h_R)+\mu_1add_i(q_1, h_L, h_R))\cdot(V_{i-1}(h', h_L)+V_{i-1}(h', h_R)) \\[1mm]+ (\mu_0mul_i(q_0, h_L, h_R)+\mu_1mul_i(q_1, h_L, h_R))\cdot V_{i-1}(h', h_L)\cdot V_{i-1}(h', h_R)\end{array}\right]\end{array} The overhead of this method is negligible.

Hyrax and Libra suggest to use the trick (2) for the intermediate steps. For the last rounds, we will instead apply the trick (1). This is to ensure that the verifier evaluates only one point of V_0.

The last steps

At the final steps, the verifier evaluates V_0 at the point output by the last sumcheck.

Cost analysis

As a sum up the verifier:

  • Generate the first randomness by hashing (x, y). One (|x| + |y|)-sized hash. In practice, this computation is larger than all the individual hashes. We expand later on a strategy making this practical.
  • Evaluates V_{d} and V_0 at one point each.
  • d(2b_G + b_N) consistency checks. As a reminder they consists in evaluating a low-degree polynomial at 3 points: 0, 1, \eta. And calling a random oracle.

For the prover

  • Evaluate the V_k. This is equivalent to execute the computation.
  • Compute the polynomials of the sumcheck: ~\alpha dNG + \alpha dGb_G multiplications, where \alpha is small (~20). Since, in practice N \gg b_G, we can retains 20dNG.
  • d(2b_G + b_N) fiat-shamir hashes to compute

Our contribution (2/2)

Having recalled the basics of sumcheck and GKR, we are ready to present our idea. We stress that this is a proposal. We are very interested in receiving feedback, in particular attempts at breaking this scheme.

Compiling GKR verifier in a R1CS

Hyrax proposes to compile this protocol using discrete-logs assumptions in order to obtain a zero-knowledge succinct non-interactive argument of knowledge. Its purpose is to establish a proof system without trusted setup. Libra propose to use a multilinear commitment, this encapsulate the evaluations of V_d and V_0 at the expense of an increased prover time.

The verifier typically runs in ~1sec. However, it is be feasible to check the verifier transcript of the GKR NIZK with a pairing-based argument (like Plonk or Groth16).

The evaluations of V_0 and V_{d} requires few multiplication gate: 1 for each input and 1 for each output. On top of that it allows us to check relations between the inputs and the output cheaply. And the verifier time is constant.

The sumchecks are however expensive to verify in practice as we need to use randomness from the verifier. The classical way to do it non-interactively is to use the Fiat-Shamir heuristic and therefore a hash function. Even though, this is a logarithmic overhead, it has a large constant (in our circuit, we need to hash ~20k field elements).

Tuning GKR to make it a gMIMC hash proving accelerator

In the constraints model of GKR we may only have addition gates and multiplication gates and we cannot use “constants”. However, nothing prevents us from using different sets of gates. One may even use gates with different fan-in if needed. We take advantage of this observation to design a proof system specially tailored for data-parallel gMIMC hash computation. Each copy of the base circuit takes 2 inputs and returns one output.

Using custom gates

Following a suggestion in Libra, we define a custom family of gates. Let \alpha be the exponent for gMiMC in \Bbb{F} (a small positive integer, 3 - 7 in practice) and let k(1),\dots,k(101) be elements in \mathbb{F} (there are 101 rounds in gMiMC). Ciph_{k(i), \alpha}(x,y) = x + (y + k(i))^\alpha and the copy gate Copy(x, y) = x

The polynomial P_{q', q, i} must be set to P_{q', q, i}(h', h_L, h_R)= eq(q', h')\cdot\left[\begin{array}{r}ciph_i(q, h_L, h_R)\cdot Ciph_{k(i), \alpha}\Big(V_{i-1}(h', h_L),~V_{i-1}(h', h_R)\Big) \\+ copy_i(q, h_L, h_R)\cdot Copy\Big(V_{i-1}(h', h_L),~ V_{i-1}(h', h_R)\Big)\end{array}\right] The ciph_i(\bullet, \bullet_L,\bullet_R) and copy_i(\bullet, \bullet_L,\bullet_R) functions are the analog of the add_i(\bullet, \bullet_L,\bullet_R) and mul_i(\bullet, \bullet_L,\bullet_R) in that they encode the geometry of the gMiMC base-circuit. We use multilinear interpolation to extend their domain. For cost analysis (see below) we record the degrees of these polynomials w.r.t. the variables:

  • it has degree \alpha+1 in h',
  • it has degree 2 in h_L
  • it has degree \alpha + 1 in h_R.

(Note: the various “+1” and “+2” come from degree 1 occurrences of variables in eq, ciph_i and copy_i.)

The verifier has to evaluate a degree \alpha polynomial in the consistency checks of the sumcheck. Since alpha is small, this is a negligible overhead compared to the cost of hashing. We can also still use Libra’s bookkeeping algorihm to keep the prover runtime small.

As a result, we obtain a GKR circuit able to verify the computation of N = 2^{b_N} hashes. Its base circuit consists of 2 gates per each layer (one copy gate, one ciph gate) and has a depth of 101.

Summing up, the consistency check costs 101[(b_N + 1)(\alpha + 2) + 3] constraints. The cost of Fiat-Shamir hashing is, for each GKR layer, (\alpha + 2) hashes of field elements: b_N times for the sumchecks rounds on h', \alpha + 2 field elements once for those on h_R and 3 fields elements once for those on h_L. Let T(n) denote the cost of hashing a string of n fields elements. The total cost of the Fiat-Shamir hash can be rewritten as 101[(b_N + 1)T(\alpha + 2) + T(3)]. These hashes could conceivably be computed directly in the smart contract using a cheap hash function (b_N=20, \alpha=7 equate to 20k hashes and a multiexp of that size or inside the SNARK) or inside the SNARK circuit (with approximatively T(n) = 350n, we estimate it is going to cost an additional 7M constraints.

The evaluation of V_0 and V_{d} is efficient in term of constraints. We need to interpolate 2 multilinear polynomials at given points from their evaluation representation. It takes 1 constraint per input and 1 constraint per output for a total of 3N constraints (plus the logarithmic overhead) per hash to prove. We can similarly obtain circuits for proving hashes with a different number of inputs using analogous methods.

Splitting the depth of the circuit

Although the circuit is asymptotically efficient, the sumcheck induces a lot of overhead, mostly because of the Fiat-Shamir hashes. It is possible to trade asymptotic efficiency for lower overhead by using two sub-circuits of depth 51 to compute a hash. This implies that each sub-circuit is doing two halves of the rounds needed to compute a hash in parallel. Hence a two-fold decrease of depth but a two-fold increase in size of the public input/output. This can be generalized for any n-split of the circuit.

Composing several GKR circuits

As already mentioned, the majority of the verifier overhead lies in the usage of Fiat-Shamir to generate the sumcheck’s challenges. Depending on the input size, it yields in practice the need to hash 10000-30000 field elements. We propose to delegate the hashes to another GKR circuit with an important split factor.

As a result, the rollup circuits has less hashes to perform which contribute to the reduce the overhead.

Generating the initial randomness

There remains an unsolved problem: generating the very first random input r_1 for the very first round of the very first sumcheck run dedicated to checking consistency between layers 0 and 1. In an interactive setting, this would pose no problem: the verifier would simply randomly sample r_1=(q_1',q_1)\in\Bbb{F}^{b_N}\times\Bbb{F}^{b_G}, send it to the prover who would then start work on generating a sumcheck proof for the equality
V_d(r_1) = \sum_{h\in\{0,1\}^{b_N}}\sum_{h_L,h_R\in\{0,1\}^{b_G}} P_{q',q,d}(h, h_L,h_R), and continue down the sumcheck protocol. When we turn things non-interactive, though, we need to efficiently generate the initial randomness, verifiably and with little overhead. Of course, hashing all of x and y (say) is out of the question: this is precisely the work that the verifier tries to avoid having to do. We see three viable possibilities for that.

  • Generate randomness from a separate deterministic sumcheck / GKR run
  • Pedersen Hash solution

Generate randomness from a separate deterministic sumcheck

Both a (noninteractive) sumcheck run and a (noninteractive) GKR run involve moderate hash computations. To be precise: under Fiat-Shamir, every variable elimination, i.e. every round in the sumcheck protocol, as well as the final round, require a hash computation.

The idea is thus to do a separate run (of either a new sumcheck problem or the actual GKR) using an initial random seed (likely low entropy and possibly biasable). That separate run is used to bind us to x and y through intermediate Fiat-Shamir hash computations. Note taht this involves only very few intermediate hashes. Using the hashes thus produces, one constructs a good initial random seed for the actual circuit verification.

In other words, one uses a dummy sumcheck / GKR run to produce a quickly (polynomial time) verifiable hash for an (exponentially) large input (x and y). Since the SNARK circuit re-uses x and y in the actual GKR proof, we expect this to be binding to x and y.

We see three simple ways to generate an agreed upon initial seed:

  • a hardcoded value such as r_1^\mathsf{sep}=-1, or a “more complicated” hardcoded value r_1^\mathsf{sep}\in\Bbb{F} (e.g. a generator of \Bbb{F}^\times).
  • a value r_1^\mathsf{sep} deterministically generated from a random beacon value RB that changes periodically and is known to all participants

We next describe two options for the separate sumcheck / GKR runs used to generate the initial randomness.

A simple sumcheck

Let r_1^\mathsf{sep} be some agreed initial randomness seed. One can consider the following sumcheck problem a=\sum_{i=1}^{G\times N}x_i+\sum_{i=1}^{G\times N}y_i=\sum_{i=1}^{2\times G\times N}u_i where we set u=x\|y, the concatenation of the vectors x and y, and view it (equivalently) as a function \widehat{u}:\{0,1\}^{b_G + b_N + 1}\to\Bbb{F} where \widehat{u}\big(\epsilon_0,\dots,\epsilon_{b_G+b_N}\big)=u_i if \epsilon_{b_G+b_N}\cdots\epsilon_1\epsilon_0 is the binary representation of i-1\in[\![~0\dots 2^{b_G+b_N+1}~[\![ for instance.) Alternatively, one may also consider u'=x\|0_{G\times N}\|y\|0_{G\times N} or some other padded variant on u (again, viewing it as a function \widehat{u'}:\{0,1\}^{b_G + b_N + 1 + 1}\to\Bbb{F}. Padding with 0’s (and thus doubling the size of the domain of the function) doesn’t affect the sum but produces an extra round in the sumcheck protocol and completely changes the intermediate steps of the computation.

Using the (low entropy) initial seed r_1 as the initial randomness for Fiat-Shamir, one runs an instance of sumcheck computing intermediate hashes as one goes along. The intermediate hashes thus generated (O(b_G+b_N) many) serve as binding commitments to x and y. From them, one is free to deterministically generate the proper randomness r_1 to be used to bootstrap the actual Fiat-Shamir noninteractive execution of GKR.

Note that x and y will (likely) be private inputs in the Snark circuit. (Although they may be public for some use cases, or parts of them may be public, for instance if the overall computation represents Merkle branch verifications from leaves to a public root hash.) Thus the Snark circuit will re-use these same x and y in the proper GKR verification.

Duplicating the GKR run

The idea is the same as above, but rather than using a completely different instance of the sumcheck protocol to generate the randomness bootstraping the “real” noninteractive GKR verification, one runs a dummy verification of the same GKR circuit (with initial randomness r_1^\mathsf{sep}).

Again, the SNARK circuit will ensure that x and y are re-used in both executions. This demands more work from the prover in that it doubles the effort on the prover’s end (and only slightly more work from the verifier). But it produces further consistency. Indeed, the first polynomial produced in the GKR prover run, P_1(X_1)=\sum_{b_2,\dots,b_n\in\{0,1\}} P(X_1,b_2,\dots,b_n), is the same no matter what. Its coefficients (of which there are very few, say, 3\mu) may thus be provided as further public data in the SNARK circuit used to verify the GKR proof.

N.B. We believe the first solution (separate sumcheck run) to be just as safe and less work for the prover, and thus superior.

Pedersen Hash solution

All of the checks we describe live in a constraint system. This approach proposes to leverage the final verifier in order to succinctly prove the hashing of x and y. We do it with the following protocol transforms:

  • Make x and y public inputs of the SNARK proof (if they were not already, it depends on the use-case). We set u = x \| y, the concatenation of the vectors x and y, and set u_i to be its i-th coordinate.

A priori this would require the prover to send x and y to the verifier, who would have to include them in its SNARK multi-exponentiation. For large computations (i.e. large vectors x and y) this imposes a large multi-exponentiation onto the verifier, which is undesirable. Furtermore, the verifier may not care about either x and y, say if they represent intermediate steps in a Merkle proof where only the root hash is public knowledge (and thus, ideally, only the last coordinate of y, say, ought to be public data). This problem is bypassed as follows.

  • The prover computes G = \sum_{i \in [|u|]} u_i\cdot G_i, where the G_i are the SNARK verification key parts corresponding to the public inputs u_i, and sends it to the verifier.

The verifier thus won’t need to compute the associated part of the public input multi-exponentiation. It can directly plug G into the multi-exp. The purpose fo G is to be binding to x and y and to serve as the randomness seed for the very first Fiat-Shamir randomness.

  • Both the verifier and prover compute r = H(G).

We upgrade r to a public input of the SNARK proof. There is thus an associate curve point G_r in the verification key to be used by the verifier in its multi-exponentiation. The purpose of r is to serve as the initial Fiat-Shamir randomness.

  • The verifier computes its public input multi-exponentation as r\cdot G_r + G + \Delta where \Delta is the remaining part of the multi-exp, corresponding to the “actual SNARK verification”.

This is equivalent of the using the hash of a pedersen hash of x and y by reusing the trusted setup. It is worth noting that in the end, the verifier does not have to perform computation linear in |u| as the prover does it for him.


Summarizing my own intuition pumps for this scheme below.

Let’s take a much simpler example, where the circuit is just the raw MIMC permutation: x_{i_1} = x_i^3 + k_i. So each “copy” of the circuit is just a straight line, of depth d (eg. d = 200). Let V_0(b_1...b_k) be a k-variate linear-in-each-variable polynomial representing a “hypercube” containing the inputs (ie. the inputs would be at V_0(0, 0, 0...0, 0), V_0(0, 0, 0...0, 1), V_0(0, 0, 0...1,0) etc) and V_d(b_1....b_k) is similarly a polynomial representing a hypercube containing the outputs.

Within the hypercube (ie. at all inputs where all coordinates are 0 or 1), we have V_{i+1}(b_1...b_k) = V_i(b_1...b_k)^3 + k_i. But we can’t make this true for the polynomials in general (ie. including outside the hypercube), because then the degree of each V_i would keep multiplying by three, so later V_i's would take an exponential amount of time to compute (and require an exponentially large trusted setup). Rather, we want each V_i to be linear in each variable. So instead, we pull some clever tricks involving sumcheck protocols.

Let us define the polynomial eq(b_1...b_k, c_1...c_k) as being the multilinear polynomial that equals 1 when (c_1 ... c_k) = (b_1 ... b_k) and 0 elsewhere inside the hypercube. For example, for k = 2, the polynomial would be:

(1-b_1) * (1-b_2) * (1-c_1) * (1-c_2)
+ (1-b_1) * b_2 * (1-c_1) * c_2
+ b_1 * (1-b_2) * c_1 * (1-c_2)
+ b_1 * b_2 * c_1 * c_2

We now make a new equation connecting V_{i} and V_{i+1}:

V_{i+1}(b_1...b_k) = \sum_{(c_1...c_k) \in \{0,1\}} eq(b_1...b_k, c_1...c_k) * V_i(c_1...c_k)^3 + k_i

If you evaluate this equation for any point (b_1...b_k) in the hypercube, you’ll get an expression that contains V_i^3 + k in one term (where (c_1...c_k) = (b_1...b_k)) and zero everywhere else. Now you might ask, why do this? Why make the relation even more complicated by turning it into a sum containing the original value plus a bunch of zeroes?

The answer is this: unlike the original equation V_{i+1}(b_1...b_k) = V_i(b_1...b_k)^3 + k_i., which is only true inside the hypercube, this new equation is true everywhere; one could evaluate it at V_{i+1}(123, -5, 42069, 7395723240) and it would check out, despite the fact that V_{i+1} or even eq don’t directly “mean anything” at those coordinates.

Notice a key piece of cleverness that makes this possible: no matter which point you’re evaluating V_{i+1} at, V_i is only ever evaluated within the hypercube. The randomization in making an “out-of-domain” evaluation instead comes from the fact that if you try to evaluate the above expression outside the hypercube, it’s eq that will give outputs other than 0 or 1, and so you’re doing a random-looking linear combination across the whole domain.

Now, how do we actually use this? First, recall the sum-check protocol:

  • You are trying to prove a_0 = \sum_{b_1...b_k \in \{0,1\}} f(b_1...b_k)
  • You provide P_1(0) = \sum_{b_2...b_k \in \{0,1\}} f(0, b_2...b_k) (ie. the top half of the cube) and P_1(1) = \sum_{b_2...b_k \in \{0,1\}} f(1, b_2...b_k) (ie. the bottom half of the cube)
  • Verifier sends you a random coordinate r_1, computes a_1 = P_1(r_1), and asks you to recursively prove \sum_{b_2...b_k \in \{0,1\}} f(r_1, b_2...b_k) (one can think of this as taking a random linear combination of the top and bottom halves of the cube, and repeating the protocol with the resulting half-size cube)
  • Repeat until the prover provides the constant P_n = f(r_1 ... r_k). Use the original polynomial commitment to f to verify that this evaluation is actually correct.

We use this protocol, but we do something clever. We start by asking for a commitment to V_0 and V_d, take a random coordinate z_d = (b_1 ... b_k), ask for V_d(z_d). The goal is, of course, to prove that the above sum-check relation holds between V_d and V_{d-1}, V_{d-1} and V_{d-2} and so on all the way down to V_1 and V_0. But to save prover time we’re not going to ask the prover to make a commitment to any of the intermediate V_i layers. Instead, we run the sumcheck protocol for V_d(z_d), get out of it some point z_{d-1} = (r_1...r_k), and we directly plug that into the sumcheck protocol for the next layer. We keep walking through all the layers, and finally we get a claimed evaluation of V_0, and we check that against V_0. Effectively, we don’t even bother checking anything in the middle, we just trust the prover all the way until the circuit gets to an evaluation of V_0, and only check its correctness at the end.

This allows you to prove N MIMC permutations with commitments (ie. size-N polynomials, requiring a size-N elliptic curve linear combination) at only the start and the end, so you only need 2N elliptic curve operations. Everything in the middle only requires prime field operations, which are much faster. You can then use the V_0 and V_d commitments as a lookup table in PLONK, accessing the table in one constraint per read.

The work above extends this scheme to look at not just homogeneous circuits that repeat circuits of width 1, but also repeated circuits more generally.


Very cool!

One issue with GKR-based protocols including Giraffe, Libra, Gir++ etc. is that they require layered arithmetic circuits. Also, the verifier’s proof checking scales with the depth of the layered circuit (so hundreds of layers for MiMC).

Check out an adaptation of the sum-check protocol and multilinear polynomials that avoids these problems in the Spartan proof system: I suspect Spartan can also support parallel computation (see SpartanNIZK).