AirScript: language for defining zk-STARKs

Not sure I’m understanding what’s going on here.

when ($k0) {

The idea here is that $k0 is a constant register which is set to 0 for steps that are multiples of 32 and 1 elsewhere, correct?

       S1: [$r0, $r1, $r2, $r3];
       S1: MDS # S1^alpha + K1;
       S1: MDS # S1^(inv_alpha) + K2;

What does it mean for S1 to be set to three different things? Also, is “#” a comment or something else?

   h: $p0 ? $r4 | $r0;

So $p0 is a list of bit constants that states what the path in the Merkle tree is?

       S1: [h, $s0, 0, 0];
       S2: [$s0, h, 0, 0];

Where are these values being read? It seems like they’re just being overwritten above…

And where are the constraints on the input and output?

Here are quick answers to the questions, but I’ll also write up a separate post later to explain in detail how this all works.

  1. Yes, $k0 is a constant register that repeats the following pattern every 32 steps: 31 ones followed by 1 zero. So, for steps 0-30 $k0=1 and for step 31 $k0=0; then for steps 32-62 again $k0=1 and for step 63 $k0=0 etc. This pattern is defined on line 131 of this file.
  2. # is a matrix multiplication operator. I describe all available operators here.
  3. S1 is just a variable - so, setting it to 3 different things works the same way as it works in regular programming languages. That is, first the variable S1 is defined as a vector of 4 elements initialized to the values held in registers $r0, $r1, $r2, and $r3; then it is set to the result of the expression MDS # S1^alpha + K1; and finally it is set to the result of the expression MDS # S1^(inv_alpha) + K2.

Just to give an example with simpler expressions, if I had code like this:

S1: [$r0, $r1];
S1: S1 + 2;

It would basically mean that after these two statements S1 is a vector of two values: [$r0+2, $r1+2]. I describe how variables work here.
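
The same two statements in plain Python terms look like this (r0 and r1 here just stand in for the values of $r0 and $r1):

r0, r1 = 3, 4
S1 = [r0, r1]
S1 = [v + 2 for v in S1]   # element-wise, so S1 is now [5, 6], i.e. [r0 + 2, r1 + 2]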

  1. Yes, $p0 holds bit constants with the path in the Merkle tree - but these constants are “stretched”. So, if the proof is for the leaf at index 42, the path would be 00101010 (assuming a tree of depth 8). And the way these values would appear in register $p0 is: 0 for steps 0-31, 1 for steps 32-63, 0 for steps 64-95, etc. (there is a small sketch of this right after these answers).

  2. The boundary constraints are defined separately. For example, you can see one such constraint on line 165 of this file. It basically says that the value of register $r0 at the last step should be equal to the root of the Merkle tree.
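
To make the “stretching” from the answer about $p0 concrete, here is a small Python sketch (my own illustration, not library code) that expands the bits of a leaf index into per-step values:

def stretch_path_bits(index, depth, steps_per_level=32):
    # each bit of the index (consumed lowest-first) is repeated for the
    # 32 steps of one hash invocation
    values = []
    for level in range(depth):
        bit = (index >> level) & 1
        values.extend([bit] * steps_per_level)
    return values

p0 = stretch_path_bits(42, depth=8)
# p0[0:32] are all 0, p0[32:64] are all 1, p0[64:96] are all 0, and so on,
# following the binary representation of 42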

Below is a more detailed explanation of how the current version of the genSTARK library works, using a Merkle proof computation as the example.

Library components

First, I want to briefly describe all the components that come into play when generating/verifying STARK proofs (a more in-depth explanation is here and here). Here is a high-level diagram of how proofs can be generated:

[diagram: high-level flow of STARK proof generation]

In the above diagram:

  • AirScript code defines the transition function, transition constraints, and readonly registers. This code is then “compiled” into a Stark object.
  • A Stark object is used to generate proofs. You can generate many proofs from the same Stark object by providing different assertions (same as boundary constraints) and different secret and/or public inputs.

Verification of a proof works in a similar manner, except the verifier does not need to supply secret inputs to the verify() method:

[diagram: high-level flow of STARK proof verification]

Merkle Proof

Let’s say we have a Merkle tree that looks like this (numbers in the circles are values of the nodes):

[diagram: example Merkle tree with node values]

For illustrative purposes I’m assuming that hash(a,b) = ab. For example, hash(3,4) = 34 and hash(4,3) = 43.

A Merkle proof for leaf with index 2 (node with value 3) would be [3, 4, 12, 5678]. To verify this proof we need to do the following:

  1. Hash the first two elements of the proof: hash(3,4) = 34
  2. Hash the result with the next element of the proof: hash(12,34) = 1234
  3. Hash the result with the next element of the proof but now the result should be the first argument: hash(1234, 5678) = 12345678
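
Here is a tiny Python sketch of these three steps, using the toy hash(a, b) = "ab" (decimal concatenation) purely for illustration:

def toy_hash(a, b):
    return int(str(a) + str(b))

def verify_merkle_proof(index, proof, root):
    # the first two proof elements are the leaf and its sibling, ordered
    # according to the lowest bit of the index
    v = toy_hash(proof[0], proof[1]) if index % 2 == 0 else toy_hash(proof[1], proof[0])
    index //= 2
    for node in proof[2:]:
        # at each level the next index bit decides whether the running
        # value is the left or the right argument
        v = toy_hash(v, node) if index % 2 == 0 else toy_hash(node, v)
        index //= 2
    return v == root

assert verify_merkle_proof(2, [3, 4, 12, 5678], 12345678)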

Execution Trace

Now, let’s translate the above logic into an execution trace that we can later convert into a STARK proof.

First, let’s say we have a hash function that works like this:

  1. The function requires 4 registers to work and takes exactly 31 steps to complete.
  2. To hash 2 values with this function, we need to put the first value into register 1 and the second value into register 2 (the other 2 registers can be set to 0).
  3. After 31 steps, the result of hashing 2 values will be in register 1.

Our execution trace will have 8 mutable registers - 4 registers for computing hash(a,b) and the other 4 registers for computing hash(b,a). These registers are named r0 - r7. We also need a few readonly registers:

  • k0 - this register will control transition between invocations of the hash function.
  • p0 - this is a public input register that will contain the binary representation of the index of the leaf node for which the Merkle proof was generated. In our case, the index is 2, so the binary representation is 010.
  • s0 - this is a secret input register that will contain elements of the proof.

Now, since we need to invoke the hash function 3 times, and we need a step between each invocation to figure out which values to hash next, our execution trace will have exactly 96 steps (for the purposes of this example I’m not requiring the trace length to be a power of 2):

Step   k0  p0  s0    r0        r1    r2  r3  r4        r5    r6  r7
0      1   0   12    3         4     0   0   4         3     0   0
1-30   1   0   12    ...       ...   ... ... ...       ...   ... ...
31     0   0   12    34        ...   ... ... 43        ...   ... ...
32     1   1   5678  34        12    0   0   12        34    0   0
33-62  1   1   5678  ...       ...   ... ... ...       ...   ... ...
63     0   1   5678  3412      ...   ... ... 1234      ...   ... ...
64     1   0   -     1234      5678  0   0   5678      1234  0   0
65-94  1   0   -     ...       ...   ... ... ...       ...   ... ...
95     0   0   -     12345678  ...   ... ... 56781234  ...   ... ...

(Cells with "..." are intermediate values of the hash function that are not important here; "-" means the register is not used at that step.)

Here is what’s going on:

  1. [step 0]: we initialize registers r0 - r7 as follows:
    a. First element of the proof (3) is copied to registers r0 and r5.
    b. Second element of the proof (4) is copied to registers r1 and r4.
    c. All other mutable registers are set to 0
  2. [step 31]: after 31 steps, the results of hash(3,4) and hash(4,3) are in registers r0 and r4 respectively. Now, based on the value in register p0 we decide which of the hashes advances to the next step. The logic is as follows: if p0=0, move the value of register r0 to registers r0 and r5 for the next step. Otherwise, move the value of register r4 to registers r0 and r5.
    a. In our case, at step 31 p0=0, so the value 34 moves into registers r0 and r5 for step 32.
  3. [also step 31]: At the same time as we populate registers r0 and r5 with data for the next step, we also populate registers r1 and r4 with data for the next step. In the case of r1 and r4, these registers are populated with the value of register s0, which holds the next Merkle proof element.
    a. In our case, at step 31 the value of s0 is 12, so 12 moves into registers r1 and r4 for step 32.
  4. After this, the cycle repeats until we get to step 95. At this step, the result of the computation (the Merkle root) should be in register r0.
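
Here is a compact Python sketch (toy hash again, my own illustration) of how the boundary steps of the trace above are formed from the proof elements and the index bits:

def toy_hash(a, b):
    return int(str(a) + str(b))

proof, index = [3, 4, 12, 5678], 2

# step 31: results of the first hash invocation over steps 0-30
r0, r4 = toy_hash(proof[0], proof[1]), toy_hash(proof[1], proof[0])

for level, node in enumerate(proof[2:]):
    bit = (index >> level) & 1
    h = r4 if bit else r0                    # select which hash advances
    # the next invocation starts from [h, node, 0, 0] and [node, h, 0, 0];
    # after 31 more steps the results land back in r0 and r4
    r0, r4 = toy_hash(h, node), toy_hash(node, h)

print(r0)   # 12345678 - matches the value of r0 at step 95 in the table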

In my prior posts I’ve shown an AirScript transition function that would generate such a trace. Here is a simplified version:

when ($k0) {
  // code of the hash function goes here
}
else {
  // this happens at steps 31 and 63 (technically at step 95 as well, but doesn't matter)

  // select the hash to move to the next step
  h: $p0 ? $r4 | $r0;

  // value h goes into registers r0 and r5
  // value from s0 goes into registers r1 and r4
  out: [h, $s0, 0, 0, $s0, h, 0, 0];
}

And here is how transition constraints for this would be written in AirScript:

when ($k0) {
  // code of the hash function constraints goes here
}
else {
  h: $p0 ? $r4 | $r0;
  
  S: [h, $s0, 0, 0, $s0, h, 0, 0];

  // vector N holds register values of the next step of computation
  N: [$n0, $n1, $n2, $n3, $n4, $n5, $n6, $n7];

  // this performs element-wise subtraction between vectors N and S
  out: N - S;
}
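
As a quick sanity check (plain Python, not AirScript), here is the “else” branch constraint evaluated on the step 31 → step 32 transition from the trace table above; all eight differences should be zero:

current = dict(p0=0, s0=12, r0=34, r4=43)          # step 31 values from the table
next_row = [34, 12, 0, 0, 12, 34, 0, 0]            # step 32, registers r0..r7

h = current["r4"] if current["p0"] else current["r0"]   # h: $p0 ? $r4 | $r0
S = [h, current["s0"], 0, 0, current["s0"], h, 0, 0]

assert [n - s for n, s in zip(next_row, S)] == [0] * 8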

Readonly registers

A bit more info on how values for readonly registers (k0, p0, s0) are set:

As I described previously, k0 is just a repeating sequence of 31 ones followed by 1 zero. It works pretty much the same way as round constants work in your MiMC example. Within AirScript, k0 is considered to be a static register as the values for this register are always the same.

Values for p0 are calculated using a degree n polynomial where n is the number of steps in the execution trace. The polynomial is computed by generating the sequence of desired values (e.g. for index 010 it would be 32 zeros, followed by 32 ones, followed by 32 zeros), and then interpolating them against lower-order roots of unity. Since the verifier needs to generate values for this register on their side as well, the p0 register is considerably “heavier” compared to the k0 register. But for execution traces of fewer than 10K steps (or even 100K steps), the practical difference is negligible. Within AirScript, p0 is considered to be a public input register as its values need to be computed at the time of proof generation and verification, and so must be known to both the prover and the verifier.
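
For illustration, here is a minimal sketch of that interpolation in a toy field (p = 97, so the multiplicative group has exactly 96 elements, matching our 96-step trace). This is my own simplified code, not what genSTARK does internally:

P = 97                          # 97 - 1 = 96, so a subgroup of order 96 exists
TRACE_LENGTH = 96

def find_generator(order, p):
    # brute-force search for an element of exact multiplicative order `order`
    for g in range(2, p):
        if all(pow(g, order // q, p) != 1 for q in (2, 3)):   # 96 = 2^5 * 3
            return g
    raise ValueError("no generator found")

def lagrange_eval(xs, ys, x, p):
    # evaluate the unique interpolating polynomial through (xs, ys) at x (mod p)
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

g = find_generator(TRACE_LENGTH, P)
domain = [pow(g, i, P) for i in range(TRACE_LENGTH)]   # roots of unity of the trace domain
p0_values = [0] * 32 + [1] * 32 + [0] * 32             # "stretched" bits of index 010

# both prover and verifier can re-derive p0 at any step from the interpolated polynomial
assert all(lagrange_eval(domain, p0_values, x, P) == v
           for x, v in zip(domain, p0_values))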

Values for s0 are computed in the same way as values for p0, but unlike p0, the source values are not shared with the verifier. Instead, values of s0 are included in the proof in the same way as values of mutable registers. From the verifier’s standpoint, s0 is almost like another mutable register, except the verifier does not know a transition relation for this register. The verifier is able to confirm that values of s0 form a degree n polynomial where n is the number of steps in the execution trace.

This is not really a problem given our transition function and constraints. Basically, what the verifier sees is that every 32 steps an unknown value is copied into registers r1 and r4, and then this value is used in hash function calculations that produce the desired output. Within AirScript, s0 is considered to be a secret input register as the verifier is not able to observe source values for this register.

Boundary constraints

The only boundary constraint that we need to check here is that at the end of the computation we get to the root of the Merkle tree. If everything worked out as expected, the root should be in register r0 or r4 (depending on whether the index of the leaf node is even or odd).


Thanks! This is really helpful.

Great! If you (or anyone else) see any issues or have any concerns about the methodology I outlined above - do let me know :)

I did have one more question for you: in your MiMC example, you build a Merkle tree of P, D, and B evaluations (source); then you use the root of this tree to compute linear combinations of P, D, and B; and then you build another Merkle tree from these linear combinations (source).

My question is: for the first Merkle tree, do we really need to include values of P, D, and B? Could we get away with just using values of P to build it? Here is how I’m thinking about it:

The prover builds the first Merkle tree using values of P evaluations only (mTree). Then uses the root of mTree to compute linear combinations of P, D, and B, and then builds another tree from the resulting values (lTree). This last part is the same as in your example.

The verifier then does the following:

  1. Verifies that values of all spot check of P are indeed a part of mTree (this is done via Merkle proofs against mTree root - same as in your example).
  2. Uses these values to compute values of D and B for each spot check.
    a. D value can be computed as C(P(x)) / Z(x)
    b. B value can be computed as (P(x) - I(x)) / Zb(x), where Zb(x) is the zero polynomial of the boundary constraint points
  3. Computes linear combinations of P, D, and B using mTree root (same as in your example).
  4. Verifies that linear combinations for all spot checks are a part of the lTree (this is done via Merkle proofs against lTree root - same as in your example).

Would this approach compromise security somehow?

Interesting! I think that might actually work…

If it does work, that would be really cool! I quickly implemented this simplification in this pull request and the results are:

  • About 3% - 4% proof size reduction for something as simple as MiMC.
  • About 20% proof size reduction for something slightly more complex (e.g. Merkle proof). For example, proof of a Merkle branch verification for a tree of depth 16 is now 109 KB (vs 136 KB before).

In general, the more complex the STARK (i.e. more transition and boundary constraints), the greater the impact. Though, I’d expect the impact to be smaller for proofs with long execution traces (there, proof size is dominated by FRI proofs).

109 KB… nice! If you crank up the code redundancy from 8x to 16x and do 2**20 proof of work on the Merkle root and then decrease the branch count based on that (the code redundancy should let you decrease by 25%, and the PoW by another ~20%) then it seems like we’re close to these STARKs fitting inside a block!

Adding proof of work to the Merkle roots is a very cool idea (and it’s pretty simple to do)! Here is how I’m thinking of implementing it:

  1. Compute a nonce such that hash(nonce | root) < threshold.
  2. Then, use this nonce as the seed for the PRNG used to generate indexes for the spot checks.

The nonce would be included in the proof, and the threshold parameter could be configurable. Also, I’m guessing this technique could be applied to the main Merkle tree (mTree) and Merkle trees in the FRI proof - right?
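
For what it’s worth, here is a minimal Python sketch of steps 1 and 2 above (the hash choice, encoding, and threshold are just assumptions for illustration, not what the library will do):

import hashlib

def grind_nonce(root, work_bits=20):
    # find a nonce such that hash(nonce | root) has `work_bits` leading zero bits
    threshold = 1 << (256 - work_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(nonce.to_bytes(8, "big") + root).digest()
        if int.from_bytes(digest, "big") < threshold:
            return nonce
        nonce += 1

def sample_indexes(nonce, root, domain_size, count):
    # derive distinct pseudo-random spot-check indexes from the ground nonce
    indexes, counter = [], 0
    while len(indexes) < count:
        digest = hashlib.sha256(nonce.to_bytes(8, "big") + root + counter.to_bytes(4, "big")).digest()
        idx = int.from_bytes(digest, "big") % domain_size
        if idx not in indexes:
            indexes.append(idx)
        counter += 1
    return indexes

# the verifier recomputes hash(nonce | root), checks it is below the threshold,
# and then re-derives the same indexes before checking the Merkle branches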


As I was writing this, another potential optimization idea came to mind:

Right now, the proof includes spot-checks against several Merkle trees. Specifically: spot-checks against mTree (source), spot-checks against lTree (source), and spot-checks against m and m2 trees in the FRI proof (source). It might be possible to roll spot-checks against lTree into the FRI proof at no extra cost. Here is how I’m thinking about it:

During the first pass of the FRI proof function, lTree and m are actually the same tree. This is because the values passed into the prove_low_degree() function are just evaluations of the linear combination (source).

We might be able to use spot-checks against m tree from the first element of FRI proof instead of spot-checks against lTree to validate the correctness of linear combination. It would work like this:

  1. Build mTree in the same way as it is done now.
  2. Compute linear combination of evaluations in the same way as is done now - but don’t put them into a separate Merkle tree.
  3. Run FRI proof. The first component of the FRI proof will contain spot-checks against m tree which is a tree of all linear combinations.
  4. Use the same indexes as were used to spot-check m tree in the first component of the FRI proof to spot-check the mTree.

The problem is that this undermines security somewhat. By manipulating the composition of FRI proof trees, the attacker could theoretically try to hit the “right” indexes in the mTree and fake the proof. But I wonder if adding a proof-of-work requirement to building the trees is enough to offset this reduction in security.

If it does work, it would reduce the proof size by about 10%.

Yep! That looks correct to me.

As I was writing this, another potential optimization idea came to mind:

I’ve thought of this myself, and didn’t do it because it would make the code too complex and I was deliberately making mimc_stark.py an educational implementation rather than something crazy optimized. I have no idea whether or not this is secure; I’d recommend asking Eli.

Another thing worth exploring is playing around with making the FRI skip-down degree 8 instead of 4. The reason why this could be an optimization is that the Merkle tree (at least in mimc_stark.py) is designed in such a way that the FRI check values are right beside each other, so you only pay an O(1) cost for each value, plus O(log(N)) per sample for the branch. Making the skip-down degree 8 cuts down the number of sampling rounds by a factor of log(4)/log(8) (ie. by a third), at the cost of 2x more chunks, but that still seems like a net improvement.

I am wondering, would it make more sense to use a VDF instead of adding proof-of-work to Merkle roots? It could work like this:

  1. Build a Merkle tree.
  2. Compute seed = VDF(root, steps).
  3. Generate indexes using the seed and a PRNG.

The reasoning is as follows:

  • A 2^{20} proof-of-work takes about 1 second to compute on my computer. But someone with a modern ASIC could compute thousands (or even millions) of 2^{20} proofs-of-work per second.
  • At the same time, running something like a reverse MiMC on my computer for 2^{16} steps takes about a second as well. But running thousands (or millions) of reverse MiMC calculations per second would take orders of magnitude more resources than buying an ASIC.

Are there any drawbacks of the VDF approach described above that I’m not seeing?
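
Here is a rough sketch of steps 1-3 with reverse MiMC as the delay function (toy parameters and round constants, my own code rather than anything in the library). The slow direction takes repeated cube roots; the verifier checks the result by cubing, which is fast:

P = 2**256 - 2**32 * 351 + 1            # P % 3 == 2, so x -> x^3 is a permutation
CUBE_ROOT_EXP = (2 * P - 1) // 3        # exponent that inverts cubing mod P
CONSTANTS = [(i**7 ^ 42) % P for i in range(64)]   # arbitrary round constants

def forward_mimc(x, steps):
    # fast direction: used by the verifier to check the seed
    for i in range(steps):
        x = (pow(x, 3, P) + CONSTANTS[i % len(CONSTANTS)]) % P
    return x

def reverse_mimc(x, steps):
    # slow direction: each step is a cube root - this is the delay
    for i in reversed(range(steps)):
        x = pow((x - CONSTANTS[i % len(CONSTANTS)]) % P, CUBE_ROOT_EXP, P)
    return x

root = 123456789                        # stand-in for the Merkle root as a field element
steps = 256                             # in practice this would be much larger (e.g. 2^16)
seed = reverse_mimc(root, steps)        # prover: slow
assert forward_mimc(seed, steps) == root   # verifier: fast
# seed is then fed into the PRNG that picks the spot-check indexes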


The problem is that you’re not really taking advantage of the VDF’s non-parallelizability, because the VDF could be calculated in parallel for many different choices of root. So it reduces to just being a hash function.

Yep, that makes sense. Though, even if it reduces to being just a hash function, I think using MiMC has some advantages. The biggest drawback is probably potential attacks against MiMC that we don’t need to worry about with tried-and-tested hash functions.


There is another potential optimization I thought of which, if it works, could reduce proof size by about 15%. The optimization tries to take advantage of some redundancies in the FRI proof structure.

In your example, each layer of a FRI proof contains two sets of Merkle paths (code):

  1. Merkle paths for indexes in the “column” (tree m2 in the example);
  2. Merkle paths for indexes in the “polynomials” (tree m in the example).

In subsequent layers, m2 becomes m, and a new m2 is created from m by splitting it into sets of quartic polynomials and evaluating them at a random point x. I tried to illustrate this in the graphic below:

[diagram: FRI proof layers with P (polynomial) and C (column) Merkle paths]

In this picture, P are the “polynomial” Merkle paths, and C are the “column” Merkle paths. For each path in C, there are 4 paths in P (in the example, the number of paths in C is 40 and thus the number of paths in P is 160).

It is easy to see that P_d and C_{d-1} paths come from the same tree. If we can make it so that C_{d-1} is a subset of P_d without compromising security, we can reduce FRI proof size by up to 20%. Here is how I think it could work:

Instead of deriving pseudo-random indexes from roots of m2 at every layer (code), we can do the following:

  1. Derive pseudo-random indexes from the root of m2 at the last layer (i.e. depth d). These indexes will determine paths for C_d (40 indexes).
  2. Use these indexes to determine indexes for P_d (160 indexes). This is done in the same way as now (code).
  3. Use root of m2 at the next layer up to randomly sample indexes for C_{d-1} from P_d (basically, choose 40 random indexes out of the set of 160 indexes).
  4. Same as in step 2, use indexes for C_{d-1} to determine indexes for P_{d-1}.
  5. Move to the next layer up and repeat steps 3 and 4.

Here is an illustration of how indexes for each set of paths could be derived:

[diagram: how indexes for each set of paths are derived across FRI layers]

Basically, we recurse all the way down, and then use the root of the Merkle tree at the bottom layer to derive all other indexes as we move back up in the recursion.
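
Here is a rough Python sketch of this index derivation (my own illustration; the index-to-row mapping and the PRNG are assumptions about one possible layout, not the library’s actual API):

import hashlib

def prng_indexes(seed, domain_size, count):
    # derive `count` distinct pseudo-random indexes below domain_size from seed
    indexes, counter = [], 0
    while len(indexes) < count:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        idx = int.from_bytes(digest, "big") % domain_size
        if idx not in indexes:
            indexes.append(idx)
        counter += 1
    return indexes

def derive_all_indexes(roots, last_column_domain, column_count=40):
    # roots: column-tree roots ordered from the first FRI layer down to the last
    # last_column_domain: size of the column domain at the deepest layer
    col_domain = last_column_domain
    column_idx = prng_indexes(roots[-1], col_domain, column_count)   # step 1
    plan = []
    for depth in range(len(roots) - 1, -1, -1):
        # steps 2/4: each column index expands to 4 row ("polynomial") indexes,
        # assuming the layout where row j of column c sits at c + col_domain * j
        row_idx = [c + col_domain * j for c in column_idx for j in range(4)]
        plan.append((depth, list(column_idx), row_idx))
        if depth > 0:
            # step 3: the root one layer up picks `column_count` of these row
            # indexes to reuse as that layer's column indexes (C_{d-1} is a subset of P_d)
            picks = prng_indexes(roots[depth - 1], len(row_idx), column_count)
            column_idx = [row_idx[i] for i in picks]
            col_domain *= 4    # the column domain is 4x larger one layer up
    return plan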

It doesn’t seem like an attacker gains any advantages in this setup. They still need to build all the same Merkle trees on the way down in the recursion, and the indexes for each proof layer are still selected pseudo-randomly based on the roots of all trees.

If you (or anyone else) see any flaws in this approach, let me know.

I’ve just released a new version of the library (v0.5.4) which has a bunch of speed optimizations and some proof size optimizations. The updated benchmarks are here, but at a high level, proof sizes went down a lot without impacting proof generation time. For example, a STARK for a Merkle proof of depth 16 (using the Rescue hash function) is now ~63 KB and still takes about 1 second to generate.

To reduce the proof size, here is what I’ve done:

  • Increased the extension (or blow-up) factor from 8x to 16x, and reduced query counts to 48 for the execution trace tree and 24 for the FRI proof trees. If I’ve done my math correctly, the resulting proofs have a security level of around 96 bits.
  • In FRI proofs, packed all 4 elements of a single “row” into a single Merkle leaf (in the original code, each “row” element was put into a separate leaf). As far as I can tell, this doesn’t affect security, and I think the StarkWare guys are doing the same thing (based on the “FRI Layer Skipping” section here).
  • I have not yet added proof-of-work to Merkle roots. Doing so will increase the security level to ~120 bits.

Proof size can probably be reduced further by 10% - 20% by implementing DEEP-FRI, and maybe even more if the approach I described in the previous post works.

To improve proof generation time, here is what I’ve done:

  • Used the latest version of the Galois library. This improved the speed of arithmetic operations in 128-bit fields by 8x - 9x.
  • Implemented the Blake2s hash function in WebAssembly. For small inputs (e.g. 32 - 64 bytes), it came out to be 4x faster than Node’s native implementation.
  • Used a smaller domain for constraint evaluation. For example, if the extension factor is 16x and the constraint degree is 3, the constraints are evaluated over a domain that is 4x the size of the execution trace, and the result is then extended to the full 16x domain.

The performance is now approaching the limits of what can be done in a single thread - so, one of the future steps would be to update the library to make use of multi-threading.


Truly excellent work!

I think your optimization works and is even a security improvement over the original, because it avoids the attack I have described here. I think it is quite similar to what I proposed at the end of that post. I proposed to follow the path in the forward direction and you are doing it in the reverse direction.

You can find some extra optimizations in my followup post here that you might be able to use.

I have been thinking about it some more and I think this part could be problematic:

An attacker could try to influence which indexes are chosen by trying multiple versions of the tree. The attacker could try to do this again at every level of the recursion, thus greatly increasing the odds that the attack succeeds.

I think it would be better to use the combined hash of all roots as the random source to pick the indexes at every step. Now you only have to add the PoW to this random source instead of to all the tree roots.

Thank you for thinking about this! I’m by no means an expert on these subjects, but here are my thoughts:

Regarding your first comment: I’m not sure I see how the scheme I proposed or the scheme you proposed here can increase security. It seems to me that both schemes make it more difficult to cheat at several layers simultaneously, but they don’t make it any more difficult to cheat at any given layer. I guess the question is, if you are able to cheat in a single layer, does this compromise the entire proof or do you need to cheat at more than one layer? If the former, then at best, the security stays the same.

Regarding other optimizations you mention here:

  1. Already implemented - reduced proof size by quite a bit.
  2. This could be significant too - though it’s quite a bit of work (primarily because I’d need to write optimized versions of octic polynomial evaluation/interpolation functions - otherwise proving time would suffer quite a bit).
  3. I’m not sure the benefit here would be meaningful. Yes, we could send fewer than 256 values, but then we’d have to send Merkle branches for the values that we do send. This probably would offset the savings quite a bit - so, net savings are unlikely to be more than 1% of proof size, and potentially less than that.
  4. I’m not sure I fully understood this. Is the suggestion to increase the number of last layer values beyond 256? If so, I think it could result in some savings - but I’ll need to think more about the exact benefit.

Regarding your last comment: the only tree the attacker would manipulate to influence the choice of indexes is the base layer tree (tree at depth d). All other trees are “fixed” - meaning, if you change any of them, you’d be changing the base layer tree too (so, it’s probably easier to manipulate the base tree directly).

One of the open questions in my mind is: does my scheme make it somehow easier for the attacker to build trees at each layer in such a way that the probability of cheating at any one layer increases?

I will explain in more detail how we can do an attack on the original version.

I assume that it is possible to cheat on a large majority of the constraints, except that the values that go into the FRI are not on a low-degree polynomial. So we have 16X values that are supposed to be on a degree X polynomial, but are actually on a degree 16X polynomial. If we build the FRI proof with these values, we will get caught at the final step because the 256 final values are not on a degree 16 polynomial, but on a degree 256 polynomial.

The FRI proof is organized in such a way that we can separate the 16X values into 256 groups. Each group of values is used to calculate one of the 256 final values. These groups are independent in the sense that we can change the values in one group without causing checks in other groups to fail. Changing values in one group will of course change which rows/columns are chosen, but all checks on all untouched groups will pass. If we only change the values of a group at one stage of the recursion, then that will only cause the checks to fail if a row from that group is chosen at that specific stage of the recursion. They will all pass at every other stage.

To cheat on the FRI, we will pick 16 groups with a total of X values and determine what the values in all other groups would need to be if they were on the same polynomial as the chosen X. This is our target polynomial. Note that if we used the target polynomial to start the FRI, we would pass the check on the final 256 values, but it is unlikely that we would pass the first checks, because we would have changed 15X of the 16X values. So instead of changing 15X values, we will only change a few groups to the target polynomial at the first step of the recursion. Since only a small percentage of the values have changed, we have a pretty good chance of not getting caught.

If we change 24 groups and do 40 checks, we have a ((256-24)/256)^{40} = 1.9% chance of passing all checks. We do not have to try many variations of the tree to get away with it.
After we find a solution that passes all checks, we go to the next level of the recursion and change another 24 groups. After 10 recursions, all groups are on the same low-degree polynomial and we will pass the check on the 256 values.
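
A quick check of the probability quoted above:

# change 24 of 256 groups, then survive 40 random spot checks that each
# need to land outside the changed groups
p_pass = ((256 - 24) / 256) ** 40
print(f"{p_pass:.3%}")   # ~1.9%, i.e. roughly one in fifty tree variations passes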

It is easy to see that our proposals are not vulnerable to this attack, because we will always check at every stage of the recursion and we will spot any changes made.

The attack on your proposal is different but is based on the same principle: Try many variations of the tree and use the best one to go to the next level of the recursion.

It depends on the implementation, but the 160 values from which 40 are chosen are probably ordered in a specific way. In a basic implementation, every set of four values comes from the same row and is ordered from low to high. Knowing this, we could try many variations of the tree and choose the tree that picks as many low values as possible. Having done this at every level of the recursion, some values are far more likely to be checked than others. If we choose our X values to be the ones that are found via a path of mainly low values, we are more likely to pass all checks.

Regarding optimization 3: the idea is to not publish the 256 values or any Merkle branches. The 40 final values are calculated from the 160 values of the level below and the chosen column. These 40 final values are supposed to be on a degree 16 polynomial, so we only check whether that is the case. This could lower the security of the proof a little because we will accept any degree 16 polynomial, but I don’t think it helps an attacker much to be able to use multiple polynomials in the final stage that do not have to be consistent with each other.

Regarding optimization 4: the idea is to use more values at the start of the FRI proof (like 32X instead of 16X) and fewer checks per level of recursion, but as I read above, you have already considered that.