FRI as erasure code fraud proof

vbuterin · December 11, 2019, 2:44pm

Ah, I was just thinking of a list of indices. Or would that be attackable because a prover could create a list that has a lot of duplicates and do mischief that way?

If we want to do a boolean vector, then one chunk in the vector would represent 256 data pieces, so if you hit a 0 you could just query sequentially until you find a 1. To deal with contiguous chunks of D being missing, you could use our index shuffling function to shuffle [1 ... |D|] and put the bits in that order.
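A minimal sketch of the bit-vector idea, assuming the bits are packed eight per byte and that a plain seeded permutation stands in for the index shuffling function. The names `build_bitfield`, `is_available`, `next_available` and `shuffled_order` are hypothetical and exist only for illustration:

```python
import random


def shuffled_order(n: int, seed: int) -> list:
    """Stand-in for the index shuffling function: a seeded permutation of 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def build_bitfield(available: set, order: list) -> bytes:
    """Bit p of the vector says whether data piece order[p] is available."""
    out = bytearray((len(order) + 7) // 8)
    for p, data_index in enumerate(order):
        if data_index in available:
            out[p // 8] |= 1 << (p % 8)
    return bytes(out)


def is_available(bitfield: bytes, i: int) -> bool:
    """Return True if bit i of the packed vector is 1."""
    return (bitfield[i // 8] >> (i % 8)) & 1 == 1


def next_available(bitfield: bytes, start: int, n: int) -> int:
    """Query sequentially (wrapping around) from `start` until a 1 bit is found."""
    for step in range(n):
        i = (start + step) % n
        if is_available(bitfield, i):
            return i
    raise ValueError("no available piece")
```

With the shuffle in place, a contiguous missing range of D is spread across the vector, so a sequential scan from any starting position should reach a 1 after a few steps.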

The algorithm I know about is the Berlekamp-Welch algorithm (the article used to have a nice graphical example that I added ~7 years ago, but someone else edited the article and made it look like the usual run-of-the-mill terrible Wikipedia math article again…). But this algorithm is O(n^3), so it would be fairly expensive to run in practice. I know that you can have an FFT-based algo to correct for known omissions, but that’s a much easier problem…

dankrad · December 11, 2019, 3:57pm

That is my intuition. As a minimum, you would probably have to prove that it is monotone or something like that, which sounds difficult.

It’s great that there is an algorithm, but it’s very likely that O(n^3) will be too slow for our problem, especially since we have to generate fraud proofs quickly and we have large amounts of data. This sounds like an additional research problem that has to be solved before FRIs can be used in this way.

denett · December 11, 2019, 4:04pm

I don’t think it is necessary for the clients to ask for the full middle level of the merkle tree.
The middle level could also be expanded using Reed-Solomon and the clients could sample this expansion as well.
When this sampling is combined with the other sampling, this will cost only one extra hash per sample.
The checker needs to reconstruct the middle level and check whether the upper part of the merkle tree is correct.
If it is correct, the checker could build a proof showing that a sub-branch is incorrect as you described.
Otherwise the checker can construct a proof that the upper part is not correct. The upper part is of the same size as a sub-branch so the proof will be of similar size.

Because the sampling of a middle level is relatively cheap, we can choose to expand and sample every level. Then we only need to prove a sub-branch with two nodes.

vbuterin · December 11, 2019, 4:24pm

This is true. Taking this approach to the extreme you basically get coded Merkle trees. It’s a reasonable approach, though I personally feel like the difference between 40 * log(n) and sqrt(n) is small enough that the gains are not worth the complexity. Particularly note that if you’re willing to have fraud proofs be bigger than the sampling size (which is a good tradeoff as fraud proofs are going to be rare) then you can sample a layer slightly above the middle layer, eg. get n^{0.4} sized sampling and n^{0.6} sized fraud proofs.

For eg. 32 MB (2^{20} chunks), 40 samples would be 40 * log_2(2^{20} \div 40) = 587 chunks, whereas the sqrt layer would be 1024 chunks, and a n^{0.4} layer would be 256 chunks.
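The three figures above can be reproduced with a few lines of arithmetic. This is only a check of the numbers quoted in the post; the variable names are illustrative:

```python
import math

n_chunks = 2**20   # 32 MB split into 32-byte chunks
samples = 40

# Formula from the post: 40 * log2(n / 40) chunks for 40 sampled branches.
branch_cost = samples * math.log2(n_chunks / samples)

# Size of the layer that would be downloaded in full instead.
sqrt_layer = round(n_chunks ** 0.5)
layer_04 = round(n_chunks ** 0.4)

print(round(branch_cost), sqrt_layer, layer_04)
```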

denett · December 11, 2019, 4:40pm

If the block creator just fills the expansion with random data, it will be impossible to find enough (>25%) points and the checker cannot build a fraud proof. Or am I missing something?

As an alternative, the clients could do the FRI sampling themselves. When done in a certain way (as I tried to explain here), it is guaranteed that the values that are sampled are on the same polynomial.

dankrad · December 11, 2019, 11:06pm

I think this construction actually needs two FRIs, one to show that the original erasure coding is “almost correct”, and then potentially one for a fraud proof. If they do this, they will not be able to create the original FRI, which requires a high percentage of the points to be on a low degree polynomial.

Aha, now this is a very interesting idea! Sorry I missed it the first time. So if we do this, we have to do a little bit of extra work when downloading an element by also downloading the FRI path. However, we can probably do this very intelligently by “injecting” the required elements into the Merkle tree. For example, f(z) and f(-z) could be sibling leaves, so getting f(-z) comes at no extra cost. Then if w^2=-z^2, we would make f(w) and f(-w) a sibling of the node that contains (f(z), f(-z)), and at this level mix in the values f(z^2) and f(-z^2).
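To see why having f(z) and f(-z) next to each other is useful, here is a sketch of the standard two-to-one FRI folding step. It is an illustration under assumptions, not the construction from the thread: the prime `P`, the challenge `alpha` and all function names are made up, and the thread does not fix the field or the folding arity.

```python
P = 2**31 - 1  # illustrative prime modulus (assumption, not from the thread)


def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def fold_pair(fz, f_neg_z, z, alpha):
    """Combine f(z) and f(-z) into one value of the next layer at z^2.

    Writing f(x) = g(x^2) + x * h(x^2), the pair gives
    g(z^2) = (f(z) + f(-z)) / 2 and h(z^2) = (f(z) - f(-z)) / (2z).
    The next layer is g + alpha * h.
    """
    inv2 = pow(2, -1, P)
    g = (fz + f_neg_z) * inv2 % P
    h = (fz - f_neg_z) * inv2 * pow(z, -1, P) % P
    return (g + alpha * h) % P


def fold_coeffs(coeffs, alpha):
    """The same fold on coefficients: even terms plus alpha times odd terms."""
    even, odd = coeffs[0::2], coeffs[1::2]
    odd = odd + [0] * (len(even) - len(odd))
    return [(e + alpha * o) % P for e, o in zip(even, odd)]
```

A verifier holding both sibling leaves can compute `fold_pair` locally and compare it with the committed value at z^2 in the next layer, which is why fetching the sibling costs nothing extra.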

The Merkle tree would be twice as deep, but it would come with relatively cheap full authentication of the elements lying on a low degree polynomial. Very cool!

Since this scheme would have no difference between coding rate and stopping rate, it may well be the most efficient at the moment (apart from the STARK one, which might not be feasible at this point after all).

dankrad · December 17, 2019, 12:52pm

It seems from some discussions I had that you cannot put all FRI layers into one Merkle tree, as each layer needs to be computed (Fiat-Shamir) from the previous layer.

However, I still quite like the idea of combining data availability checks with spot checks on the FRI layers. It will not save us from having to do FRI for fraud proofs when parts of the data are just not available. But it does improve the construction because we need to accommodate for a smaller buffer to construct that fraud proof, as we don’t need to consider that some of the downloaded data may be incorrect. For example, for a FRI rate of 0.8 and an erasure code rate of 0.25, the original construction would get a stopping rate of 0.25/0.8^2 \approx 0.39, whereas by doing spot checks with data availability checks we get to 0.25/0.8 \approx 0.31.
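The two stopping rates quoted above follow from one small formula. The function name and the boolean flag are illustrative only:

```python
def stopping_rate(code_rate: float, fri_rate: float, with_spot_checks: bool) -> float:
    """Stopping rate as used in the post.

    Without spot checks the FRI rate enters twice (code_rate / fri_rate^2);
    with spot checks during data availability sampling it enters once.
    """
    exponent = 1 if with_spot_checks else 2
    return code_rate / fri_rate ** exponent


print(round(stopping_rate(0.25, 0.8, with_spot_checks=False), 2))
print(round(stopping_rate(0.25, 0.8, with_spot_checks=True), 2))
```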

denett · December 18, 2019, 7:27pm

Yes, we still need extra merkle branches for the FRI, but the described method is quite efficient, because the column merkle branch and the next row merkle branch can be combined.
I expect the size of the samples to increase by a factor of 5 or 6 compared to just sampling a single merkle branch.

dankrad · December 19, 2019, 9:22pm

For the record I would want to correct this: I think this approach needs a FRI both for the initial proof of closeness and as a fraud proof. Otherwise, the producer can just commit to a totally random D and nobody would have a chance to make a valid FRI fraud proof out of it (at least not a short one).

I also have a feeling that the FRI proofs we are talking about are going to be much bigger: My estimate (worst case) is that a polynomial would need 2^{27} evaluations (at rate 0.25), which means a Merkle tree of depth 26. So that means each FRI spot check needs 2\cdot 25 field elements plus 27 \cdot 26 /2 = 351 hashes for Merkle proofs. That’s already around 13 kB for one spot check, and to get to 80\% correctness, using my naïve estimate, you would need -128 \frac{\log 2}{\log 0.8} \approx 400 evaluations (does someone know the correct way to approximate this?)

So it would appear that this FRI is already 5 MB in size. Am I wildly off here?
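The estimate can be redone step by step. The sketch below assumes 32 bytes for both hashes and field elements, which is what the 13 kB figure implies; the names are illustrative:

```python
import math

ELEMENT_BYTES = 32                      # assumed size of a hash or field element

field_elements = 2 * 25
merkle_hashes = 27 * 26 // 2            # 351 hashes for the Merkle proofs
spot_check_bytes = (field_elements + merkle_hashes) * ELEMENT_BYTES


def evaluations_needed(security_bits: float, correctness: float) -> float:
    """Naive estimate from the post: solve correctness^k = 2^-security_bits for k."""
    return -security_bits * math.log(2) / math.log(correctness)


n_evals = evaluations_needed(128, 0.8)
total_bytes = n_evals * spot_check_bytes

print(spot_check_bytes, round(n_evals), round(total_bytes / 1e6, 1))
```

This gives roughly 12.8 kB per spot check and just under 400 evaluations, so about 5 MB in total under these assumptions.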

denett · December 19, 2019, 11:59pm

My estimate for a FRI without special optimizations: A block of 32MB = 2^{20} chunks of 32 bytes, expanded is 2^{22} chunks. A merkle branch for 4 chunks is 20 hashes plus 4 values.
Getting the first row takes 24 chunks; getting the column and the next row takes 22 chunks. So one spot check needs 24+22+...+4 = 154 chunks. A light client doing 40 spot checks needs to download 192.5 kB plus 64 kB (for the middle layer), which is 0.78% of the 32 MB.
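The same numbers as a short calculation, treating kB as 1024 bytes (an assumption that reproduces the 192.5 figure):

```python
CHUNK_BYTES = 32

# 24 + 22 + ... + 4 chunks for one spot check
per_spot_check = sum(range(4, 25, 2))

spot_checks = 40
sampling_kib = per_spot_check * spot_checks * CHUNK_BYTES / 1024
middle_layer_kib = 64
block_kib = 32 * 1024

fraction = (sampling_kib + middle_layer_kib) / block_kib
print(per_spot_check, sampling_kib, f"{fraction:.2%}")
```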

Instead of doing a new FRI for the fraud proof, would it not be sufficient to prove that a subset of \sqrt{|D|} nodes that do not match the merkle tree actually do match the FRI used for the sampling?

For that we only need the full FRI layer of \sqrt{|D|} size. Because the FRI roots are known we already know all the columns that are chosen during the FRI. So to verify the fraud proof, the \sqrt{|D|} values can be combined using the first column to \sqrt{|D|}/4 values in the next layer. We can continue doing this until we have a single value that can be checked at the committed FRI layer.

So if we let the light-clients download an extra 64kb for the FRI layer, the fraud proof can be around 64kb as well.

EDIT: This fraud proof does not work, because it will be easy to build a fake fraud proof.

dankrad · December 20, 2019, 11:25am

Need to also consider the worst case which is 2^{26} chunks.

Can you say why that is? Because it seems like an elegant idea.

It seems that this is in the right ballpark. So that would make it impractical for fraud proofs. The main problem is that we want to do a FRI at a very high correctness rate, which is expensive. If we instead lower this rate to only 50\% correct, then our proof will be several times smaller. However, we will need to lower the rate to 0.125 then (because we need to make sure that the fraud prover has twice as many correct elements as needed in order to create the FRI; in addition, he needs twice as many since half of them could be incorrect, so at a rate 0.125 the stopping rate would be 0.5).

Then we would need about -128 \frac{\log 2}{\log 0.5} = 128 evaluations, bringing us to ca. 1.6 MB proof size (or 1 MB at security 80 bits).

However, we bought this at the expense of having a stopping rate that’s q=4r (r being the coding rate). For r=0.125 that’s actually worse than the 2D scheme (which would achieve a stopping rate of ca. \sqrt r \approx 0.35). Also the fraud proofs in the 2D scheme would only be 370 kB and thus smaller.
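A quick comparison of the two schemes at r = 0.125, together with the proof sizes at 50% correctness. The 13 kB per spot check is taken from the earlier estimate in the thread; the variable names are illustrative:

```python
import math

r = 0.125                          # coding rate
fri_stopping_rate = 4 * r          # q = 4r for the FRI construction
two_d_stopping_rate = math.sqrt(r) # roughly sqrt(r) for the 2D scheme

SPOT_CHECK_KB = 13                 # per spot check, from the earlier estimate
# At 50% correctness each evaluation gives one bit, so bits == evaluations.
proof_kb_128 = 128 * SPOT_CHECK_KB
proof_kb_80 = 80 * SPOT_CHECK_KB

print(fri_stopping_rate, round(two_d_stopping_rate, 2), proof_kb_128, proof_kb_80)
```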

denett · December 21, 2019, 8:40pm

Why is the worst case 2^{26}?

FRI is based on the premise that for two lines to intersect at a randomly chosen point, you got either very lucky or the two lines are the same. This does not hold if the point is known in advance. That is why we need Fiat-Shamir at every level of the FRI.
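As a rough sketch of what Fiat-Shamir at every level means here: the evaluation point for a layer is derived by hashing the commitment to that layer, so the prover cannot know it before committing. SHA-256, the modulus `P` and the function name are assumptions made for illustration:

```python
import hashlib

P = 2**31 - 1  # illustrative prime modulus (assumption)


def fiat_shamir_challenge(merkle_root: bytes, layer: int) -> int:
    """Derive the random point for `layer` from the Merkle root committing to it."""
    digest = hashlib.sha256(merkle_root + layer.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % P
```

Changing any committed value changes the root and therefore the challenge, which is what removes the "point known in advance" problem.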

We could still use the trick to roll up the \sqrt{|D|} sized subset to a single value in the fraud proof, but then it needs to be a new FRI where the first column is chosen based on the merkle root of this subset.
Then we use this new FRI to show that more than 25% of the total merkle branches are on the same polynomial as the subset.

If the checker collected 50% of the samples, he can commit using a merkle tree to the 50% of the samples that are missing. So when during the FRI sampling a missing branch is selected, this can be proven via a merkle branch. The resulting FRI has an expansion rate of 2, so for 80 bits of security we will need 80 FRI samples. This means the total proof will be around 450 kB. Increasing the original expansion rate will of course decrease this size significantly.
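The sample count follows the rule of thumb used in this thread, where each FRI sample contributes log2(expansion rate) bits of security. This is a sketch of that estimate, not a soundness analysis, and `samples_needed` is a hypothetical helper:

```python
import math


def samples_needed(security_bits: int, expansion_rate: float) -> int:
    """Number of FRI samples if each one gives log2(expansion_rate) bits."""
    bits_per_sample = math.log2(expansion_rate)
    return math.ceil(security_bits / bits_per_sample)


for rate in (2, 4, 16):
    print(rate, samples_needed(80, rate))
```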

denett · December 22, 2019, 12:47pm

I was wondering, why is it necessary for the light-clients to get 80 bits security on the data availability sampling? Isn’t that a false sense of security, because the bigger risk seems to be that you are one of the few clients that get served the correct samples.
Besides, the main strength of the data availability sampling seems to be that it is impossible to convince a large part of the validators that unavailable data is available. So an unavailable block will never be finalized.

vbuterin · December 22, 2019, 5:14pm

I think 30 bits of security on data availability sampling should be fine.

dankrad · December 22, 2019, 9:57pm

I think for the sampling, much lower security is ok. But for the correctness of encoding, I would go for 128 bits if possible (to be consistent with the rest of the protocol) or 100 as a compromise (assuming that FRIs are a billion times harder to compute than simple things like hashes). I’m very critical of anything that’s only 80 bits security.

dlubarov · December 23, 2019, 10:53pm

Are you sure that’s needed? If we do Fiat-Shamir only on f^{(0)}, then throughout the commit phase the prover can predict where an interpolated polynomial will be evaluated, but they can’t predict which chunk of 2^η points will be interpolated. So it doesn’t seem obvious to me that soundness would break down. Wouldn’t each invalid “coset reduction” (from 2^η points in f^{(i)} to a point in f^{(i + 1)}) still have a certain probability of being detected?

denett · December 24, 2019, 9:33am

To cheat a FRI that is 4x expanded you start on f^{(0)} with 25% of the points on the right polynomial and 75% on the wrong polynomial (worst case). For the 75% that are wrong you can cheat half on f^{(1)} such that all checks on f^{(2)} and higher will be fine. So now instead of having a success rate of 25% we have a success rate of 62.5%. If you cheat on multiple levels you can get this even higher. This is only possible because the column is known in advance, otherwise you would not know how to cheat on f^{(1)}.
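The 62.5% figure for cheating on a single level can be written out as follows; `pass_probability` is a hypothetical helper that covers only this one-level case:

```python
def pass_probability(honest_fraction: float, patched_share: float) -> float:
    """Chance that one spot check passes.

    honest_fraction: share of f^(0) points on the right polynomial.
    patched_share:   share of the wrong points that the prover can fix up
                     on f^(1) because the column is known in advance.
    """
    return honest_fraction + (1 - honest_fraction) * patched_share


print(pass_probability(0.25, 0.0))  # column not known in advance
print(pass_probability(0.25, 0.5))  # cheating on f^(1)
```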

denett · December 28, 2019, 8:56pm

We could use the below optimizations to lower the size of the sampling of the light clients and the fraud proof. This will result in a light-client sampling size of 116kB with 30 bits of security and a 272kB fraud proof with 128 bits of security.

Optimizations:

Instead of an expansion rate of 4, we use an expansion rate of 16.
We use 30 bits security on the sampling and 128 bits for the FRI proof.
Instead of sending the middle layer of the expanded merkle tree, we send the middle layer of the original merkle tree of the block data. This is possible because it is always possible to use a subset from the actual block data for the fraud proof. For 32 MB of data this middle layer has a size of 32 kB.
We can stop the FRI early by sending a partial layer of the FRI. When we are for example on a layer where there are 256 values that have to be on a 16 degree polynomial, we can just send the first 16 values and expand these to 512 values. So we don’t need any merkle branches.
Instead of using only cubic interpolation in a FRI we can vary the degree depending on the remaining depth of the FRI. Depending on the number of samples of the FRI there is an optimal degree for every layer. So the optimal degrees for the client sampling will be different from the optimal degrees for the fraud proof.
For the fraud proof we can add extra security by adding 20 bits proof of work. So before the samples are chosen, the FRI builder has to find a nonce such that the hash of the FRI roots and the nonce starts with 20 zero bits. This hash is used as a seed during the sampling.
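The proof-of-work step in the last optimization could look roughly like this. SHA-256, the 8-byte nonce encoding and the function names are assumptions; the post only specifies that the hash of the FRI roots and the nonce must start with 20 zero bits and is then used as the sampling seed:

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def grind_nonce(fri_roots: bytes, difficulty_bits: int):
    """Search for a nonce so that hash(fri_roots || nonce) has enough leading zeros.

    Returns the nonce and the digest, which then serves as the sampling seed.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(fri_roots + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest
        nonce += 1
```

At 20 bits the search takes about 2^20 hash evaluations on average, and an attacker has to pay that cost again for every alternative FRI they try.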

Light clients
The light clients have to download the 32 kB middle layer, and 15 FRI traces of the following form:
Depth 24: degree 15 (16+20=36 chunks)
Depth 20: degree 7 (8+17=25 chunks)
Depth 17: degree 7 (8+14=22 chunks)
Depth 14: degree 7 (8+11=19 chunks)
Depth 11: 128 values of the polynomial (128 chunks)
For 8 samples the light client has to download (36+35+22+19)*15+128 chunks = 84 kB, together with the middle layer this will be 116 kB.
When we assume a fraud proof can be built when 25% of the samples have been collected, the security is 30 bits.

Fraud proof
For the fraud proof we need 54 FRI traces of the following form:
Depth 24: degree 15 (16+20=36 chunks)
Depth 20: degree 15 (16+16=32 chunks)
Depth 16: degree 7 (8+13=21 chunks)
Depth 13: 512 values of the polynomial (512 chunks)
We need to publish the 32kB of the block data that is invalid. This data can be rolled up to 4 values of depth 16. To validate these values we need one branch of 21-4=17 chunks.
We also need merkle branches showing the indexes that are sampled, but have not been collected. On average this will be 54*3=162 branches of 21 chunks is 3402 chunks. By publishing the level with 256 values of this merkle tree instead of the root, this can be reduced to 2352 chunks.
This will total a fraud proof of 272kB.
With 54 FRI samples at a 4x expansion, this results in 108 bits of security. Adding 20 bits of PoW, results in 128 bits of security.
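Adding up the pieces listed above reproduces the stated total. The grouping below (final layer sent once, kB as 1024 bytes) is an assumption that happens to match the 272 kB figure:

```python
CHUNK_BYTES = 32

fri_traces = 54
trace_chunks = 36 + 32 + 21                      # depths 24, 20 and 16
final_layer_chunks = 512                         # depth 13, sent in full
invalid_data_chunks = 32 * 1024 // CHUNK_BYTES   # 32 kB of invalid block data
rollup_branch_chunks = 21 - 4                    # branch validating the rolled-up values
missing_branch_chunks = 2352                     # branches for uncollected samples

total_chunks = (fri_traces * trace_chunks + final_layer_chunks
                + invalid_data_chunks + rollup_branch_chunks
                + missing_branch_chunks)
total_kib = total_chunks * CHUNK_BYTES / 1024

# 2 bits per sample at 4x expansion, plus 20 bits of proof of work
security_bits = fri_traces * 2 + 20

print(total_chunks, round(total_kib), security_bits)
```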

dankrad · December 29, 2019, 1:41pm

The problem is that the fraud proof might be huge. Imagine one complete quarter of the data missing up to a second-layer Merkle root, you would have to provide that whole chunk of data!

That sounds like a cool technique, is there a description of this with security proof?
Isn’t there a problem that now the prover can commit to some data and then get lots of different proofs of work on it to manipulate the samples?

The problem here is that the 30 samples only guarantee a very low degree of proximity. So 25% is not the right stopping rate, we will need a much higher number of samples so that the fraud prover has enough good samples to construct a FRI with.

denett · December 29, 2019, 9:48pm

There is indeed a problem when block data is available and the merkle tree for the block data is correct, but there exists a withheld merkle branch from the extension part that is not on the polynomial. In that case, it is not possible to construct a fraud proof (because it is too large) and we cannot accept this block as valid, because using the withheld merkle branch, it would be possible to construct a fraud proof that could roll back the block.
I see three ways to tackle this problem:

1. Just add the full middle layer, so we can build a fraud proof. (480kB extra for the light clients)
2. Only allow merkle tree branches that are valid for the data availability FRI to be used in the fraud proof.
   The FRI traces of the data availability FRI have to be included in the fraud proof. Now the withheld merkle branch can not be used in a proof, because it is invalid for the data availability FRI. Because no fraud proof is possible we can accept the block. Downside is that some light-clients might see the block as unavailable. (115kB extra for 54 traces in the proof)
3. Do an extra data availability sampling round on the full middle layer of 512kB. I estimate this to be 35kB for the light-client, but we no longer need to download the 32kB partial middle layer. Fraud proofs for this new data availability sampling round will be a little smaller than for the full data availability round, because the merkle trees are shorter. (3kB extra for the light client)

Option 3 seems preferable.

I have not found a formal description of it, first saw it in this post. Without PoW it is also possible to try multiple versions of the FRI to get a new sampling by changing one value on one level (not trivial, but doable). Besides if one FRI sample has a chance of 2^{-x}, you will need on average 2^{x+20} hashes to crack the FRI.

When 25% of the samples are available, only a small fraction (2^{-30}) of the light clients will be successful in collecting 15 samples (with infinite light-clients).
So there will be at least 25% of the samples available for the proof. These samples are all on the same polynomial, because they passed the data availability FRI. The builder only needs to prove it has more than 6.25% of the samples. So the 25% will be sufficient.
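Both figures can be checked with exact fractions. The 6.25% threshold is read here as the inverse of the 16x expansion rate proposed earlier, which is an interpretation rather than something stated explicitly:

```python
from fractions import Fraction

available = Fraction(1, 4)     # share of samples that can still be collected
samples = 15                   # FRI traces each light client asks for

# Chance that a single light client gets all 15 of its samples.
p_all_found = available ** samples

# With a 16x expansion, 1/16 of the points determine the polynomial.
code_rate = Fraction(1, 16)

print(p_all_found == Fraction(1, 2**30), float(code_rate), available > code_rate)
```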