There is indeed a problem when the block data is available and the Merkle tree for the block data is correct, but there exists a withheld Merkle branch from the extension part that is not on the polynomial. In that case, it is not possible to construct a fraud proof (because it would be too large), yet we cannot accept the block as valid either: using the withheld Merkle branch, someone could construct a fraud proof that rolls back the block.
I see three ways to tackle this problem:
- Just add the full middle layer, so we can build a fraud proof. (480kB extra for the light clients)
- Only allow Merkle tree branches that are valid for the data availability FRI to be used in the fraud proof. The FRI traces of the data availability FRI have to be included in the fraud proof. Now the withheld Merkle branch cannot be used in a proof, because it is invalid for the data availability FRI. Because no fraud proof is possible, we can accept the block. The downside is that some light clients might see the block as unavailable. (115kB extra for 54 traces in the proof)
- Do an extra data availability sampling round on the full middle layer of 512kB. I estimate this to be 35kB for the light client, but we no longer need to download the 32kB partial middle layer. Fraud proofs for this new data availability sampling round will be a little smaller than for the full data availability round, because the Merkle trees are shorter. (3kB extra for the light client)
Option 3 seems preferable.
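To make the comparison concrete, here is a small sketch of the extra-bandwidth arithmetic behind the three options. The sizes are the ones quoted above; reading the 480kB of option 1 as "512kB full middle layer minus the 32kB partial layer already downloaded" is my assumption, and the variable names are only illustrative.

```python
# Sizes in kB, taken from the figures quoted in the list above.
PARTIAL_MIDDLE_LAYER_KB = 32   # partial middle layer the light client downloads today
FULL_MIDDLE_LAYER_KB = 512     # full middle layer
EXTRA_SAMPLING_ROUND_KB = 35   # estimated cost of the extra sampling round
FRI_TRACES_KB = 115            # 54 FRI traces included in the fraud proof
NUM_TRACES = 54

# Extra cost of each option relative to the current design.
extra_kb = {
    # Assumption: the full layer replaces the partial one.
    1: FULL_MIDDLE_LAYER_KB - PARTIAL_MIDDLE_LAYER_KB,
    # Extra size of the fraud proof, not of the regular light-client download.
    2: FRI_TRACES_KB,
    # The extra round replaces the partial middle layer download.
    3: EXTRA_SAMPLING_ROUND_KB - PARTIAL_MIDDLE_LAYER_KB,
}

kb_per_trace = FRI_TRACES_KB / NUM_TRACES  # a bit over 2kB per trace

for option, cost in sorted(extra_kb.items()):
    print(f"option {option}: {cost} kB extra")
print(f"roughly {kb_per_trace:.2f} kB per FRI trace")
```

Note that the option 2 figure is a cost on the fraud proof, while options 1 and 3 are costs on every light client, so the numbers are not directly comparable.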
I have not found a formal description of it; I first saw it in this post. Without PoW it is also possible to try multiple versions of the FRI to get a new sampling by changing one value on one level (not trivial, but doable). Besides, if one FRI sample has a chance of 2^{-x}, you will need on average 2^{x+20} hashes to crack the FRI.
When only 25% of the samples are available, just a small fraction (0.25^{15} = 2^{-30}) of the light clients will be successful in collecting 15 samples (in the limit of infinitely many light clients).
So there will be at least 25% of the samples available for the proof. These samples are all on the same polynomial, because they passed the data availability FRI. The builder only needs to prove that it has more than 6.25% of the samples, so the 25% will be sufficient.
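The 2^{-30} figure can be checked with a few lines. This is only a sketch: it assumes each of the 15 samples is drawn independently and uniformly (with replacement), so each one succeeds with probability equal to the available fraction. The function name is hypothetical.

```python
from fractions import Fraction
import math

def success_probability(available_fraction: Fraction, num_samples: int) -> Fraction:
    """Chance that all independent uniform samples hit available data.

    Assumption: sampling with replacement, so every draw succeeds with
    probability `available_fraction`.
    """
    return available_fraction ** num_samples

AVAILABLE = Fraction(1, 4)    # 25% of the samples are available
REQUIRED = Fraction(1, 16)    # 6.25% threshold the builder has to prove
NUM_SAMPLES = 15              # samples collected per light client

p = success_probability(AVAILABLE, NUM_SAMPLES)
bits = -math.log2(p)          # security level in bits

print(f"success probability = 2^-{bits:.0f}")
print("25% exceeds the 6.25% threshold:", AVAILABLE > REQUIRED)
```

With exact fractions, (1/4)^15 comes out as exactly 1/2^30, and the 25% that must be available is four times the 6.25% the builder has to prove.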