It seems like these proofs can be extended to remove the in-protocol (safety) fault tolerance threshold for non-validator nodes. In this modified version, nodes detect safety in the following way:
- Nodes choose a fault tolerance threshold 0 < t < 1. They consider a block justified_t if at least 1/2 + t/2 of the validator set votes for it (in any of the N slots that include or follow its slot).
- If there are N + 1 justified_t blocks in a row, the first block in this sequence is considered finalized with fault tolerance t.
Note that 1/2 refers to half of the weight of the entire validator set. 0 < t as otherwise safety could be detected on two chains, each with 1/2 of the validator set.
The safety proof follows a similar structure to the original proof (just replacing 2/3 with 1/2 + t/2), so we just consider each case:
Case 1: If [j[i+1] ... j[i+1] + N)
is fully inside [slot(b1) ... slot(b1) + 2N)
, then there would be 1/2 + t/2 of validators voting for something in the b2
chain intersecting 1/2 + t/2 of validators voting for something in the b1 chain, implying at least 1/2 + t/2 + 1/2 + t/2 -1 = t violated (1).
Case 2: If j[i+1] >= slot(b1) + 2N
, then 1/2 + t/2 of validators would have made a vote with a span surrounding (slot(b1), slot(b1) + 2n)
and 1/2 + t/2 of validators a vote with a span within that same range, meaning at least t violated (2).
Case 3: Now consider the case where slot(b1) + N < j[i+1] < slot(b1) + 2N
, so [j[i+1] ... j[i+1] + N)
is partially inside and partially outside [slot(b1) ... slot(b1) + 2N)
. There are now two subsets of validators: a set v1, which made votes surrounding the span (slot(b1), slot(b1) + 2n)
and a set v2, which made votes inside of this span. The combined size of v1 and v2 is 1/2 + t/2, meaning at least t of them also participated in the b1
chain. These validators therefore violated conditions (1) or (2), or some combination of both.
The above changes seem to make the plausible liveness proof more complicated (as expected). Nodes who choose t > 1/3 may have t - \epsilon equivocate, and as 1 - t - \epsilon < 2/3 < 1/2 + t/2, it may be impossible for a large enough quorum to form to finalize any more blocks. However, for nodes that run at t \leq 1/3, the proof should remain the same.
Furthermore, I think this scheme changes the forkchoice that nodes should run. Namely, starting from the highest justified block may not make sense, as different nodes have different opinions on what is justified. This seems fixable by having nodes start their forkchoice from their last finalized block, but again this might complicate the plausible liveness proof…
Finally, I’m not sure how this interacts with how the current approach has the chain store the last_justified_slot
and only include attestations that attest to this. This approach seems to make it impossible to remove the fault tolerance threshold for validators (as we need them all to see the same last_justified_slot
)…
As an interesting side note, this forkchoice and the above safety oracle (N + 1 justified blocks in a row) seem to fit fine into the CBC Casper framework. That is - I think the “mini-fault tolerance crisis” was a property just of the class of estimators we were considering, rather than CBC as a whole