RNG exploitability analysis assuming pure RANDAO-based main chain

vbuterin · April 24, 2018, 8:40pm

Let us suppose that, for pure proof of stake, the main chain uses a simple RANDAO-based RNG. That is, the main chain has an internal state, R, and every validator, upon depositing, commits to the innermost value in a hash onion, H(H(H(.....S.....))); the state stores the “current commitment value” C_{V_i} for each validator V_i. To create a block, a validator V_i must include the preimage of the C_{V_i} value (call it H^{-1}(C_{V_i})), and the following state changes happen: (i) C_{V_i} := H^{-1}(C_{V_i}), (ii) R := R\ xor\ H^{-1}(C_{V_i}); that is, the current commitment value for that validator is set to the preimage, so next time the validator will have to unwrap another layer of the onion and reveal the preimage of that, and the preimage is incorporated into R.

This has the property that when some validator gets an opportunity to propose a block, they only have two choices: (i) make a block, where they can only provide one specific hash value, the preimage of the previous value they submitted, or (ii) not make a block. The validator knows what the new R value would be if they make a block, but not the R value if they don’t (as it’s based on the next validator’s preimage, which is private information); the validator can either choose the value that they know or a value that they don’t know, the latter choice coming at a cost of their block reward B.

Now, suppose that this value is used as an RNG seed to sample a large set of N validators (eg. 200 validators per shard, multiplied by 100 shards, would give N = 20000). Suppose the reward for being selected is S. Suppose that a given participant has portion p of participation in both the main chain and the shards; hence, they have a probability p of creating the next block, and they get an expected N * p slots selected each round, with standard deviation \sqrt{N * p} (it’s a Poisson distribution). The question is, when the validator gets selected, could they profit by skipping block production?

The validator knows ahead of time how many slots they would get selected for if they make a block; this value is sampled from the probability distribution with mean N * p * S and standard deviation \sqrt{N * p} * S. By not making a block, they sacrifice R, but get to re-roll the dice on the other distribution and get an expected N * p * S. By the 68 - 95 - 99.7 rule, the advantage from re-rolling will only exceed three standard deviations 0.15% of the time. If we are ok with exploitation being profitable at most 0.15% of the time, then we want R \ge 3 * \sqrt{N * p} * S. If main-chain block rewards are comparable to total shard rewards, then R \approx N * S \gt\gt 3 * \sqrt{N * p} * S even for p = 0.5 (above p = 0.5, re-rolling the dice actually once again becomes less profitable because there’s less room to get lucky in some sense). Hence, in a simple model there is little to worry about.

One further optimization (thanks @JustinDrake) is that if the validator sampling is split into shards, where in each shard the ability to validate is dependent on a proposal being available in that shard, then we can make the random source for that shard be R xor’d with a preimage reveal from the proposer. This ensures that the validator making the main-chain block can only see the expected results for the portion of shards that they are a proposer of (in expectation, portion p). This restricts the attacker’s analysis to those shards, with standard deviation of the reward \sqrt{N * p * p} * S = \sqrt{N} * p * S; this makes lower main-chain rewards even safer.

We could look at more complex models, where we take into account the possibility of validators skipping one round because they see that they have both proposal slot 0 and slot 1, but if they wait until slot 1 then they also have slot 0 for the child of slot 1. These situations can be mitigated by penalizing absence of proposers; in any case, however, if total main-chain rewards are comparable to total shard-chain rewards, shard-chain rewards are a pretty negligible contribution to the attacker’s profitability calculus.

In short, I really don’t think we need fancy threshold signature randomness or low-influence functions or any other fancy math; a simple RANDAO mechanism as described above with total main chain rewards being comparable to total shard chain rewards is enough.

clesaege · April 25, 2018, 1:58pm

That’s only valid under the assumption that there is no collusion with the next validator. This may not hold in practice. Validators can publish their preimages if they wish and everyone can verify them.
To avoid coordination (collusion), we could set up a early reveal penalization scheme such that everyone knowing H^{-1}(C_{V_i}) of a validator can publish it to have the deposit of the validator steal a part of the validator deposit and have another part burnt.
This way, if some validator were to publish its preimage early, everyone could steal ETH from it.
Getting a transaction showing an early preimage reveal included in a block, may in practice, only be doable by the previous validator (because it would prefer to make this transaction itself than including the transaction from another party). But that still prevent trustless reveal of one’s preimage to the previous validator.

But, this preimage penalization scheme could be bypassed by validators colluding. Colluding validators could make a smart contract which would penalized them (by burning part of ETH staked in a smart contract), if they include an “early reveal penalization” transaction in a block they make, thus parties wanting to collude with this validator would know that they can safely reveal it their preimage as soon as it is its turn to make a block.

Again, if there is collusion, multiple validators could skip their turn in a row if this makes them being drawn more.

vbuterin · April 25, 2018, 2:14pm

Agree! I actually proposed this exact thing a few years ago: On Anti-Pre-Revelation Games | Ethereum Foundation Blog

clesaege · April 25, 2018, 2:22pm

Yeah, even if it can be bypassed, it’s better than nothing as bypassing it by the exposed solution would induce a capital lockup opportunity cost.

JustinDrake · April 25, 2018, 3:25pm

If we are confident we want RANDAO for sharding and pure PoS should we consider including it in sharding phase 1?

Simplicity and forward compatibility: The scheme is fairly simple to implement. It feels the costs of moving to RANDAO midway through the sharding roadmap (disruption) outweigh those of implementing it upfront (delayed initial launch).
Design flexibility: Moving away from blockhashes expands the sharding design space. RANDAO is “offchain-friendly”, for example allowing low-variance 5-second periods with small proposer lookahead.
Reuse and network effects: RANDAO in sharding phase 1 would lay the groundwork for the pure PoS block proposal mechanism, and allow the wider dApp ecosystem to access a high-quality semi-enshrined alternative to blockhashes.

In short, I really don’t think we need fancy threshold signature randomness or low-influence functions or any other fancy math

It would be interesting to get feedback from projects going in other directions (e.g. Dfinity, Cardano, Algorand). One rebuttal may be that RANDAO doesn’t have “guaranteed output delivery” (GOD) at every period. A single participant skipping (i.e. not revealing his preimage) would cause all shards to freeze for 5 seconds. This central point of failure may be a DoS vector, or at least a quality-of-service liability, especially in the context of a bribing attacker.

One mitigation is to heavily penalise skipping, but that may introduce a griefing/discouragement vector. Another mitigation is to XOR n > 1 preimages at every round, but that increases grinding opportunities and messaging overhead.

Another direction I’m exploring is to have a hybrid scheme which is RANDAO by default, and if a participant skips then a committee can quickly fill the gap (e.g. with a threshold scheme to force-reveal the preimage) to nullify both the grinding and bribing attacks.

vbuterin · April 25, 2018, 4:12pm

The problem is that to work optimally the main-chain RANDAO should be something that gets included in every block, which we can’t guarantee if it’s purely a layer-2 mechanism.

A single participant skipping (i.e. not revealing his preimage) would cause all shards to freeze for 5 seconds.

That depends entirely on how the randomness is used. If we have a rule that there is one shard collation per main chain block, then yes, one main chain participant skipping causes the shard waiting time to double. But if the shard collation timing is separate from the main chain blocks, and it uses an RNG source from some reasonably recent main chain block hash (eg. 5 blocks ago), then it’s much less problematic.

“Guaranteed output delivery” is really just another way of saying “guarantee that the main chain will grow”. Of course the main chain will grow; the question is just how quickly and on how fine grained a level are we relying on each individual main chain block?

clesaege · April 25, 2018, 6:42pm

I agree that this scheme, if not perfect is better than the one with low-influence functions described in the mauve paper (https://cdn.hackaday.io/files/10879465447136/Mauve%20Paper%20Vitalik.pdf, p5). Because a low influence function increases the difficulty for one party to “reroll” the random number as their contribution to the number alone is probably not enough to change it. However, it increases the effect of a collusion attack.

Low Influence Function Issues

If we make a random number using the proposed low influence function of the mauve paper (i.e. making a majority vote on each bit of the seed), a small part of the validators can give the same random number contribution (their preimage) making it highly likely to have the seed be this random number contribution.
For example assume there is 100 participants and 20 of them form a cartel and collude by having 1111111… as their preimage.
If the 80 others don’t colluded, the sum of their vote for each bit will be a binomial distribution of mean 40 and standard deviation 4.47.
Adding that to the 20 colluding participants (mean 20, standard deviation 0), the amount of bitwise vote for 1 will have a mean of 60 and a standard deviation 4.47, so it’s highly unlikely (p=0.016) for a bit to get more 0 votes than 1 votes.
So the low influence functions can be attacked easily by a minority of participants if they can coordinate.
They then could choose their preimage such that we enter into a loop where only cartel participants are drawn.

Topic Proposal

The topic proposal can still be attacked by collusion of the stakers as described in my first post. The problem with those attacks is that they are self-reinforcing, even if initially not revealing due to a collusion (or simply having a big stake) only gives you a really small edge, you will then be drawn slightly more than the others stakers. This then means that you (as a large staker or a cartel) will then be able to make slightly more blocks, thus getting a slightly higher edge due to the cases where it’s better not to reveal.
So the edge you get will grow faster and faster due to this positive (from the cartel point of view) retro-action. So from a seemingly small starting edge, we can end up to a cartel getting to make most of the blocks.

I think if we want a random number generator which can’t be manipulated, we really need to use more advanced techniques like threshold signatures (since we are in the honest majority model anyways) or sequential proof of work.

vbuterin · April 25, 2018, 9:34pm

I think there are ways that we could work on the RANDAO game in order to make it less long-run exploitable, by making it so that there is no net payoff to getting extra validation slots. Here is how this could be done.

Suppose that a validator on average gets selected once per T blocks, and gets a reward of R. A “naive RANDAO” gives them a reward of R for participation, and 0 for non-participation. A slightly less naive construction detects non-participation (ie. when you get selected and don’t make a block), and gives a reward of -R in that case.

We can move part or all of the reward into a passive reward that increments over time, and rely almost exclusively on penalizing non-participation. Suppose on average validators show up p of the time (eg. p = 0.95). A validator would get a reward of p * R every T blocks (this can be calculated via deposit scale factors like in Casper FFG), then if they get selected, they get a reward of (1-p) * R if they make a block, and are penalized (1 + p) * R if they no-show. This achieves the following properties:

Expected long-run reward structure same as before (before = giving +R for participation and -R for no-show every time a validator is selected)
Same incentive as before to make a block as opposed to no-showing, not taking into account RNG manipulation
For an average-performing validator, no incentive to manipulate randomness

Another way to think about this is, we remove incentives for RNG manipulation by making a system where instead of rewards coming from taking advantage of opportunities to make blocks, rewards mostly simply come from just having your deposit there, and penalties come from not taking advantage of opportunities to make blocks.

JustinDrake · April 26, 2018, 8:39am

Why should the RANDAO be included in every block for sharding? (For pure PoS this is something that is easily achieved by construction.)

I’m optimistic that the random beacon can be “offchain native”, i.e. be produced and immediately consumable offchain. This would allow for shards to have low variance 5-second periods, have a small lookahead, and not have to worry about timing of the main chain.

At a high level we want an offchain game to generate a random beacon where the outputs enjoy high offchain finality. For RANDAO we have a simplified consensus game which can be “hardened”:

There are only small “headers” (the beacon chain) and so no hard data availability problem for “bodies”.
The beacon chain is “quasi-deterministic” in the sense that the game is predetermined modulo skipping.
By notarising the beacon chain (very similar to what was suggested here for the shards themselves) we can greatly tame offchain forking. This is especially true if we assume a global clock and honest majority.
Using PoSW we may be able to “enforce” the clocking, further reducing forking opportunities.
The notarisation process in 3) can be complemented with onchain notarised checkpoints (honest-majority or uncoordinated majority). Even with relatively low-frequency checkpoints (say, one checkpoint every 10 minutes) the shards can still progress (with high confidence) using offchain randomness freshly produced every 5 seconds.
We may be able to remove forking altogether (producing a deterministic beacon chain, basically killing the consensus game) with a “patch-the-hole” threshold scheme complementing RANDAO. I’ll write a post shortly. The basic idea is that a committee can force-reveal the preimage of a participant that doesn’t show up.

I’m leaning quite heavily towards separating collation timing from main chain blocks. Having said that, how does using a reasonably recent main chain block hash work?

If the shards have 5-second periods and some main chain block takes one minute to produce (because of variance), does this mean that the shards will be stuck with the same RNG output (the one 5 blocks ago) 12 times in a row?

In addition to a possible additional “buffer” lookahead, does this introduce a grinding attack because of shard-to-main-chain synchronisation? That is, the concept of “5 blocks ago” is subjective from the point of view of the shards, so at every period there is an ambiguity as to which blockhash is the one that is 5 blocks old.

vbuterin · April 26, 2018, 12:55pm

In addition to a possible additional “buffer” lookahead, does this introduce a grinding attack because of shard-to-main-chain synchronisation? That is, the concept of “5 blocks ago” is subjective from the point of view of the shards

Ah sorry, to be clear, by “5 blocks ago” I actually mean “the start of the previous period”. The shards always know what the main chain period is, because the DAG structure enforces it (the first shard block in a given period needs to reference the main chain block hash at the start of that period). That said, if we don’t have this tight linking, then I would be inclined to say, just set up a separate RANDAO in each shard. RANDAOs are not exactly complex or expensive to set up and maintain, so there’s not that much of a loss from this.

rcconyngham · July 18, 2018, 3:01pm

Unearthing an old thread here, but I have a question about this: It feels like people also always have the choice to create blocks with arbitrary numbers of skips as well, right? (as in: Oh I didn’t see all those other blocks being generated, sorry )

Anyway, there might be incentives to always generate lots of these skip blocks, just in case the main chain will also create one or more skips and my skip chain might be longer?
For example, say the current chain is [A, B, C, D], with current next producer E. Then whoever would be the second block producer X after C could produce the alternative chain [A, B, C, skip, X]. Now if E does not produce a block, this alternative chain is just as long as the “main chain” and the next block producers would have to decide which one to choose? This might give more choice to influence the RNG than what was proposed above?
[In the worst case, E = X, in that case E actually has the choice between three options]

Does this make sense or is there some mechanism built in to stop this or do skips work in an entirely different way?