Avalanche RANDAO – a construction to minimize RANDAO biasability in the face of large coalitions of validators

jrhea · November 13, 2018, 11:46pm

I’m still digesting this, but I think I found a mistake in the second bullet of the Second Step:

v3 and v4 compute a1a2G, from which they compute a shared integer S34 = int(a3a4G) and publish S34G

should be:

v3 and v4 compute a3a4G, from which they compute a shared integer S34 = int(a3a4G) and publish S34G

(Sorry for the hand typed quote…for some reason when I used the quote feature the formatting was wacky)

rcconyngham · November 14, 2018, 9:19am

You are, of course, right. Fixed that and a couple more typos. Let me know if you have any questions, I’m sure I could do better in explaining some of this.

jrhea · November 14, 2018, 5:07pm

This is an interesting idea. I think I get the gist of it: you are essentially organizing the validators into a binary tree and performing a commit/reveal ceremony at each level of the tree. This takes away the opportunity (or at least reduces the likelihood) for a validator to bias the entropy, because if they choose not to reveal, the other proposer can continue the ceremony without them.

Quick question:

If I were to rerun your simulations, I would obviously need to change n and f accordingly, but it also looks like this line:

print zip(np.average(results,0),["Layer %d compromised" % i for i in range(8)] + ["Tree compromised"])

is looping through the layers of the tree and using the average results of each trial to display the probability of the layer being compromised. I am assuming that instead of using range(8), I can just calculate the number of layers like this: int(math.log(n,2)) + 1. I just want to make sure I am understanding.
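A minimal sketch of that layer count, assuming n is a power of two (a full binary tree). The helper names `num_layers` and `layer_labels` are hypothetical and not part of the original simulation code. It uses `int.bit_length()` instead of `math.log` to avoid floating-point rounding; for a power of two the two agree.

```python
def num_layers(n):
    """Number of tree layers for n leaves: log2(n) + 1 when n is a power of two."""
    if n < 1 or (n & (n - 1)) != 0:
        raise ValueError("n must be a positive power of two")
    return n.bit_length()


def layer_labels(n):
    """Labels for the per-layer columns plus the final whole-tree column."""
    per_layer = ["Layer %d compromised" % i for i in range(num_layers(n))]
    return per_layer + ["Tree compromised"]


# n = 128 gives 8 layers, the same count as range(8)
print(num_layers(128))
print(layer_labels(4))
```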

rcconyngham · November 14, 2018, 5:41pm

Yes, exactly. Let me know if you need any other help with the code. The by-layer probabilities are likely not so relevant for most purposes; the overall biasability given by

np.average(results,0)[-1]

is what you should look at first (by layer analysis might be helpful if you want to improve the tree layout though).

jrhea · November 14, 2018, 6:50pm

Cool thanks, that helps.

So I am able to run the code and understand it well enough to print out some diagnostic information when a layer is compromised/infiltrated and while looking at those scenarios it struck me that they should be easy enough to flag in a real world situation, right? If, for example, a coalition works together to bias the entropy and they recognize that they are in a configuration that would allow their bias to influence the tree, then they would all have to decide to not reveal, right? Wouldn’t this be something that we could detect and flag? If so, it seems like we could either:

at a protocol level force a re-shuffling that splits up the coalition so that it is futile to attempt biasing the tree (if the coalition is big enough, then they could stall the blockchain, but I’m thinking that level of centralization would affect any consensus scheme)
or just flag it as suspicious and let dapps know that the result is questionable

Am I way off-base here? It seems like it would cost a lot of money to indefinitely stall the chain.

rcconyngham · November 15, 2018, 8:58am

Yes, for sure, four or eight validators collaborating not to reveal should be an extremely rare event in normal circumstances. For example, a service that wants to notify users if something fishy is going on with the entropy could work like this:

observe the number of “no-shows” among single revealers, which gives you the base probability p_\text{reveal}.
For all higher layers, observe whether the fraction of revealed secrets is significantly less than 1-(1-p_\text{reveal})^l, where l is the layer number. If it is, then that’s a clear warning sign that an entropy manipulation attack is going on.
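The two steps above can be sketched roughly as follows. The function names, the normal-approximation test and the threshold of 3 standard deviations are assumptions chosen for illustration; the post only says "significantly less" and does not prescribe a statistical test.

```python
import math


def expected_reveal_rate(p_reveal, layer):
    """Expected fraction of revealed secrets at a layer: 1 - (1 - p_reveal)^l."""
    return 1.0 - (1.0 - p_reveal) ** layer


def is_suspicious(revealed, total, p_reveal, layer, z_threshold=3.0):
    """Flag a layer whose reveal count is far below the binomial expectation.

    Uses a normal approximation to the binomial distribution (an assumption;
    any one-sided test for a too-low reveal count would serve the same purpose).
    """
    expected = expected_reveal_rate(p_reveal, layer)
    mean = total * expected
    std = math.sqrt(total * expected * (1.0 - expected))
    if std == 0.0:
        return revealed < mean
    return (mean - revealed) / std > z_threshold


# Hypothetical observations: base reveal probability 0.9 from single revealers,
# then 1000 layer-2 secrets with 990 or 900 of them revealed.
print(is_suspicious(990, 1000, 0.9, 2))
print(is_suspicious(900, 1000, 0.9, 2))
```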

Of course, this does not work very well if only a single or very few very high value outcomes are manipulated.

Otherwise, it’s an interesting idea to use the non-reveal information to “separate” validators that might be controlled by the same person. But note that most of the validators in the attack don’t have to be active at all; the attacker just needs to know their secrets. So this would only have an effect on an attack that is carried out repeatedly over a long period. But it might have an effect when the attacker is trying to take over the complete beacon chain, as in this analysis: RANDAO beacon exploitability analysis, round 2

poemm · November 15, 2018, 11:41pm

Excellent work with the Monte-Carlo analysis.

Interesting that at ratio f=0.5, the probability of bias is \approx 0.5. We observed similar behavior in the subcommittee scheme (linked in the original post), and did a combinatorial analysis to find that this behavior is because the cases of bias and no bias are complements of each other.

Perhaps the Avalanche scheme is best used alone. But it is generic enough to also be used with the subcommittee scheme (linked above), having each subcommittee use Avalanche to agree upon their shared secret.

Again, great work and great ideas.

rcconyngham · November 16, 2018, 8:31am

Yes, I also found this in my committee analysis. I agree that it is connected to the fact that for f=0.5, the probability of being able to bias a secret is the complement of the probability of being able to know that secret.

Agreed. I think it is an interesting mixture: it can be seen as just the scheme that is used to share secrets, but it also quite naturally leads to a reveal scheme by itself.

vbuterin · November 17, 2018, 2:29am

Huh? How can you compute G * ab knowing only G, G*a and G*b? Isn’t that literally a violation of computational diffie hellman hardness? Or am I missing something?

JustinDrake · November 17, 2018, 9:39am

That’s not the setup. One group (v_1 and v_2) knows a and G*b, and another group (v_3 and v_4) knows G*a and b. (Specifically, a = S_{12} and b = S_{34}.) A zkproof is then used for public verifiability that G*a*b was correctly constructed.
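A toy sketch of this setup, with the multiplicative group modulo a prime standing in for the elliptic curve (so x*G becomes pow(G, x, P)). The parameters, the helper names and the `to_int` mapping are all assumptions for illustration and are not secure; the zkproof of correct construction is not modelled.

```python
# Toy stand-in for the curve group: integers modulo a prime. NOT secure parameters.
P = 2**127 - 1  # a Mersenne prime
G = 3           # base element (arbitrary choice for illustration)


def commit(secret):
    """Public commitment, the analogue of secret * G."""
    return pow(G, secret, P)


def dh(own_secret, other_commitment):
    """Diffie-Hellman step: combine one's own secret with the other side's commitment."""
    return pow(other_commitment, own_secret, P)


def to_int(element):
    """Plays the role of 'int': maps a group element to an exponent."""
    return element % (P - 1)


# First layer: each validator v_i holds a_i and publishes A_i = a_i * G.
a1, a2, a3, a4 = 1111, 2222, 3333, 4444
A1, A2, A3, A4 = (commit(a) for a in (a1, a2, a3, a4))

# v1 and v2 both derive S12; v3 and v4 both derive S34.
s12 = to_int(dh(a1, A2))
assert s12 == to_int(dh(a2, A1))
s34 = to_int(dh(a3, A4))
assert s34 == to_int(dh(a4, A3))

# Only S12 * G and S34 * G are published.
S12_pub, S34_pub = commit(s12), commit(s34)

# One group knows a = S12 and G*b; the other knows G*a and b = S34.
root_from_12 = dh(s12, S34_pub)
root_from_34 = dh(s34, S12_pub)
assert root_from_12 == root_from_34
```

Neither group ever needs to compute G*a*b from G*a and G*b alone, which is why this does not contradict computational Diffie-Hellman hardness.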

vbuterin · November 17, 2018, 11:43am

Ah, I see where I messed up. I thought S_{12} was a_1a_2G the elliptic curve point.

vbuterin · November 17, 2018, 12:41pm

This should be fairly simple using SNARKs as all the computations are very simple EC operations.

I think you actually don’t need a SNARK, you just need elliptic curve pairings. Proving that S_{12}G was correctly constructed can simply be done by checking e(S_{12}G, G) = e(a_1G, a_2G).
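For reference, the check follows from bilinearity. This assumes a symmetric pairing e (both arguments in the same group) and reads S_{12} as the scalar a_1 a_2:

$$
e(a_1 G, a_2 G) = e(G, G)^{a_1 a_2} = e(a_1 a_2 G, G)
$$

so, for a non-degenerate pairing on a prime-order group, e(S_{12}G, G) = e(a_1G, a_2G) holds exactly when S_{12}G = a_1 a_2 G.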

rcconyngham · November 17, 2018, 3:32pm

I also hope that the construction is actually simple enough that full-blown zk proofs won’t be needed. As far as I can see, an elliptic curve pairing could be used to show that a_1 a_2 G was correctly derived, which is a cool insight. Unfortunately, for this construction to work, we need to keep a_1 a_2 G secret, and derive the shared secret of v_1 and v_2 from it, which is S_{12} = \text{int}(a_1 a_2 G) (where int is a suitable mapping from EC elements to integers). S_{12} should not be publicised (as it would allow anyone to derive all the further steps in the tree), only S_{12} G will be public. I think the pairing idea does not immediately work to show that S_{12} G was correctly derived, or am I missing something here?

Still really hope to find some better crypto that can prove correct derivation. SNARKs is just the fallback (but according to @barryWhiteHat the SNARKs needed here would be very simple).

barryWhiteHat · November 17, 2018, 7:42pm

Yeah, it should be just a hash function and then a few encryptions so the previous committers can decrypt the committed value. I’d rather have the simple version too. But perhaps we will want something more complicated, and then the snark will be more useful. It gives us scope to turn this into a much more reactive secret sharing scheme if need be.

vbuterin · November 17, 2018, 10:34pm

Right, I see. I still feel like there’s some simple elliptic curve construction that should be able to do what you want, but that conversion from a_1a_2G into an integer definitely seems like a challenging thing to prove.

Perhaps the solution might be to make the secret that goes one level up be a_1a_2, or is revealing that value (and hence revealing a_1 to the user that submitted a_2 and vice versa) unacceptable?

rcconyngham · November 18, 2018, 12:42pm

This may be possible, then v_1 and v_2 would use a public key encryption scheme* to share their secrets a_1 and a_2 with each other in the first round, thus creating a shared secret integer a_1 a_2. But now the problem is that they would have to prove that this encryption was done correctly, and they did not “cheat” their partner by not encrypting the secret that they committed to.
So it then boils down to proving that a piece of asymmetric encryption was done correctly. Not sure if this is a simplification?

Another effect of this is that, should v_2 pull out after v_1 has already shared their secret with them, then v_2 will still know the shared secret all the way down in the tree. I think one interesting property of the original construction is that “when you’re out, you’re out”: Once you fail to share at any given stage, you will not be “in” on the remainder of the secrets. This may be good (failing to share might mean you’re manipulating, so it’s better that you don’t know the secret) or bad (of course, it could be an advantage that a validator who was offline earlier can still come in and fill in later parts of the process, should other validators now be offline or maliciously withholding).

*) We can’t use the shared secret a_1 a_2 G for this encryption scheme because we want to publish this one later, and then everyone would know their secret which is counter to the point of this scheme. So we need a separate public key encryption scheme here.

burdges · November 18, 2018, 12:48pm

If we accept using zkSNARKs, then we could run both the ECDH and the subsequent Pedersen commitment on JubJub, and then verify them with zkSNARKs on BLS12-381. I think ZCash have pushed verification of scalar multiplications on JubJub down to like 6 constraints, but the ECDH presumably takes more, so this sounds speedy but not actually “fast”, and verification costs some pairings.

Instead, we could go faster by fixing the height and building some tower of curves, so that each layer boils down to some simple two-signer VRF-like construction. It’d require doing elliptic curve arithmetic on many different curves, likely using Weierstrass arithmetic from the AMCL library, so ugly but still far faster than zkSNARKs, especially for verification.

We can likely make this go fast-ish without any exotic curve constructions using an additive blinding: We have commitments A_i = a_i G and the i-th party reveals f(a_1 a_2 G) where f regards a curve point as an integer. Our 1st participant creates:

some B = b A_2 and B' = f(B) G,
NIZK that b A_2 is the ECDH of B and A_2
NIZK that (a_1 + b) A_2 is the ECDH of A_1 + B and A_2
NIZK that f((b + a_1) A_2) G's scalar is the curve addition of the scalars of f(a_1 A_2) G and f(b A_2) G.

Now our final NIZK is quite nasty because the arithmetic in the scalar must happen in the base field, but curve addition is easy enough that doing it sounds plausible.

In fact, we should likely replace f with some encoding of the curve point as multiple scalars, maybe like 14 or more since our addition formula has degree six, or maybe some CRT trick improves that. I think doing this encoding badly technically exposes bits from a_1 A_2, so maybe another layer of blinding is required.

rcconyngham · November 18, 2018, 7:20pm

Just trying to understand your post here. I think your f is what I called “int”, trying to be very suggestive. We agree that what we’re trying to do is

Given two commitments a_1 G and a_2 G, give a proof that \text{int}(a_1a_2 G) G was computed correctly, without revealing a_1 or a_2 or a_1a_2 G or, for that matter, \text{int}(a_1a_2G).

Is it right that you are basically trying to prove that statement by constructing a series of other DH secrets (b A_2, (a_1 + b) A_2) which can be published, and proving correctness along the way?

I think you meant B = b G?

Wouldn’t publishing (a_1 + b) A_2 mean that anyone could compute a_1 A_2 by subtracting (a_1 + b) A_2 - B, which kind of defeats the purpose? Or is the intention to do this step without publishing (a_1 + b) A_2 (just an intermediate step inside an NIZK)?
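The subtraction concern can be illustrated in a toy group, again using integers modulo a prime in place of the curve, so that point subtraction becomes multiplication by a modular inverse. All parameters and the helper name `unblind` are assumptions for illustration only, and the blinding point is taken to be b A_2 as in the list above.

```python
# Toy stand-in for the curve group: integers modulo a prime. NOT secure parameters.
P = 2**127 - 1  # a Mersenne prime
G = 3           # base element (arbitrary choice for illustration)


def unblind(blinded, mask):
    """Analogue of the point subtraction blinded - mask (needs Python 3.8+ for pow(x, -1, P))."""
    return blinded * pow(mask, -1, P) % P


a1, a2, b = 1234567, 7654321, 1357911   # made-up secrets
A2 = pow(G, a2, P)                      # commitment A_2 = a_2 * G

blinded = pow(A2, a1 + b, P)            # (a_1 + b) A_2, if it were published
mask = pow(A2, b, P)                    # b A_2, if it were published

# Anyone holding both published values recovers the secret a_1 A_2.
recovered = unblind(blinded, mask)
assert recovered == pow(A2, a1, P)
```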

burdges · November 18, 2018, 9:14pm

Yes oops. If you meant b A_2 instead of B then yes it’s broken as written. I’d wanted to avoid verifying any scalar multiplications, or encoding f evaluation, inside the NIZK, but botched it. We should probably avoid scalar multiplications inside NIZKs if possible, but actually encoding f evaluation sounds tolerable, so we might repair my answer that way:

We cannot hide f(a_1 A_2) G of course. We might however hide b A_2 by combining the first and third NIZK and encoding f inside this NIZK, so probably several range proofs. Avoids scalar multiplication still…

Also, I think the DLEQ proofs in PrivacyPass suffice for the middle NIZK that proves an ECDH, but this combination of the first and third gets way messier. I think the hash function evaluation could stay outside the NIZK here, so maybe this makes the curve tower the fastest and most compact solution.

kladkogex · November 19, 2018, 10:31pm

My interpretation is that an unbiasable RNG (common coin) is probably very much related to consensus in the synchronous model, because “revealing” very much means a synchronous broadcast.

From this perspective, it could be that unbiasable RNG for > 50% of bad guys is provably impossible, in a similar way to a synchronous consensus for more than 50% of bad guys.