Exploring Worst Case Scenarios in ETH 2

The economics of staking has been a hot topic lately. @econoar has done a good job putting the incentives in concrete terms for everyone to digest. See his post on Economic Incentives for Validators and @vbuterin’s post on Average-case improvements to reduce validator costs for some background. In this thread, I want to encourage people to explore the improbable worst case scenarios that could cause the protocol to fail.

Validator Withdrawal Delay Period

I’ll go out on a limb and kick off the discussion by focusing on the withdrawal delay that validators will be subjected to when they want to exit from their validator role. The crux of the issue is the tradeoff between the value of a short withdrawal period to the validator and the security benefits the network enjoys with a longer withdrawal period.

@vbuterin and others have explored the idea of a fixed withdrawal period, a withdrawal period that is proportional to the perceived risk of an attack, etc. My concern lies in the fact that I haven’t seen many public discussions that explore pathological scenarios that could break the protocol. Here is my attempt at fabricating one.

A plausible scenario

We all want to believe that Ethereum 2 will be wildly successful. One certainty that comes with success is mainstream adoption, institutional investors, new financial products, etc. This is great, but with it comes the very real possibility that entities that hold (or have access to) large sums of ETH will want to earn interest off of it. We know that Coinbase, Binance, etc hold large sums of ETH in their dungeons. It is logical that they will decide that they want to dedicate some percent of the ETH that they are holding to staking. In fact, I am sure they are already talking about it.

That virtually guarantees that we will have a healthy amount of ETH to secure the chain. This sounds great, right?

Not so fast.

The problem

Given the issues that exchanges have had in the past, it isn’t difficult to imagine a scenario that causes a Mt. Gox event to happen. The difference this time is that the network is secured by stakers. If enough people panic and decide to remove their funds from the exchange, then this could cause one of these worst case scenarios where validators begin to exit en masse.

Wait. Why is this an issue?

This demonstrates a plausible scenario that could compromise the security of the network.

How does this compromise the network?

If more validators are exiting the network than entering it, this could lead to a situation where more than 1/3 of the validators have an uptime less than the weak subjectivity period (the time it takes for a block to finalize).

Is there any hope?

Of course.

The point I am trying to make is that we should talk through these scenarios and simulate different ingress and egress distributions to see how the protocol is affected.
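
To make the idea of simulating ingress and egress concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name, the uniform random arrivals, and the optional per-epoch exit cap are hypothetical and are not taken from the spec. It only tracks how the active validator count evolves with and without a throttle.

```python
import random

def simulate_validator_set(initial, epochs, mean_entries, mean_exit_requests,
                           max_exits_per_epoch=None, seed=0):
    """Toy model of validator ingress/egress (hypothetical, not the spec).

    Entries and exit requests per epoch are drawn uniformly from
    [0, 2 * mean]. If max_exits_per_epoch is set, exit requests wait in a
    queue and at most that many validators leave per epoch.
    Returns the active validator count after each epoch.
    """
    rng = random.Random(seed)
    active = initial      # validators still in the set (including queued exits)
    exit_queue = 0        # validators that asked to exit but have not left yet
    history = []
    for _ in range(epochs):
        entries = rng.randint(0, 2 * mean_entries)
        # Only validators that have not already requested an exit can do so.
        requests = min(rng.randint(0, 2 * mean_exit_requests),
                       active - exit_queue)
        exit_queue += requests
        if max_exits_per_epoch is None:
            allowed = exit_queue
        else:
            allowed = min(exit_queue, max_exits_per_epoch)
        exit_queue -= allowed
        active += entries - allowed
        history.append(active)
    return history

# A panic with no new entries: compare an unthrottled run with a capped one.
unthrottled = simulate_validator_set(1000, 50, 0, 100)
throttled = simulate_validator_set(1000, 50, 0, 100, max_exits_per_epoch=4)
print(unthrottled[-1], throttled[-1])
```

Swapping the uniform draws for heavier-tailed distributions (for example, a sudden burst of exit requests) would be the obvious next step for exploring the pathological cases.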

I’ve read discussions, but I haven’t seen a finalized plan for throttling the validator exit rate.

If there is one, then what is it?

How safe are the edge cases?

Are we certain that it isn’t possible to have 1/3 of the validators with an uptime less than the weak subjectivity period?

Final Thoughts

There are a lot of brilliant people working on different aspects of these problems, but I am disturbed at people’s hesitation to share (even on this forum) until they have had their ideas peer reviewed and formally written up. We need to get over this fear of being wrong and be more willing to receive constructive criticism.

On that note, if anyone has already worked out a solution (or if I am just flat out wrong), then please let me know. I am curious to hear the explanation.

Absolutely correct line of thought. Another thing that our small crew of law people is working on is the set of legal attack vectors that can be used by and/or against validators to essentially lock up staked funds, or otherwise put a cloud on their title. This can have the opposite worst case effect to the one you describe here, in that it essentially forces validators to keep their stake, which is sub-optimal from the perspective of the validator.

Relatedly, the same off-chain legal processes that can enjoin validators from removing funds can theoretically be used to force validators to remove staked ETH. The worst case scenario here is a legal removal order (injunction, seizure, forfeiture, choose your poison …) that conflicts with the protocol, which leads to an outright governance conflict. Off-chain governance norms require one thing; on-chain governance protocols require another thing. That’s a disaster waiting to happen. No way to anticipatorily code around it, so the only way to prepare for it is to … prepare for it.

Working on an analytical piece right now, The Legal Structure of PoS Blockchains. The intercourse between exchanges, validators, devs, and users is central to that story, in manifold ways. If there are issues that you think are priority areas that should be addressed, pls share what you think those are.

The worst case scenario here is a legal removal order (injunction, seizure, forfeiture, choose your poison …) that conflicts with the protocol, which leads to an outright governance conflict.

Thank you for sharing this. Good to know that this is being looked at.

Working on an analytical piece right now, The Legal Structure of PoS Blockchains. The intercourse between exchanges, validators, devs, and users is central to that story, in manifold ways.

Brilliant. I would love to read this when you are ready to share.

If there are issues that you think are priority areas that should be addressed, pls share what you think those are.

I (and hopefully others) will think about other issues that need to be considered and post them here. I have to admit that the legal aspect of this caught me off-guard. It sounds obvious now that you mention it, but I just hadn’t entertained that line of reasoning. I will think on this more.

I would like to share a potential mitigating solution to the problem of a rapid increase in validator exits.

A lot of clever ideas have been shared that focus on limiting the rate of validators exiting, but I haven’t heard anyone mention one method that (at a surface level) seems like it would be worth exploring. This potential solution is centered on the idea of bootstrapping new validators with the state or checkpoint of existing/exiting validators.

In order to ensure the security of the network, 2/3 (or more) of the validators must have an uptime >= the weak subjectivity period. If I am not mistaken, this is why we impose a withdrawal delay on validators. Vitalik (and others) have mentioned (several times) the need for new validators to be provided a checkpoint that the user can provide from a trusted source (e.g. EtherScan). This is fine, but it would be nice if (at a protocol level) we could somehow guarantee that, on average, a new validator’s bootstrapped uptime is no worse than the current uptime average among current (or exiting) validators.

Potential Solution 1
It would be interesting if we could pair outgoing validators with incoming ones and allow the new validators to be provided a checkpoint by the outgoing validator. In this case, anonymity is not an issue since that validator is leaving; however, this causes issues when the # incoming validators > # outgoing validators. Perhaps someone can come up with a clever modification. :thinking:
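
As a rough illustration of the pairing idea and of where it breaks down, here is a small Python sketch. The function name and the first-come, first-served matching are my own assumptions and nothing here is a protocol proposal. It just shows that when there are more incoming than outgoing validators, some incoming validators are left without a partner.

```python
from collections import deque

def pair_validators(outgoing, incoming):
    """Pair each outgoing validator with one incoming validator, in order.

    Hypothetical helper: returns (pairs, unmatched_incoming,
    unmatched_outgoing). An unmatched incoming validator would still need
    a checkpoint from somewhere else.
    """
    out_q, in_q = deque(outgoing), deque(incoming)
    pairs = []
    while out_q and in_q:
        pairs.append((out_q.popleft(), in_q.popleft()))
    return pairs, list(in_q), list(out_q)

pairs, waiting_in, waiting_out = pair_validators(["out-1", "out-2"],
                                                 ["in-1", "in-2", "in-3"])
print(pairs)        # the two matched pairs
print(waiting_in)   # the incoming validator with no outgoing partner
```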

Potential Solution 2
Another solution would be to have the validator query the PoW chain for stored checkpoints that validators are responsible for voting on and keeping current. This would be a great way to utilize the PoW chain’s objectivity to bootstrap each new validator. This, of course, creates a strong reliance on the existing PoW chain in order to bootstrap each new validator with a trustworthy checkpoint.

I am curious what @JustinDrake, @vbuterin and everyone else thinks about this. Thanks.

@cleanapp did you happen to see this?

You might have some good input

Thank you so much for sharing. Yeah – saw this come across the wire earlier. Will be preparing a response for sure on behalf of the CleanApp Foundation, given the enormity of the stakes. As it comes together, will definitely post drafts and welcome any and all suggestions for improvement.

Would also love to read your work on the legal implications of POS.

Another scenario to consider is the possibility of turning the beacon chain and the PoW chain into microservices. This could become a serious centralization concern. These microservices would be of great value to validators that want a lightweight setup, and they are very likely to be implemented by organizations with large amounts of ETH to stake. If they can connect N validators to a single beacon chain, or to a reverse proxy that load-balances connections to beacon chains, then we are arguably in the same boat we are in today with mining pools.

Thanks for starting this discussion.

I thought the purpose of these delays is to enable us to slash the validator if any malicious activity from the past is detected after she decided to withdraw?

Speaking of the problem of mass validator exits:

If we have 1024 shards, then we need (I believe) ~150,000 validators to be sure the system is secure (under the honest majority model) and operates smoothly. What happens if this number drops to e.g. 50,000? Do the shards just have to wait longer to cross-link (something kind of equivalent to the PoW main chain clogging), or…?

Hey @jrhea, thanks for starting this discussion.

With the parameters we currently have, in the event of a mass exit it would take ~3.3 months in the average case for a validator to successfully exit. This should solve any weak subjectivity concerns, given that the last finalized checkpoint would most likely be before the validator exits.
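
For intuition about how a throttled exit queue turns into a waiting time, here is a back-of-the-envelope sketch in Python. The function and its inputs are hypothetical; the numbers in the usage line are placeholders I picked, not the spec parameters, so they are not meant to reproduce the ~3.3 month figure.

```python
import math

def exit_wait_days(queue_position, exits_per_epoch, seconds_per_epoch):
    """Days until the validator at `queue_position` can leave, assuming a
    fixed cap of `exits_per_epoch` (all inputs are illustrative)."""
    epochs = math.ceil(queue_position / exits_per_epoch)
    return epochs * seconds_per_epoch / 86_400  # 86,400 seconds per day

# Placeholder inputs: 100,000 validators queued ahead, 4 exits per epoch,
# 384-second epochs. These give roughly 111 days.
print(round(exit_wait_days(100_000, 4, 384), 1))
```

The takeaway is only that the wait grows linearly with the queue length and inversely with the per-epoch cap.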

The current spec elaborates on it here:

@MihailoBjelic
In that case the committee size would readjust to account for the drop in the number of validators. With 50,000 validators, though, you would get an average committee size of about 50. The minimum safe threshold for committee sizes is 111, so I would guess that in an extreme case like this you would have multiple committees attesting to the state of a shard, which would lead to a longer finalization period for crosslinks. This case hasn’t really been elaborated elsewhere, though.
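
The arithmetic behind these numbers can be sketched as follows. This is a simplification that assumes the whole validator set is split evenly into one committee per shard; the constant and function names are mine, and only the 1024 shards and the 111 threshold come from this thread.

```python
SHARD_COUNT = 1024          # from the discussion above
MIN_COMMITTEE_SIZE = 111    # minimum safe committee size quoted above

def average_committee_size(validators, shards=SHARD_COUNT):
    """Average validators per shard if the set is split evenly."""
    return validators / shards

def min_validators_for_safe_committees(shards=SHARD_COUNT,
                                       min_size=MIN_COMMITTEE_SIZE):
    """Smallest validator set that still gives every shard a full committee."""
    return shards * min_size

print(average_committee_size(150_000))        # about 146 per shard
print(average_committee_size(50_000))         # about 49 per shard, below 111
print(min_validators_for_safe_committees())   # 113664
```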

Thanks @nisdas.

I don’t think it makes sense to reduce the committee size below the 111 validators/shard threshold, because committees basically become “useless” then. Instead, I would lock the committee size once it reaches the threshold and then start assigning the same committee to multiple shards. That could do the job unless the drop is really extreme (validator hardware requirements are relatively low, so we can’t expect the same committee to validate 10 shards at the same moment). This deserves a more formal and in-depth analysis IMHO.
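
Here is a quick sketch of what locking the committee size would imply for the per-committee load. It assumes committees are fixed at exactly 111 members and shards are spread as evenly as possible; the function name is hypothetical and this is not how the spec assigns committees.

```python
import math

def shards_per_committee(validators, shards=1024, min_size=111):
    """Worst-case number of shards one committee must cover when the
    committee size is locked at `min_size` (illustrative only)."""
    committees = validators // min_size  # full-size committees we can form
    if committees == 0:
        raise ValueError("not enough validators for a single committee")
    return math.ceil(shards / committees)

print(shards_per_committee(150_000))  # 1: enough committees for every shard
print(shards_per_committee(50_000))   # 3: 450 committees cover 1024 shards
print(shards_per_committee(11_100))   # 11: the "really extreme" territory
```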

Sorry for the delay. Ya, I believe that they are both true. Here is a VB quote from another thread:

Thanks @jrhea, you’re right, both are definitely true.

Hi – it took much longer than anticipated (and still not done), but here is the first part. You’ll see that it’s laying the groundwork for the next part of the puzzle, which will be analysis of the legal status of bETH (as well as the contract-ish linkages between stakers and other network participants). But hopefully it is constructive and helpful.