Rate-limiting entry/exits, not withdrawals


#1

Currently, validators are able to enter and leave the validator set relatively quickly: each time a validator set transition happens, 1/64 of the validator set can switch in or out, and so in the normal case, every validator can switch out within a day. There is a much slower queue for withdrawing, to prevent validators from all withdrawing right after they perform a large-scale attack, before they can be penalized. I argue that the status quo is suboptimal, and we should set the withdrawal delay to a minimal constant (eg. 1 day) and instead use a much more conservative bound on the entry/exit rate to serve the same function.

I claim this is a good idea for a few reasons:

  • CBC Casper compatibility: part of CBC philosophy is that the chain does not need to converge on one specific mechanism to determine canonical “finality”; applications can choose what thresholds they use. The current validator set change mechanism (1/64 every time the chain finalizes) requires the chain to have a canonical finality oracle.
  • Light client friendliness: having a much slower bound on validator set change makes it easier for light clients to skip ahead a relatively long distance at a time.
  • Ability to resume finalization via a surge of friendly deposits: if there are not enough validators that are online, a fixed rate limit on entry/exit, that does not require the chain to finalize to proceed, would allow altruistic ETH holders to swoop in and join the validator set to cause it to resume finalization.
  • Resistance to discouragement attacks: a discouragement attack (link to mini-paper here) involves an attacker (with >=33% stake) causing a medium amount of disruption to consensus, with the purpose of making it unprofitable for others to validate. This drives others to leave, making further attacks cheaper and more profitable. Making it simply not possible for validators to leave quickly (intuition: you joined the army, the fort is under attack, you have to stay around to defend it, being on-call for that sort of thing is the job description!) is the best known strategy to increase the cost of discouragement attacks.

The alternative is simple to implement:

  • Reduce the maximum number of validators that can enter or exit from 1/64 of the total to a much lower fraction, or even a square root or a constant (think: 1-3 months to rotate the entire validator set)
  • Repurpose the withdrawal queue into an entry/exit queue. Repurpose exit_slot as the slot when either (i) a deposit was processed or (ii) an exit was triggered. Post-exit withdrawal is now a fixed length of time (possibly extendable with proofs of custody)
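To make the rate limit concrete, here is a minimal sketch of a FIFO exit queue with a fixed per-day cap. It assumes the "constant" variant of the bound (the proposal also allows a fraction or a square root), counts validators rather than stake, and uses hypothetical names (drain_exit_queue, active_count, rotation_days) that do not come from any spec.

```python
from collections import deque


def drain_exit_queue(exit_requests, active_count, rotation_days):
    """Return {validator_id: day_exited} for a FIFO exit queue.

    Assumption: at most active_count // rotation_days validators
    (minimum 1) may leave per day, fixed for the whole run.
    """
    per_day = max(1, active_count // rotation_days)
    queue = deque(exit_requests)
    exited = {}
    day = 0
    while queue:
        day += 1
        # Let up to per_day validators at the head of the queue exit today.
        for _ in range(min(per_day, len(queue))):
            exited[queue.popleft()] = day
    return exited


# 900 validators with a 90-day rotation target: 10 exits per day.
exit_days = drain_exit_queue(list(range(25)), active_count=900, rotation_days=90)
print(exit_days[0], exit_days[10], exit_days[24])
```

With a short queue everyone leaves within a few days; if all 900 validators queue up at once, the last one leaves on day 90, which is the point of the bound.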

#2

Very cool. The main tradeoff that jumps out here is the increased cost validators bear in the case they want to stop validating - and it’s not even increased capital lockup costs, but rather the fact that they have to stay online and keep generating messages.

Are there any other disadvantages this scheme has over the previous one that I’m missing?

Ability to resume finalization via a surge of friendly deposits:

It seems like the answer is obviously no, but is there any case where “lots of deposits can enter without finality” actually makes things worse? I’m wondering if it might make some weird discouragement attack more powerful (or something), as this change expands the attacker’s set of strategies when finality is not being reached…


#3

Very cool. The main tradeoff that jumps out here is the increased cost validators bear in the case they want to stop validating - and it’s not even increased capital lockup costs, but rather the fact that they have to stay online and keep generating messages.

Capital lockup costs are unchanged; it’s the stay-online requirement that’s stronger. Though it’s worth keeping in mind that the current system also has a stay-online requirement, as you need to respond to proof of custody challenges.


#4

Sure. This strengthens the stay-online requirement though, as validators must be online per-epoch as compared to per-response-deadline.


#5

Agree! Though OTOOH validators can earn revenue throughout this extra time period, whereas in the current design they’re just sitting in limbo.


#6

I think this is interesting and it would seem to me that the pros outweigh the cons.

On the economics side, we’d essentially be taking staking from a money market type product to a smaller term product. Although, it sounds like in your proposal that’d be only a month or so? I’d say from a validator standpoint and attractiveness of investment, they’d only demand slightly more interest. The one caveat is that since the staking rate is floating, a validator technically does not know what his interest rate will be over the period of lockup. Probably a smaller concern but one I’d say is worth considering.

What we’re seeing in open finance so far is that instant lending is not offering a very big return, because the borrower set is much smaller than the lending set. I see some fixed term products coming online that are hoping to increase that rate but I’d expect almost all validators to think of staking as a longer term investment anyways. As long as this is known up front I don’t really see much concern.


#7

I think the staking community would tolerate a reasonable delay in withdrawal time. Choosing to stake is not like having funds in a savings or checking account, it’s closer to a deposit into CDs & T-Bills. It’s understandable that a withdrawal would not be instant.

Off the cuff, a 1 to 2 day withdrawal period seems fine. 3-4 days, starts to feel like hardship. 5 days or longer would be quite the ask. Depending on the length, you might expect staking pool services to crop up that provide immediate withdrawal liquidity to customers as a service or for a fee (an unintended centralization vector perhaps? You could imagine stakers moving to a Coinbase or RocketPool for their “instant withdrawal times”).

Of course, all of these dynamics (reward rate, slashing risk, withdrawal time) will play into the decision to stake & weighed against alternative uses of the capital, or more specifically alternative uses of the Eth in the open finance world.


#8

What would happen here is that 1 day would be the “happy case” withdrawal time, and something like 3-6 months would be the “worst case” withdrawal time if everyone is trying to withdraw at the same time. I’m expecting the happy case to be the normal case, but still trying to come up with better ways to reason through that.


#9

1 day withdrawal time is completely fine, imo. The 3-6 months is a very long time if we look at it from a market perspective for obvious reasons. Though, I can’t imagine many scenarios (besides actual attacks) where everyone would be trying to withdraw at once.

In saying this, I speculate that a lot of the people who will be staking are those that are currently just holding their ETH in cold storage so this long withdrawal period may not be an issue for them.

I also imagine that there will be some sort of derivative product that tracks the underlying ‘withdraw in progress’ ETH so that people could ‘offload’ their stake before the withdrawal is finalized.

Also, do validators have to wait until the withdraw is finalized before they can re-stake their ETH?


#10

I did a simulation of actual withdrawal times assuming two simplifications: (i) a Zipf’s law (ie. power law with power=1) distribution of validator deposit sizes, (ii) each validator has a 1/D chance of deciding to start exiting any given day.

Here’s the code: https://github.com/ethereum/research/blob/2d3ed6e42087d5b14cdf107c897e8d3e5db3ee7a/exit_queue_tests/exit_queue_tester.py
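For readers who want to experiment without opening the link, here is a stripped-down sketch of the same kind of experiment. It is not the linked script: the function name, the parameters (rotation_days for N, exit_period_days for D), the stake-credit accounting and the immediate re-entry of exited validators are all simplifying assumptions of this sketch.

```python
import random
from collections import deque


def simulate_exit_delays(n_validators, max_size, rotation_days,
                         exit_period_days, days, seed=0):
    """Mean exit delay (in days) under a stake-weighted FIFO exit queue."""
    rng = random.Random(seed)
    # Zipf-like sizes: the validator of rank k holds about max_size / k.
    sizes = [max(1, max_size // rank) for rank in range(1, n_validators + 1)]
    # The whole stake may leave in rotation_days, i.e. 1/N of it per day.
    budget_per_day = sum(sizes) / rotation_days
    queue = deque()                    # (validator index, request day)
    queued = [False] * n_validators
    credit = 0.0
    delays = []
    for day in range(days):
        # Each validator not already queued starts exiting w.p. 1/D.
        for v in range(n_validators):
            if not queued[v] and rng.random() < 1 / exit_period_days:
                queued[v] = True
                queue.append((v, day))
        # Credit resets whenever the queue is empty at processing time.
        credit = credit + budget_per_day if queue else 0.0
        while queue and credit >= sizes[queue[0][0]]:
            v, requested = queue.popleft()
            credit -= sizes[v]
            queued[v] = False          # assumed to rejoin immediately
            delays.append(day - requested)
    return sum(delays) / len(delays) if delays else 0.0


fast = simulate_exit_delays(200, 2000, rotation_days=180,
                            exit_period_days=720, days=1000)
slow = simulate_exit_delays(200, 2000, rotation_days=180,
                            exit_period_days=60, days=1000)
print(fast, slow)
```

The two calls compare a lightly loaded queue (D much larger than N) with an overloaded one (D much smaller than N); the numbers will not match the tables in this post, since the model differs.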

The results were fairly bimodal: if you set a rule that the entire validator set can withdraw after N days (ie. 1/N of the set per day), and D > N, then we get validators being able to withdraw almost instantly. Here’s N = 180 and D = 360:

Total delays in days
21759:  11.318 (min 12.534)
10879:  6.528 (min 6.267)
5439:  3.643 (min 3.133)
2719:  1.995 (min 1.566)
1359:  1.088 (min 0.783)
679:  1.010 (min 0.391)
339:  0.989 (min 0.195)
169:  0.836 (min 0.097)
84:  0.942 (min 0.048)
42:  0.866 (min 0.024)
21:  0.925 (min 0.012)
10:  0.928 (min 0.006)
5:  0.933 (min 0.003)
2:  0.943 (min 0.001)
1:  0.952 (min 0.001)

(delays are sometimes lower than the minimum because the minimum is calculated based on the target total deposit size which doesn’t perfectly match the actual one)
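One way to read the "min" column, as an assumption based on the description above rather than on the script itself: if 1/N of the total deposits can leave per day, a validator needs at least size * N / total days to get its whole deposit out. The column is consistent with that, since it scales linearly with deposit size. A small sketch with hypothetical names:

```python
def min_delay_days(deposit_size, total_deposits, rotation_days):
    """Lower bound on exit time when total_deposits / rotation_days
    of stake may leave per day (assumed reading of the 'min' column)."""
    return deposit_size * rotation_days / total_deposits


# Proportionality check on two rows of the N = 180 table:
# both ratios of min delay to deposit size should agree.
ratio_top = 12.534 / 21759
ratio_second = 6.267 / 10879
print(ratio_top, ratio_second)
```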

Now here’s D = 240, N = 180 (only the top 5 rows for compactness):

21759:  12.667 (min 12.534)
10879:  8.149 (min 6.267)
5439:  5.059 (min 3.133)
2719:  3.866 (min 1.566)
1359:  3.003 (min 0.783)

Now D = 180, N = 180:

21759:  26.618 (min 12.534)
10879:  25.083 (min 6.267)
5439:  23.793 (min 3.133)
2719:  22.048 (min 1.566)
1359:  22.355 (min 0.783)

And D = 120, N = 180:

21759:  76.153 (min 12.534)
10879:  73.770 (min 6.267)
5439:  74.317 (min 3.133)
2719:  74.556 (min 1.566)
1359:  74.288 (min 0.783)

This makes me think that things will be fine but we would benefit from some explicit policy to discourage validators from exiting too quickly. Perhaps have the exit queue favor “older” validators in some way.
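As one possible interpretation of favoring older validators (nothing here is specified; the function and field names are hypothetical), the exit queue could be ordered by activation day first and request day second:

```python
import heapq


def exit_order_favoring_older(requests):
    """requests: iterable of (validator_id, activation_day, request_day).

    Returns validator ids in exit order: earliest activation first,
    ties broken by earliest exit request.
    """
    heap = [(activation_day, request_day, vid)
            for vid, activation_day, request_day in requests]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


order = exit_order_favoring_older([("a", 50, 1), ("b", 10, 2), ("c", 10, 3)])
print(order)
```

Here "a" asked to exit first but joined last, so it leaves after "b" and "c". A real rule would probably blend age with waiting time so that newer validators are not starved indefinitely.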