Sticking to 8192 signatures per slot post-SSF: how and why

Clarification of Premises

To keep the discussion fair and fundamental, it’s important to clarify how circumstances have changed between when we started discussing PoS (Casper FFG) and committee-based finality, and the present.

  1. Because restaking/LSD tokens offer convenient liquidity, pools have become incentivized toward a winner-takes-all dynamic.
  2. Proto-danksharding has introduced temporary, inexpensive block space (blobs).

Under these changed circumstances, the trade-offs of the philosophical pivot approaches (1), (2), and (3) proposed in VB’s post are:

(1) The protocol becomes very simple and efficient. The number of nodes decreases, and pool management may become complicated depending on the situation.

(2) The protocol retains its current complexity. Block-space efficiency increases, and censorship resistance is maintained as is.

(3) The complexity of the protocol itself increases compared to before. The barrier to entry for staking is reduced.

Overall, the main challenges seem to be (a) improving the situation where the 32 ETH upper (not lower) limit is no longer meaningful and is hindering block-space and bandwidth efficiency, and (b) addressing the current concentration of stake in pools.

Clarification and Resolutions of the Problems

Problem 1: DVT should be made simple, without relying on cryptography.

Fundamentally, the ‘go all-in on decentralized staking pools’ approach of philosophical pivot option (1) aims to balance solo staking with large-scale pools. While this may sacrifice liveness, it ensures that safety is always maintained. This characteristic mirrors switchable PoW mining pools, so it could be said to be a direction already proven to work.

One question arises regarding the link mentioned: Are technologies like MPC and SSS really necessary for DVT? In my view, what’s essential is the ability to switch pools, and for that, only the following two components are necessary:

  1. The ability to invalidate a signing key within two blocks.
  2. The inability of pools to discern whether a solo staker is online.

As has been discussed extensively in the Ethereum PoS space, PoW stays safe because, if someone attempts to gain 51% control, miners can notice that a pool is mining a fork chain and switch pools accordingly. In the current PoS system, since the signing key is delegated to the pool, a staker might become an attacker against her will and can only watch herself being slashed. To change this, the holder of the withdrawal key should be able to immediately cancel their signing key if they discover it is being used for double voting. To implement this principle, a finality of about 2 slots, rather than SSF, seems preferable.
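As a minimal sketch of what the withdrawal-key side could look like: the message type and the revocation call below (`SignedVote`, `submit_key_revocation`) are hypothetical placeholders, not existing protocol objects. The point is only that detecting double voting and revoking the delegated signing key needs no MPC or secret sharing.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class SignedVote:
    # Hypothetical attestation summary observed on gossip.
    validator_pubkey: bytes   # delegated signing key (held by the pool)
    slot: int
    target_root: bytes

class WithdrawalKeyWatchdog:
    """Run by the solo staker; the pool cannot tell whether it is online."""

    def __init__(self, my_signing_pubkey: bytes, withdrawal_privkey: bytes):
        self.my_signing_pubkey = my_signing_pubkey
        self.withdrawal_privkey = withdrawal_privkey
        self.seen: Dict[Tuple[bytes, int], bytes] = {}  # (pubkey, slot) -> target_root

    def on_vote(self, vote: SignedVote) -> None:
        if vote.validator_pubkey != self.my_signing_pubkey:
            return
        key = (vote.validator_pubkey, vote.slot)
        previous = self.seen.setdefault(key, vote.target_root)
        if previous != vote.target_root:
            # Two conflicting votes in the same slot: the pool is double voting.
            # Revoke the delegated signing key; the protocol would need to accept
            # this within ~2 slots for the revocation to pre-empt finalization.
            submit_key_revocation(self.my_signing_pubkey, self.withdrawal_privkey)

def submit_key_revocation(signing_pubkey: bytes, withdrawal_privkey: bytes) -> None:
    """Hypothetical on-chain operation, authorized by the withdrawal key."""
    ...
```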

Once this principle is introduced, as long as pools cannot determine whether solo stakers are online, they would be too fearful to attempt double voting. This would allow solo stakers to go offline. Of course, pools might still speculatively attempt double voting, but as with the current PoS system, an attacker would not recover the slashed stake, making it not worthwhile.

Problem 2: Few Solo Stakers

The main reason for the lack of solo stakers is a lack of confidence in their own network environment, even before considering the risk of slashing. Moreover, an increase in solo stakers running on AWS is essentially meaningless from a decentralization perspective. Both issues can be resolved by the measures described under DVT (Problem 1) above: create a state in which solo stakers delegate to pools and cannot be distinguished as online or offline.

An important point is that solo stakers going offline does not guarantee absolute safety from being slashed; it only significantly reduces the likelihood of being slashed. This characteristic is a major difference from the current solo staking situation, where going offline often leads to a high probability of being slashed.

Problem 3: Block Size

One reason for the large block size in PoS is the presence of a bit array indicating which validators have signed. (The signatures themselves can be discarded some time after the block is approved.)

The reasons for this array include:

  1. To prevent rogue key attacks.
  2. For use in slashing.
  3. For reward calculations.

For 1) and 2), like the signatures themselves, it should be fine to discard the array once the block is sufficiently confirmed. For 3), especially now with Proto-danksharding, it seems possible to balance reward calculation against block-space reduction by publishing the Merkle tree for a limited period and then discarding it.

Specific steps:

  1. Divide the bit array, which flags the signers, into several parts and make them leaves of a Merkle Tree.
  2. Place all the leaves of the Merkle Tree in a blob.
  3. Include only the Merkle Root in the block.

Each validator can use a Merkle proof to claim their rewards, and with recursive ZKPs these can all be combined into a single withdrawal transaction. This could reduce the in-block data from 8192 bits to roughly 256 bits (just the root). If the root is wrong, the majority of validators can simply ignore the block.
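A minimal sketch of steps 1–3 plus the proof side, assuming an 8192-entry bitfield, 256-bit chunks, and SHA-256; the chunk size and hash are illustrative choices, not a concrete spec proposal.

```python
import hashlib
from typing import List, Tuple

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chunk_bitfield(bits: List[bool], chunk_size: int = 256) -> List[bytes]:
    """Split the 8192-entry signer bitfield into 256-bit leaves (32 chunks)."""
    leaves = []
    for i in range(0, len(bits), chunk_size):
        chunk = bits[i:i + chunk_size]
        value = 0
        for j, bit in enumerate(chunk):
            value |= int(bit) << j
        leaves.append(value.to_bytes(chunk_size // 8, "little"))
    return leaves

def merkle_root_and_proof(leaves: List[bytes], index: int) -> Tuple[bytes, List[bytes]]:
    """Return the root (kept in the block) and a proof for one leaf (published in a blob)."""
    layer = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2:                      # pad odd layers
            layer.append(layer[-1])
        proof.append(layer[index ^ 1])          # sibling of the tracked node
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return layer[0], proof
```

The root goes into the block, the leaves go into a blob, and each staker keeps (or recomputes from the blob before it expires) the proof for the chunk containing their bit.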

Problem 4: Censorship Resistance

It is discussed in this thread that solo stakers, who produce blocks without going through pools, are key to censorship resistance in the core protocol.

Personally, I see this as a consequence of stakers not being able to switch pools. Once they can switch, per the procedure above, it simply becomes a matter of stakers not choosing pools that implement censorship. Censored addresses are generally willing to pay higher fees to get through, so market forces should give non-censoring pool operators higher profits. Therefore approach (1) can maintain a degree of censorship resistance.

Problem 5: MEV

This is not a problem that would be discussed so widely if it were easy to solve. However, as long as the rollup-centric roadmap continues, it seems appropriate for the protocol to support Layer 2 solutions that aim to be based rollups, if there is an opportunity to do so.

TL;DR of my personal opinion

Adopting approach (1) (DVT) with switchable pools is the easiest way to support solo stakers and to minimize block size, and it leaves almost no open problems. Pool operators cannot use delegated stake to perform a 51% attack precisely when withdrawal keys can stop the attack and the pool cannot tell whether the withdrawal-key holders are online watching its behavior. The other approaches are also worth considering, but approach (1) is what we can genuinely call a simplification of PoS.
The open question is how to build switchable staking pools with the shortest possible finality; I guess it takes 2 slots.

4 Likes

Great! This has been a major concern of mine with the change.

I think it’s okay for staking to cost a modest amount of consumer hardware and internet. With the Ethereum on ARM effort, you can build a staking machine for under $1k. With verkle trees, I understand we’ll be able to reduce state size by a meaningful amount as well. If we can keep the delegated staking ecosystem favouring these social principles of decentralisation and being conscious of where and to whom they allocate capital, we can push the LSPs to compete on this axis of decentralisation, rather than solely on yield or, worse, rehypothecation.

Yes, I agree with this take, and would echo what @pradavc and @tbrannt have said below about not going ‘all in’ on DV-based staking at the expense of motivated individuals being able to participate independently, so that the most permissionless long tail can survive (these can still be distributed validators, but ‘solo/indie DVs’, not ones from a big club with firm rules because you’re collectively managing millions of dollars of other people’s money instead of your own). At Obol we have designed heavily toward these DV clusters being independent of one another and of different flavours and variants, along with different governance and stewardship, eschewing homogeneity in favour of heterogeneity on as many axes as possible.

To reiterate my view from my OP, I’m generally aligned with @kassandra.eth and @pradavc in the below, with the extra specific suggestion of bringing an SSF “duty” in as a non-canonical source of finality at first; if we’re happy with it post-implementation and it needs no tweaking, then we deprecate Casper. All the while, favouring designs that allow the long tail to participate in the core protocol as much as feasible (consumer hardware and internet, one-to-two-digit ETH), and pushing the ‘app layer’ of delegated staking to leverage DVT to bring a wider audience of participating node operators into their products.

2 Likes

In considering either approach (1) or (2), it’s important to contemplate the migration strategy for the current beacon chain. A simple hard fork might no longer be viable due to the extensive changes. This could necessitate another merge-like event, such as an in-flight transfer or the creation of a new proof-of-stake chain. However, the challenge lies in ensuring that these changes do not disrupt the execution side.

1 Like

I actually feel like a series of hard forks might work here!

Most of SSF can be implemented as (i) reducing the epoch length to 3, (ii) changing the rules for get_active_validator_indices, and a few further tweaks to attestation inclusion rules and incentives.
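For illustration, a spec-style sketch of what such a fork could roughly look like; the names echo the consensus specs, but the cap and the rotation rule below are purely hypothetical placeholders, not the actual proposed logic.

```python
# Hypothetical spec-style sketch; not the actual fork logic.
SLOTS_PER_EPOCH = 3            # (i) shrink epochs so Casper-style finality approaches per-slot
MAX_ACTIVE_VALIDATORS = 8192   # hypothetical cap on signers per slot

def get_active_validator_indices(state, epoch):
    # (ii) on top of the usual activation/exit check, rotate a bounded
    # subset of validators into the active set each epoch.
    eligible = [i for i, v in enumerate(state.validators)
                if v.activation_epoch <= epoch < v.exit_epoch]
    if len(eligible) <= MAX_ACTIVE_VALIDATORS:
        return eligible
    offset = (epoch * MAX_ACTIVE_VALIDATORS) % len(eligible)
    rotated = eligible[offset:] + eligible[:offset]
    return rotated[:MAX_ACTIVE_VALIDATORS]
```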

4 Likes

The operators of those large validators will be much more susceptible to government censorship and regulatory pressure. The homestakers today are the ones who can single-handedly uphold Ethereum values no matter what. Ten thousand distributed homestakers spread over the entire globe are unstoppable. Losing them would be a gigantic loss for Ethereum and a massive hit to the Ethereum narrative; it would become just another PoS chain. Also, there are plenty of people who would actively fight against that; entire communities/dapps exist around homestaking, and on top of that it’s a philosophical issue. This is a landmine that I wouldn’t touch.

4 Likes

I’m a solo validator with an index number below 10000 and I’m proud of it. Withdrawing to stake with some liquid staking protocol always haunts my mind, and losing my validator index number is the only thing that stops me from doing it.

1 Like

Btw, this could be used as one metric for reputation. Why can’t reputation be a criterion for participating in the committee?

3 Likes

It’s DPoS, and the problems of DPoS are already well discussed.
That’s why this part should remain in some form.

Approach 3 sounds interesting and might be cool to explore more.
I did some charts showing the impact of the parameters M and k, with the largest validator balance set to 2^{18}; plugging in the numbers:

  • The higher M, the closer it gets to approach 2.
  • A higher M might increase economic security
    • 2/3 of the validators would be above M

Approach 1 sounds a bit scary: we’d basically depend a lot on tradfi. Reputation-gating sounds scary too. I think this could harm censorship resistance (even with ILs). We saw with the Kraken example how threatening a company led to all of its validators now engaging in censorship. The same applies to 60% of the relay market. Too fragile to go all in (yet). We’d first need ILs and/or encrypted mempools to make sure it doesn’t backfire.
Approach 2 is very similar to 1 while allowing solo stakers to engage as the last line of defense. Since solo stakers could easily fork away, this sounds plausible.
Approach 3 sounds like the biggest improvement over the status quo, despite the increased in-protocol complexity.

5 Likes

This sounds like a debate on a performance vs security trade-off for a finality gadget.

A finality gadget that uses 10k signatures gives you good performance (latency) for SSF, but you are worried about the reduced security (and accountability).

Note that with random sampling (say, a VRF) or a (secret) fair sampler, the security of k consecutive agreements should roughly accumulate stake; this is a latency trade-off that gives more accountability as the block is buried deeper over time.

If that is not enough, you can have multiple finality gadgets: a fast one with 10k signatures and a slower one with, say, 100k (or any other number of) signatures. The 100k one could run every 20 slots and take 20 slots to complete, etc. This is somewhat similar to approach 2.

Unlike approach 2, having two gadgets (or three, etc.) that differ only in the VRF sampling probability makes their relationship rather egalitarian and simple (same code). Moreover, different clients (or different transactions) can wait for confirmations from different gadgets.

For example, a transaction that moves more than 1M ETH might prefer to wait for multiple 10k confirmations or for a 100k-signature confirmation (or confirmation from all stake…), while a block with a total value of 10k ETH might be fine with a single 10k consensus confirmation, etc.
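A toy sketch of the “same code, different sampling probability” point, with a hypothetical VRF output passed in as bytes; everything here is illustrative, not an actual gadget design.

```python
import hashlib

def vrf_value(vrf_output: bytes) -> float:
    """Map a (hypothetical) VRF output to a uniform value in [0, 1)."""
    return int.from_bytes(hashlib.sha256(vrf_output).digest()[:8], "big") / 2**64

def is_sampled(vrf_output: bytes, committee_size: int, total_validators: int) -> bool:
    """A validator joins the committee iff its VRF value falls below the sampling rate.
    The fast and slow gadgets share this code and differ only in committee_size."""
    return vrf_value(vrf_output) < committee_size / total_validators

# Two gadgets over ~1M validators, differing only in the sampling probability:
# fast = is_sampled(out, committee_size=10_000,  total_validators=1_000_000)
# slow = is_sampled(out, committee_size=100_000, total_validators=1_000_000)
```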

2 Likes

Let’s hope for the best. Also, the fact that we cannot switch pools was a major issue.

@vbuterin

Would any of these approaches affect prevrandao insofar as it remains accessible? It is already used as a source of entropy for some dapps.

This post prompted me to write down what participating in Ethereum means for me (and maybe others); this might help aim for the right solution:

  • Influence/participation - individuals and small groups being able to participate in and secure Ethereum is a powerful idea.
  • Decentralization - Ethereum can’t be “taken down” or manipulated by a single entity.
  • Economics - no one expects to get rich, but economics is a major decision driver for many stakers (pools, institutions, and individuals).

I like approach 1, for obvious reasons, but it needs to consider the above points to make it work.

DVT can be run by home stakers easily, though it does have limitations in terms of the number of consensus participants. Theoretically, 1,000 operators on a cluster can be made to work (similar to committee-based approaches and their BFT limitations).
This means 1,000 * 4,096 = up to ~4M “operators”, which seems pretty good.

The above requires better key management and secret sharing, and potentially the ability to change validation keys to facilitate cluster-set changes without compromising security.
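For intuition, a toy sketch of the secret-sharing piece (Shamir over a toy prime field); a real DVT cluster would use threshold BLS over the proper curve order with distributed key generation and resharing, none of which is shown here.

```python
import random

PRIME = 2**127 - 1  # toy field modulus; real schemes use the BLS12-381 curve order

def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    """Lagrange interpolation at x = 0 recovers the shared secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, threshold=3, shares=5)
assert recover_secret(shares[:3]) == 123456789   # any 3 of 5 shares suffice
```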

Another DVT benefit is that the individual validator is much harder to compromise. Considering that “very large” operators will exist just by the nature of how staking works, it’s better if they are part of a DVT cluster than running a few 4k-ETH validators on their own.

Thanks @vbuterin & the community for this amazing thread, as always! :pray:

In my humble opinion, +1 to approach-3 with further decentralization through enshrined staking pools, DVT & light clients over time! This would enhance the decentralization ethos that we all love about Ethereum!

1 Like

I feel like I’m missing something obvious, but how does solution 2 solve the problem?

Isn’t the light layer equivalent to what we have today (and in fact worse, because there is no minimum)?

See Signature Merging for Large-Scale Consensus for the answer to your first question.

I’m confused about how the incentive structure would work for the light layer. If there’s no slashing vulnerability, but one does exist for the “heavy” nodes, what would be the incentive for anyone to run heavy? Maybe the staking rewards on the light layer would be lower (or zero), but if they’re non-zero, light staking effectively becomes a risk-free rate on ETH.

I have a question related to approach 1 (4096 validators) and how it relates to your sharding and DAS proposal post (and danksharding more generally).

Given that 4096 fits within a single committee size, would all the nodes be forced to download all the data? Or would a Reed-Solomon encoding scheme like EigenDA, where each node downloads only a fraction of the RS-encoded chunks, be used?

Does @vbuterin usually follow up in these threads? I’ve also got a bunch of other questions but don’t want to be screaming into the void.

Elliptic curve addition is a pretty basic primitive, and those core primitives have been studied for years, so the optimizations are likely to come from aggregation, for example via caching aggregates (an engineering problem).

That said, there is interesting research in large-scale BLS aggregation; see the Web3 Foundation’s Accountable Light Client Systems for Secure and Efficient Bridges — Research at W3F

And also RecProofs from Lagrange Labs, since aggregation in SNARKs is very expensive: https://www.lagrange.dev/recproofs

So the zkBridge teams and coprocessor teams that want to prove Casper in ZK (https://research.polytope.technology/zkcasper) likely have some interesting optimizations to reduce the amount of work.

Now this is something I’m quite interested in, and if people want benchmarks on various hardware, I can update the ones I did for @asn for Devcon VI (Batch additions by mratsim · Pull Request #207 · mratsim/constantine · GitHub).

I’ll be happy to build any cryptographic optimizations into Constantine for actual measurements.

On my Ryzen 7840U (low-power, 15 W, 8 cores, laptop CPU), individual EC additions with various coordinate systems:

And with batch addition, serial and parallelized:

I think that at 1.3 ms single-threaded, the cryptography is plenty fast, and the bottleneck will be the aggregators’ topology and networking; see Signature Merging for Large-Scale Consensus.
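For context on why batch addition is so much cheaper than naive affine adds: each affine point addition needs a field inversion, and Montgomery’s batch-inversion trick shares a single inversion across the whole batch. A toy sketch of that core in plain Python integers (this is not Constantine’s actual API):

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo prime `p` using a single
    modular inversion (Montgomery's trick), instead of one inversion per element."""
    n = len(values)
    prefix = [1] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p       # running products v0*v1*...*vi
    inv_all = pow(prefix[n], -1, p)             # the one expensive inversion
    inverses = [0] * n
    for i in range(n - 1, -1, -1):
        inverses[i] = prefix[i] * inv_all % p   # peel off one factor at a time
        inv_all = inv_all * values[i] % p
    return inverses

p = 2**255 - 19  # toy prime for illustration
xs = [3, 5, 7]
assert all(x * inv % p == 1 for x, inv in zip(xs, batch_inverse(xs, p)))
```

In batch affine EC addition, the slopes (y2 - y1)/(x2 - x1) for all pairs are computed with these shared inverses, so N additions cost roughly N extra multiplications plus one inversion instead of N inversions.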

4 Likes