A minimal sharding protocol that may be worthwhile as a development target now

Given the possibility of yet more changes to the sharding 1.1 spec, and developers’ concerns that they are building something that could change again, I wanted to offer a development target that is worth building right now and that is on the path toward implementing the final protocol:

  1. Anyone can call addHeader(period_id, shard_id, chunks_root) at any time. The first header to get included for a given shard in a given period gets in, all others don’t. This function just emits a log.
  2. For every combination of shard and period, N collators (now called “notaries”) are sampled. They try to download the collation body corresponding to any header that was submitted. They can call a function submitVote(period_id, shard_id, chunks_root). This function just emits a log.
  3. Clients read logs. If a client sees that in some shard, for some period, a chunk has been included and >= 2N/3 notaries voted for it, it accepts it as part of the canonical chain.
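
To make the three steps concrete, here is a minimal Python sketch, not an actual SMC implementation. The class name, the log tuples, the `notary` argument (standing in for the transaction sender) and the committee size are all assumptions made for illustration. Checking that a voter was actually sampled for that shard and period is left out.

```python
from collections import defaultdict


class ToySMC:
    """Toy stand-in for the SMC: both functions only emit logs."""

    def __init__(self):
        self.logs = []

    def add_header(self, period_id, shard_id, chunks_root):
        # Anyone may call this; no validation happens on chain.
        self.logs.append(("HeaderAdded", period_id, shard_id, chunks_root))

    def submit_vote(self, period_id, shard_id, chunks_root, notary):
        # `notary` is hypothetical; a real contract would use the sender.
        self.logs.append(("Vote", period_id, shard_id, chunks_root, notary))


def canonical_chunks(logs, committee_size):
    """Client-side rule from step 3, applied to the log stream in order."""
    included = {}             # (shard, period) -> first chunks_root seen
    votes = defaultdict(set)  # (shard, period, root) -> distinct notaries
    for log in logs:
        if log[0] == "HeaderAdded":
            _, period, shard, root = log
            included.setdefault((shard, period), root)  # first header wins
        elif log[0] == "Vote":
            _, period, shard, root, notary = log
            votes[(shard, period, root)].add(notary)
    accepted = {}
    for (shard, period), root in included.items():
        # votes >= 2N/3, written with integers to avoid rounding issues
        if 3 * len(votes[(shard, period, root)]) >= 2 * committee_size:
            accepted[(shard, period)] = root
    return accepted
```

With a committee of 5, a header needs 4 distinct votes to be accepted, since 3 votes fall short of 2N/3.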

Notice that this protocol is extremely simple and lacks “notary skin in the game” (slashing conditions that make it risky to vote for collations unless you actually downloaded the full data at that time). Under some assumptions, though, it is a complete protocol, and it offers an opportunity to build and test all of the base infrastructure, including:

  • The capability of having 100 separate shard p2p networks, and building and sending collations across those networks
  • The ability to read logs emitted by an SMC
  • The ability to send transactions that call functions of the SMC
  • The ability for a client to maintain a database of which collation roots it has downloaded the full body for
  • The ability of a validator to (i) log in, (ii) detect that it has been randomly sampled, switch to the right p2p network, and start doing stuff, (iii) log out
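
As one illustration of the database item in the list above, here is a small sketch of a store that records which collation roots a client holds the full body for. It uses Python's standard `sqlite3` module; the class, table and column names are hypothetical and not taken from any client.

```python
import sqlite3


class BodyStore:
    """Tracks chunk roots whose full collation body has been downloaded."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a client would pass a file path.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS downloaded (chunks_root TEXT PRIMARY KEY)"
        )

    def mark_downloaded(self, chunks_root):
        # INSERT OR IGNORE makes repeated marking of the same root harmless.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO downloaded VALUES (?)", (chunks_root,)
            )

    def has_body(self, chunks_root):
        row = self.db.execute(
            "SELECT 1 FROM downloaded WHERE chunks_root = ?", (chunks_root,)
        ).fetchone()
        return row is not None
```

A notary could consult `has_body` before calling submitVote, so that it only votes for collations it has actually fetched.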

Is there a write up for the 1.1 spec?

I think Vitalik is referring to the currently written up phase 1 spec.

We’ve made progress on the research side recently which points towards a revamping of the protocol. We have a cleaner proposer-notary separation, an alternative to windback, a stronger availability challenge. Some other ideas we are exploring:

  • Decoupling collation headers from the main chain, or only explicitly exposing fully notarised collations or checkpoints, maybe using ideas from Dfinity and Bitcoin-NG
  • Fork-free proposal chains, and rollback mechanisms
  • Variable-size and variable-threshold notarisation
  • A new form of signature aggregation
  • Strengthened shard finality in case of main chain reorg
  • Sidechaining parts of the SMC into a manager shard

In short, the research side of things is in flux. It will probably take a few weeks for ideas to surface on ethresear.ch and for the dust to settle. A new spec (hopefully better all round) may come out in a couple of months or so.

The above minimal sharding protocol is a good starting point for implementers. Agreeing on the p2p networking stack (transport layer based on libp2p or otherwise, plus a gossip layer with a channel per shard) seems like a valuable thing to do irrespective of the higher level protocol details.

Request for renaming and ordering for consistency:

  • addHeader(shard_id, chunk_root, period)
    • period_id -> period
    • chunks_root -> chunk_root

Does this mean we can reuse the get_eligible_collator and collator_pool in the sharding phase1 spec here?

What’s the purpose of this function? Why can’t submitVote be called without it?
Should submitVote be called for every shard and for every notary member?

What happens when there are less than N*(2/3) notaries voting for a shard collation?

It seems like it would be better if there was some intermediary staging before clients tried decoding what the canonical chain is. Otherwise it might be difficult to read. Maybe have a canonical headers section of finalized blocks that receive 2N/3 votes? That way clients don’t need to deal with computing what the canonical chain is.

What’s the purpose of this function? Why can’t submitVote be called without it?

addHeader is there to propose headers. submitVote is there to submit a vote approving a header. The two are different.

What happens when there are less than N*(2/3) notaries voting for a shard collation?

Then no collation gets accepted in that period.

We could move the computation into the SMC itself, and have it emit finalized headers as logs, which would simplify the work of light clients at the cost of requiring more gas.
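
A rough sketch of what that variant might look like, in Python rather than contract code. The names, the `notary` argument and the "HeaderFinalized" log are assumptions for illustration; the extra bookkeeping in `submit_vote` is where the additional gas would go.

```python
class CountingSMC:
    """Toy SMC that tallies votes itself and emits a log once a header is finalized."""

    def __init__(self, committee_size):
        self.committee_size = committee_size
        self.headers = {}       # (shard, period) -> chunks_root of first header
        self.voters = {}        # (shard, period) -> set of notaries that voted
        self.finalized = set()  # (shard, period) pairs already finalized
        self.logs = []

    def add_header(self, period_id, shard_id, chunks_root):
        key = (shard_id, period_id)
        if key in self.headers:
            return False  # only the first header per shard and period gets in
        self.headers[key] = chunks_root
        self.voters[key] = set()
        self.logs.append(("HeaderAdded", period_id, shard_id, chunks_root))
        return True

    def submit_vote(self, period_id, shard_id, chunks_root, notary):
        key = (shard_id, period_id)
        if self.headers.get(key) != chunks_root:
            return False  # vote does not match the included header
        self.voters[key].add(notary)
        reached = 3 * len(self.voters[key]) >= 2 * self.committee_size
        if reached and key not in self.finalized:
            self.finalized.add(key)
            self.logs.append(("HeaderFinalized", period_id, shard_id, chunks_root))
        return True
```

A light client would then only need to filter for "HeaderFinalized" logs instead of counting votes itself.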

I think this should be extended to

If a client sees that in some shard, for some period, a chunk has been included, >= 2N/3 notaries voted for it and all other accepted headers have a lower period number, it accepts it as part of the canonical chain.

Otherwise it’s possible that a very old chunk that’s been missing just a single vote gets accepted, leading to a huge reorg at the execution layer (or whatever interprets the chunks).

Also, this prevents two chunks being accepted for the same period.

(Unless this is really just supposed to be a placeholder SMC in which case it doesn’t really matter of course.)
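
The extended rule can be sketched as follows, assuming the client processes headers in the order in which each one reached the 2N/3 threshold. The function name and the event tuples are hypothetical.

```python
def accept_in_order(threshold_events):
    """Accept a header only if its period is higher than every accepted one.

    `threshold_events` is an iterable of (shard_id, period_id, chunks_root),
    ordered by when each header reached >= 2N/3 votes.
    """
    latest = {}  # shard_id -> highest accepted period
    chain = {}   # shard_id -> accepted (period_id, chunks_root) pairs, in order
    for shard_id, period_id, chunks_root in threshold_events:
        if period_id <= latest.get(shard_id, -1):
            # An old header that got its last vote late, or a second header
            # for an already accepted period: ignore it.
            continue
        latest[shard_id] = period_id
        chain.setdefault(shard_id, []).append((period_id, chunks_root))
    return chain
```

In this sketch, a header for period 3 that crosses the threshold after period 5 was accepted is simply dropped, so no reorg is triggered.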

What is the incentive for doing so?

Ah, sorry I failed to mention that there’s a time limit for notaries to notarize collations; notarizations for one period have to be submitted before the start of the next period.
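
A sketch of that time limit, assuming periods are derived from main chain block numbers. `PERIOD_LENGTH` and both function names are hypothetical, and the value 5 is only a placeholder. Only the upper bound stated above is enforced.

```python
PERIOD_LENGTH = 5  # hypothetical number of main chain blocks per period


def period_of(block_number):
    """Period that a given main chain block falls into."""
    return block_number // PERIOD_LENGTH


def vote_is_timely(vote_period, block_number):
    """True if the vote lands before the first block of the next period."""
    return block_number < (vote_period + 1) * PERIOD_LENGTH
```

Under these assumptions a vote for period 3 is counted if it is included up to block 19 and ignored from block 20 onward.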

What is the rationale for allowing anyone to call addHeader? Someone could just spam a shard by continuously calling the function and submitting invalid collation headers, thus preventing any new collations from being added to the canonical shard chain.

What is the rationale for allowing anyone to call addHeader?

It will almost certainly be replaced, we just don’t know by what yet. Hence why I said “minimal sharding protocol”. It’s there as a stub.

Can this sampling return different N notaries on different branches of a shard?

What does “included” mean here? addHeader() has been called on the chunk?

Now I guess the sampling happens not on the managed shard but somewhere else, so the sampling is not affected by forks and branches of shards.

But it sometimes sounds like voting happens in the shard, not somewhere else:

in some shard, … >= 2N/3 notaries voted for it

So it might be the case that notaries are chosen in the shard and voting is recorded in the shard. Different votes would then be counted on different branches, pointing to different canonical chains.

I’m guessing

  • it’s never mentioned but there is the main chain
  • SMC is deployed on the main chain
  • addHeader and submitVote are interfaces of the SMC (shard management contract?)

Yes, yes, and yes.
