Cross-Shard Messaging System

adiasg · September 26, 2019, 2:30am

This post outlines an in-protocol cross-shard messaging system by proposing changes to the block validity and fork choice rules. This is a part of the CBC Casper sharding proposal, but can be easily applied to later phases of Eth2.0. A description of this can be found in the CBC Casper draft paper in Section 4.6

1. Objective

Eth2.0 supports cross-shard messaging on an out-of-protocol basis, meaning that users have to manually perform some operations to deliver a cross-shard message. The protocol handles the delivery of the receipt of the transaction from a shard chain to the beacon chain, and the user has to create a transaction on the recipient shard that refers to this receipt.

The goal is to have a protocol which handles the end-to-end delivery of the cross-shard message, in which the user only has to generate a transaction on the first shard chain.

2. Proposal

2.1. Block Structure

Blocks now contain 2 new logs, one for sent messages (in this block) and received messages (in this block) each.

2.2. Messages

Messages are objects that contain:

sender_shard_ID
recipient_shard_ID
final_destination_shard_ID
target_block on the recipient shard
blocks_to_live parameter
payload_tx

Any cross-shard message is going to have to make 2 hops - Shard A to Beacon Chain, and Beacon Chain to Shard B. sender_shard_ID and recipient_shard_ID are for the specific hop, and final_destination_shard_ID identifies Shard B.

The message can be received in blocks on the recipient shard between heights height(target_block)+1 and height(target_block)+blocks_to_live.

In the above figure, the target_block is b2 and blocks_to_live is 2. This message can be received only in blocks b3 & b4.

2.3. Block Validity & Fork Choice

The atomicity of sends and receives needs to be maintained; either a message is sent in the sender shard and received in the recipient shard (eventually), or it does not appear anywhere.

I’ll skip the obvious ones like: each message can only be received once, chains can only receive messages destined for themselves, etc.

2.3.1. Shard Chains

The shard chain fork choice follows the beacon chain fork choice, and these modifications enforce the send-receive atomicity:

If the beacon chain sends a shard some message, then the target_block is in the fork choice on the shard chain.
If the beacon chain receives some message from a shard, the sender block is in the fork choice on the shard.
If the beacon chain does not receive a shard chain message by it’s expiry, then the sender block is orphaned in the shard.
Any shard chain block that doesn’t receive a yet-unreceived message by it’s blocks_to_live expiry is orphaned.

The below figure illustrates the first rule:

The below figure illustrates the second rule:

2.3.2. Beacon Chain

The beacon chain block validity rule enforces that messages received in a block are immediately redirected and sent to the final destination. If a message is in the receive log of a block, it must also appear on the sent log of the block with:

sender_shard_ID identifying the beacon chain
recipient_shard_ID as the final_destination_shard_ID of the received message
appropriate target_block and blocks_to_live on the recipient shard

Since the shard chain fork choices are dependent on the beacon chain fork choice, the beacon chain fork choice must follow this rule to avoid chaos in the shards:

The beacon chain cannot send to/receive from one fork of a shard and later send to/receive from a conflicting fork of the shard. (“send to a fork” refers to the fork containing the target_block)

The purpose of this fork choice rule is to avoid the below situation:

3. How Are Messages Traveling Between Shards?

Messages can get from the beacon chain to the shards/from shards to the beacon chain naturally because of the way validators are distributed:

Messages from the beacon chain to a shard chain are seen by all validators on the shard, since they are also validators on the beacon chain. Validators on the shard chain also have all necessary information for executing the fork choice.
Messages sent from a shard to the beacon chain are collected by the respective crosslink committee and included in the crosslink information on the beacon chain. It is important to note that not all beacon chain validators have access to necessary information for executing the beacon chain fork choice and block validity rules. The requirement is that atleast one of the validators on the shard chain provides fraud proofs in case the stated rules are broken.

adlerjohn · September 26, 2019, 2:51pm

I’m admittedly still not familiar with the nomenclature used for Casper, but is that supposed to be worded like that? I would assume that all beacon chain validators have enough information to execute the beacon chain fork choice rule—if not, how would they be able to stay in consensus? I would also assume that all beacon chain validators can execute a beacon chain block validity function—if not, how would they be able to stay in consensus? A fraud proof for an invalid crosslink shouldn’t invalidate any beacon chain block, otherwise economic finality isn’t possible.

adiasg · September 26, 2019, 3:41pm

Let’s consider the below situation:

Note that the beacon chain fork choice rule is broken in this case:

The beacon chain validators who are’t in the shard committee or crosslink committee will not be able to detect the above situation on their own. The assumption is that at least one validator from the shard committee or the crosslink committee will provide a fraud proof for this to the beacon chain. Up til here, this is similar to the situation where an invalid state of a shard gets into the crosslink on the beacon chain.

The purpose of invalidating that beacon chain block is to maintain the send-receive atomicity:

You’re right that this isn’t the best way to go about it, since a colluded crosslink committee can have the power to invalidate a part of the beacon chain at their will sometime in the future.

Perhaps we can have a rule which allows the beacon chain to invalidate specific previously sent/received messages, which has the effect of invalidating the corresponding shard block which received/sent the message?

vbuterin · September 27, 2019, 2:55am

This seems like it requires the beacon chain to do work for every cross-shard message that takes place (at the very least, it must contain it in the receive and send log). How does this avoid causing the beacon chain to incur O(C^2) overhead, causing it to quickly get overloaded?

In general, it seems like any scheme where the beacon chain explicitly processes messages, rather than processing roots of large sets messages, has this property.

adiasg · September 27, 2019, 6:06am

Ah, that’s right. This puts an O(C^2) load on the beacon chain.

This starts making more sense in the CBC sharding proposal, which is a multi-level hierachical sharding scheme.

The root shard only processes messages which are routed between the depth 1 siblings. Through proper load balancing and message routing costs, we can avoid the O(C^2) overloading. E.g., we can charge higher fees to process message routed through shards at lower depth. The goal is to encourage (or force through in-protocol load balancing) the clustering of frequently communicating entities in lower shard subtrees.

vbuterin · September 27, 2019, 11:54pm

Right, I see; so this assumes the CBC-sharding hierarchical shard system.

I’m personally not a proponent of that system; I think it would put too much load on the root shard. For example assuming uniform activity across shards, fully 50% of all activity would have to go through the root chain. I’m sure in reality most activity is not uniform, but if even 5% of activity is uniform then that’s already a maximum cap of 40x on scalability.

adiasg · September 28, 2019, 3:44am

About above calculation: If 5% of cross-shard activity is uniform, then 2.5% of all cross-shard messages are passing through the root shard, and there is a 40x scalability limit.

However, the load from 2.5% of all cross-shard messages is not the same as 2.5% of the total load of the system. If we assume that cross-shard messaging consumes 10% of the system load, then the root shard bears 0.25% of total system load, leading to a 400x scalability limit.