Cross Shard Locking Scheme - (1)

Yes - smart contract internal state is already enough for concurrency, since the private data stored within a smart contract is guaranteed to have sequential access.

What is needed is to modify the EVM to provide a way to send messages from the EVM to other smart contracts. In the simplest case one would add a pre-compiled “send_message” contract.

Then another question is how to modify Solidity to add language features for sending messages and defining a function which is called when the other party responds to your message.

Many languages, such as Erlang and Scala, have convenient ways to define asynchronous messaging/continuations.

For Solidity the simplest approach would be to add a “message” keyword that takes as an argument the Solidity function that will be called when a response to the message is received.
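To make the continuation idea concrete, here is a minimal Python sketch of the pattern such a “message” keyword would express; the Contract class, send_message and handle_response names are purely illustrative, not an existing EVM or Solidity API.

```python
# Minimal sketch (not real Solidity/EVM code): the continuation pattern that a
# hypothetical "message" keyword would express, modelled with plain callbacks.

class Contract:
    def __init__(self, router):
        self.router = router          # stand-in for the pre-compiled send_message contract
        self.pending = {}             # message id -> callback to run on response

    def send_message(self, target, payload, on_response):
        msg_id = self.router.send(target, payload)
        self.pending[msg_id] = on_response   # register the continuation

    def handle_response(self, msg_id, response):
        # Called by the runtime when the other party responds to our message.
        callback = self.pending.pop(msg_id)
        callback(response)
```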

1 Like

I don’t think it has to be application specific. You don’t have to lock the state of the entire contract, you can just lock the specific storage keys accessed by the smart contract. Each key can represent a different ticket. Am I missing something?

A good locking system should allow multiple concurrent writes to the state of the system, otherwise that defeats the whole point of sharding anyway.

1 Like

I had imagined a scheme where you program a smart contract and also specify which parts of state it accesses, or alternatively build a separate function that tells you which parts of the state can be accessed based on given inputs to the smart contract, or even have a compiler do that.

Right, the client submitting the transaction can just execute the transaction and include in it which storage keys the smart contract accesses. This is what happens anyway with the “access_list” parameter of the newly proposed transaction format. I don’t think the programmer has to explicitly define what needs to be locked.
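For illustration, a rough sketch of what such a transaction could carry; the field names are illustrative, not the exact proposed format:

```python
# Sketch of a transaction carrying an access list, loosely modelled on the
# proposed typed-transaction format; field names are illustrative, not normative.
tx = {
    "to": "0xContractAddress",
    "data": "0xcalldata",
    "access_list": [
        {
            "address": "0xContractAddress",
            "storage_keys": [
                "0x01",   # e.g. the storage slot representing the hotel ticket
            ],
        },
    ],
}
```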

Regardless of what cross-shard atomic commit scheme is used, I think there should be some inherent concept of a “unit of concurrency” anyway – that is, some state that people can only write to one at a time, synchronously (e.g. a storage key). At the moment Ethereum doesn’t really have that, because miners can include transactions in the chain in any arbitrary order, which means that if you submit a transaction to the blockchain, it might not have the result you expected, if for example someone else submits another transaction for the same contract 1 second before you do. For cross-shard atomic commit to work well, I think this should be fixed so that transactions have to be processed in the correct order, because otherwise this could result in some bad inconsistencies in the state of the system.

FW: @vbuterin
I think you are right @musalbas. An issue is that access lists require trust from the point of view of the client submitting the transaction. A malicious client could specify more than what needs to be accessed. However, a validator executing the code after the client could just minimise these access lists and then request the locks. If a validator requests locks on more state than is needed, they can be slashed by including their transaction on chain along with a version whose access list has been minimised.

You could also attach a refundable rental fee to holding onto locks, which would be great if validators were the ones who “bought” locks, rather than clients. The rental fee is returned to validators if they release locks in a timely manner after a transaction has been added to the blockchain.

I’m going to try to re-express this in my own framework.

Suppose there is a transaction that needs to access resources R[1] … R[n] on shards S[1] … S[n] (we’ll say for simplicity one resource per shard; the generalization to multiple resources in one shard is obvious), where the “home shard” is S[1]. First, a transaction needs to be included in every S[i], which calls the LOCK opcode with arguments (T, R[i], S[1]), for some transaction identifier T. This locks resource R[i], preventing its state from being modified, and targets the lock to S[1]. Every shard’s state maintains a data structure {shard: locks}, storing for every other shard the set of locks that are targeted to that shard.
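A minimal sketch of what that per-shard lock bookkeeping could look like, assuming the semantics above; the ShardState name and method signature are illustrative:

```python
# Sketch (assumed semantics, not a spec): per-shard state tracking locks targeted
# at other shards, as produced by a LOCK(T, R, target_shard) opcode.

class ShardState:
    def __init__(self):
        self.resources = {}   # resource id -> current value
        self.locks = {}       # target shard -> {resource id: transaction id T}

    def lock(self, tx_id, resource, target_shard):
        # LOCK(T, R, S_target): refuse if the resource is already locked anywhere.
        for held in self.locks.values():
            if resource in held:
                raise ValueError("resource already locked")
        # Lock the resource and target the lock at the transaction's home shard.
        self.locks.setdefault(target_shard, {})[resource] = tx_id
```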

It is understood that in order to execute the state at block N of any shard S[i], the client needs to have available the state roots of block N-10 of every other shard, and must also download the entire currently active set of locks targeted to S[i] of every shard (this can be trivially downloaded and authenticated against the state roots of the other shards). Now, when a transaction whose hash is T executes, it can read all locked resources R[1] … R[n], perform some computation, and then suggest new contents for the state of each resource. The shard creates a “release lock” message, targeted to each shard, which also specifies what to set the resource’s new state to; additionally, the shard stores in its own internal state a record saying that that instance of a resource lock cannot be used again. Whenever this release message is included into any other shard via a Merkle proof, that shard sets the new state of the resource to the given value, and the lock is deleted.

This does require any node executing a block on a shard to be aware of more data from other shards, but it does seem to work.
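And a sketch of applying the “release lock” message on the resource’s shard, assuming the message has already been authenticated against the issuing shard’s state root (Merkle-proof verification omitted; data shapes are illustrative):

```python
def apply_release(shard_state, release_msg):
    """Apply a 'release lock' message on the shard that holds the resource."""
    # release_msg: {"resource": ..., "new_value": ...}; assumed already proven
    # against the issuing shard's state root via a Merkle proof.
    resource = release_msg["resource"]
    shard_state["resources"][resource] = release_msg["new_value"]   # set the new state
    for held in shard_state["locks"].values():                      # delete the lock
        held.pop(resource, None)
```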


An alternative mechanism, where execution can depend on state roots only (reminder: for simplicity I’m assuming that state roots can be used to Merkle-prove logs, so we don’t need to worry about state and receipt roots separately), is as follows:

  1. The resource must be locked on every shard, and the lock generates a simple log that specifies the target shard and transaction ID.
  2. The receipts proving that all locks have been made need to be included into the target shard.
  3. The transaction on the target shard issues logs that contain the new values for the locked resources.
  4. The values can be unlocked and set to their new values on the original shard by including the receipts generated by the target shard.

This keeps the model of what cross-shard knowledge executors are required to have the same as it was before, and pushes the work of actually performing the cross-shard operation onto users.
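A rough end-to-end sketch of the four steps, with the user carrying logs between shards. It assumes the lock log also carries the resource’s current value so the target shard can read it, and all data shapes are illustrative:

```python
def lock_on_shard(shard, resource, tx_id, target_shard):
    # Step 1: lock on the resource's home shard; the log also carries the
    # current value so the target shard can read it (an assumption of this sketch).
    shard["locks"][resource] = tx_id
    return {"type": "locked", "resource": resource,
            "value": shard["resources"][resource],
            "tx_id": tx_id, "target": target_shard}

def execute_on_target(lock_logs, compute_new_values):
    # Steps 2-3: once the lock receipts are included on the target shard, execute
    # and emit logs containing the new value of every locked resource.
    current = {log["resource"]: log["value"] for log in lock_logs}
    return [{"type": "unlock", "resource": r, "new_value": v}
            for r, v in compute_new_values(current).items()]

def unlock_on_shard(shard, unlock_log):
    # Step 4: including the target shard's receipt back on the original shard
    # sets the resource to its new value and deletes the lock.
    shard["resources"][unlock_log["resource"]] = unlock_log["new_value"]
    del shard["locks"][unlock_log["resource"]]
```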

I actually like this; it seems fairly simple.

2 Likes

This is similar to the method used in OmniLedger. This works well for applications where the user is locking a state that only they can modify, such as spending a UTXO, because if they lock a UTXO that belongs to them, but then do nothing, then the user is only harming themselves. But if the user locks an object that can be accessed by anyone, such as a hotel ticket that can be bought by anyone, then if the user locks a hotel ticket and does nothing, then no one else can buy that hotel ticket. Pushing the cross-shard operations to clients means that we also rely on the client to be honest to guarantee liveness. In 2PC-style protocols (i.e. S-BAC), the liveness property relies on the shard being honest instead.

1 Like

Not necessarily. Remember that whoever wants to buy the hotel ticket next can just submit all of the required transactions to unlock the object themselves. Though there definitely are complexities around gas accounting here that need to be understood much better.

What happens when there is a deadlock then, and two conflicting locks are held on two different shards? We either have to trust that the clients will resolve this on their own, or have a timeout for locks to deal with bad clients as proposed in the original post, which means no one can buy that hotel ticket for the duration of the timeout.

Indeed there are questions about gas accounting, i.e. should we charge for failed transactions? If we don’t charge for failed transactions, then an attacker can keep locking an object for free, preventing anyone else from spending it, under this model.

A timeout is not strictly speaking necessary if we use a wound-wait approach, which could be executed by the shards. The shards should be resolving deadlocks, even if they are not doing most of the work of acquiring locks.
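For reference, the standard wound-wait rule, sketched in Python; how transaction “age” is assigned (e.g. by the block height at which the transaction first appeared) is an assumption here:

```python
def on_lock_conflict(requester_age, holder_age):
    """Wound-wait rule for a lock request that conflicts with a held lock.

    Lower age means the transaction is older (started earlier).
    """
    if requester_age < holder_age:
        # Requester is older: wound (abort) the younger holder and take the lock.
        return "wound_holder"
    # Requester is younger: it simply waits for the older holder to finish.
    return "wait"
```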

Is that similar to the wound-wait mechanism proposed? There is still the question of what happens if T_j never releases its locks in case 2, no?

If the shards should be resolving the deadlock, that begins to look like 2PC. :wink:

1 Like

What happens when there is a deadlock then, and two conflicting locks are held on two different shards?

How is that possible? Control of what a resource is locked to is based on the shard on which the resource is located, so you can’t have two contradictory locks.

Yes, I would definitely want to borrow that part from your scheme. It’s clever. :wink: However, I will say that I think the benefit of wound-wait is that not all locks need to be acquired in the same round.

Suppose two transactions T_1 and T_2 both need resources A and B. T_1 has a lock on A and T_2 has a lock on B, in different shards.
We need to find a way of resolving these deadlocked transactions.
One way is to say: all locks are sent to all shards, so deadlock can be avoided. However, that would reduce the benefits of scaling, as data for all transactions would be sent to all shards. So instead, we just send lock data to the relevant parties, but then we need a way to resolve deadlock. I suggested a wound-wait approach.

Let’s say you have a transaction T1 that needs access to two resources, located on two different shards, such that it needs to do the following operations:

  1. LOCK (T1, R[0], S[0])
  2. LOCK (T1, R[1], S[1])

Then let’s say another transaction T2 comes in that wants access to the same resources:

  1. LOCK (T2, R[0], S[0])
  2. LOCK (T2, R[1], S[1])

Then let’s say this is what actually gets executed:

  1. LOCK (T1, R[0], S[0])
  2. LOCK (T2, R[1], S[1])

Now, neither T1 nor T2 can proceed, hence deadlock.
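A tiny sketch reproducing that interleaving; each shard has granted one of the two locks, so neither transaction can ever assemble its full set:

```python
# Each shard granted one of the two locks to a different transaction.
shard_0 = {"R0": "T1"}   # S[0] granted R[0] to T1
shard_1 = {"R1": "T2"}   # S[1] granted R[1] to T2

def can_proceed(tx, needed, shards):
    # A transaction can proceed only if it holds every lock it needs.
    return all(shard.get(r) == tx for r, shard in zip(needed, shards))

print(can_proceed("T1", ["R0", "R1"], [shard_0, shard_1]))  # False
print(can_proceed("T2", ["R0", "R1"], [shard_0, shard_1]))  # False -> deadlock
```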

1 Like

Ah, I see. I think something like two-phase commit should solve this right? So expanding my proposal from above:

  1. All desired resources R[1] … R[n] on shards S[1] … S[n] must be locked, and each lock generates a log that contains the transaction ID, target shard and the complete list of resources and shards needed; the tuple (tx ID, target shard, resource list) is called the “lock ID”.
  2. The log in shard S[i] needs to be included in all shards S[1] … S[n] except S[i] itself.
  3. Suppose that any shard S[i] receives an attempt to lock a resource, with lock ID L, but that resource is already locked. Then, shard S[i] issues a log that can be used in other shards to repeal any lock with lock ID L.
  4. Suppose that a shard S[i] discovers that it has received a notification of a lock of all resources R[1] … R[n], all with the same lock ID, and it does not see that any of those resources have been unlocked because of a conflict. Then, it issues a log confirming the locks, which can then be read by the target shard.
  5. If every shard S[1] … S[n] issues a confirmation log, then the target shard can execute the transaction, and issue logs containing the new state of every resource.
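A sketch of the per-shard decision logic implied by steps 1, 3 and 4, with illustrative message shapes; a shard confirms a lock ID only once it has seen lock logs for every resource in the lock ID and no repeal log for it:

```python
def try_lock(shard, resource, lock_id):
    # Steps 1/3: lock the resource, or emit a repeal log if it is already locked.
    if resource in shard["locks"]:
        return {"type": "repeal", "lock_id": lock_id}
    shard["locks"][resource] = lock_id
    return {"type": "locked", "lock_id": lock_id, "resource": resource}

def maybe_confirm(lock_id, seen_logs):
    # Step 4: lock_id = (tx_id, target_shard, resources); confirm only if every
    # resource has a lock log and no repeal log has been seen for this lock ID.
    _tx_id, _target_shard, resources = lock_id
    locked = {log["resource"] for log in seen_logs
              if log["type"] == "locked" and log["lock_id"] == lock_id}
    repealed = any(log["type"] == "repeal" and log["lock_id"] == lock_id
                   for log in seen_logs)
    if set(resources) <= locked and not repealed:
        return {"type": "confirm", "lock_id": lock_id}
    return None
```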
1 Like

Yes, that’s almost how Sharded Byzantine Atomic Commit works, but these corner cases are accounted for. :slight_smile:

1 Like

Ok, reading that paper now.

My current understanding is: if an attempt is made to lock a resource on some shard with some lock ID, and this attempt fails because the resource is already locked, then this issues a log that can be used to unlock any resources in other shards that were locked in the same lock ID. If a single shard receives notifications of logs from all shards, then it knows that there are no shards that could have issued a log that would repeal any of the other locks, and so it can safely publish a confirm message.

1 Like

That sounds about right. I would add that a failed attempt to lock a resource on some shard under some lock ID specifically means that some shard has produced a log locking one of its resources under that lock ID, but another shard has produced a log locking another resource required by the same lock ID under a different lock ID, hence deadlock. This is something that could happen if, for example, these two shards produce a block at the same time with conflicting locks. These two shards, upon receipt of a log from each other showing that they have conflicting active locks, should release the locks in their respective lock IDs (abort the transaction).

Given that shards have an honest majority, that would guarantee liveness of the protocol. To reduce the chance of repeated aborts (e.g. due to a malicious peer carefully timing the sending of conflicting transactions to different shards), we can perhaps add some non-enforced ‘etiquette’ consensus rules. For example, an implicit ordering can be applied to the set of shards, and if two conflicting transactions were previously aborted, the shards will try again, sequencing first the transaction that was locked by the shard with the highest order.
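A sketch of that (purely optional) retry etiquette, with an assumed ranking over shards:

```python
def retry_order(aborted, shard_rank):
    """Order retries of previously aborted conflicting transactions.

    aborted: list of (tx_id, locking_shard) pairs.
    shard_rank: shard id -> rank; the transaction locked by the highest-ranked
    shard is sequenced first (an assumed convention, not an enforced rule).
    """
    return sorted(aborted, key=lambda item: shard_rank[item[1]], reverse=True)

# Example: T2, locked by the higher-ranked shard S1, goes first.
print(retry_order([("T1", "S0"), ("T2", "S1")], {"S0": 0, "S1": 1}))
```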

1 Like

Folks - I would like to support what Mustafa is saying …

The fundamental decision is whether the master that manages all of the locks is a smart contract that lives on one of the shards or a user.

If it is a smart contract, then its operation is guaranteed. This means that:

  • when all the slaves tell the master that they have acquired their locks,
  • the master tells all the slaves to commit, and
  • the slaves commit.

The beauty of all of this is that all parties are smart contracts, so the operation of each of them is guaranteed. So there is literally no way to get to a problem like a stale lock, unless the source code of one of the smart contracts is screwed. So it is much, much better than regular database transactions, where a particular database can crash, which can leave the protocol stalled, and then you have to introduce timeouts.
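A sketch of the master side of that flow, with the reliable asynchronous message primitive stubbed out; all names are illustrative:

```python
def send_async(target, message):
    # Stand-in for the reliable asynchronous cross-shard message primitive.
    print(f"send {message!r} to {target}")

class Master:
    def __init__(self, slaves):
        self.slaves = slaves       # ids of the slave lock contracts, one per shard
        self.locked = set()

    def on_lock_acquired(self, slave_id):
        # A slave reports that it holds its lock.
        self.locked.add(slave_id)
        if self.locked == set(self.slaves):
            # All locks held: tell every slave to commit. No timeouts are needed,
            # since contract execution is guaranteed once messages are delivered.
            for slave in self.slaves:
                send_async(slave, "commit")
```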

In the protocol above there are no timeouts and no need for them.

If the user manages the locks, then, as Mustafa is saying, the user can simply withhold messages, because he is a bad guy or because he crashed. In this case the entire system gets stuck, and the only way to resolve this is to introduce timeouts, which are evil things for concurrent systems, because you then need to constantly think about one timeout interacting with another timeout, etc.

Also, for the system above, the only abstraction that needs to exist in the EVM is a notion of reliably sending asynchronous messages from one party to another.

In fact, this system would be the most reliable distributed locking system in the history of the universe, because individual databases in distributed database systems can crash, whereas shards can’t totally crash, so it would be a very useful product.