Cross Shard Locking Scheme - (1)

Let’s say you have a transaction T1 that needs access to two resources, located on two different shards, such that it needs to do the following operations:

  1. LOCK (T1, R[0], S[0])
  2. LOCK (T1, R[1], S[1])

Then let’s say another transaction T2 comes in that wants access to the same resources:

  1. LOCK (T2, R[0], S[0])
  2. LOCK (T2, R[1], S[1])

Then let’s say this is what actually gets executed:

  1. LOCK (T1, R[0], S[0])
  2. LOCK (T2, R[1], S[1])

Now, neither T1 or T2 can proceed, hence deadlock.

1 Like

Ah, I see. I think something like two-phase commit should solve this right? So expanding my proposal from above:

  1. All desired resources R[1] … R[n] on shards S[1] … S[n] must be locked, and each lock generates a log that contains the transaction ID, target shard and the complete list of resources and shards needed; the tuple (tx ID, target shard, resource list) is called the “lock ID”.
  2. The log in shard S[i] needs to be included in all shards S[1] … S[n] except S[i] itself.
  3. Suppose that any shard S[i] receives an attempt to lock a resource, with lock ID L, but that resource is already locked. Then, shard S[i] issues a log that can be used in other shards to repeal any lock with lock ID L.
  4. Suppose that a shard S[i] discovers that it has received a notification of a lock of all resources R[1] … R[n], all with the same lock ID, and it does not see that any of those resources have been unlocked because of a conflict. Then, it issues a log confirming the locks, which can then be read by the target shard.
  5. If every shard S[1] … S[n] issues a confirmation log, then the target shard can execute the transaction, and issue logs containing the new state of every resource.
1 Like

Yes, that’s almost how Sharded Byzantine Atomic Commit works, but these corner cases are accounted for. :slight_smile:

1 Like

Ok, reading that paper now.

My current understanding is: if an attempt is made to lock a resource on some shard with some lock ID, and this attempt fails because the resource is already locked, then this issues a log that can be used to unlock any resources in other shards that were locked in the same lock ID. If a single shard receives notifications of logs from all shards, then it knows that there are no shards that could have issued a log that would repeal any of the other locks, and so it can safely publish a confirm message.

1 Like

That sounds about right, I would add that a failed attempt to lock a resource on some shard with some lock ID, specifically means that some shard has produced a log to lock one of its resources to some shard ID, but another shard has produced a log to lock another resource required by the same lock ID, to another lock ID, thus deadlock. This is something that could happen if for example these two shards produce a block at the same time with conflicting locks. These two shards, upon receipt of a log from each other to show they have conflicting active locks, should release the locks in their respective lock IDs (abort transaction).

Given that shards have an honest majority, that would guarantee liveness of the protocol. To reduce the chance of repeated aborts (i.e. due to a malicious peer that is carefully timing the sending of conflicting transactions to different shards), we can perhaps add some non-enforced ‘etiquette’ consensus rules, for example, an implicit ordering can be applied on the set of shards, and if there were two previously aborted conflicting transactions, the shards will try again, sequencing the transaction that was locked by the shard with the highest order.

1 Like

Folks - I would like to support what Mustafa is saying …

The fundamental decision is whether the master that manages all of the locks is a smart contract that lives on one of the shards or a user.

If it is a smart contract, then its operation is guaranteed. This means that:

  • when all the slaves tell the master they have locks
  • the master will tell back to all the slaves to commit and
  • the slaves will commit.

The beauty of all of this is that all parties are smartcontracts, so operation of each of them is guaranteed. So there is literally no way go get to a problem like a stale lock, unless the source code of one of the smart contracts is screwed. So it is much much better than regular database transactions where a particular database can crash which can lead to a stale protocol, and then you have to introduce timeouts.

In the protocol above there are no timeouts and no need for them.

If the user manages the locks, then as Mustafa is saying, the user can simply withold messages, because he is a bad guy, or because he crashed. In this case the entire system gets stuck, and the only way to resolve this is to introduce timeouts, which are evil things for concurrent systems, because you then need to constantly think about one timeout interacting with another timeout etc.

Also for the system above the only abstraction that needs to exist in the EVM is a notion of asynchronously reliably sending messages from one party to another.

In fact, this system will be the most reliable distributed locking system in the history of the universe, because particular databases in distributed database systems can crash, and shards cant totally crash, so it will be a very useful product

The way it needs to work, is there are two master smart contracts M1 and M2.

  1. Both smart contracts send messages to resources A and B to acquire a lock.

  2. M1 acquires lock on A and M2 acquires lock on B

  3. M1 gets a message that locking B failed, and M2 gets a message that locking A failed.

  4. Both transactions fail, M1 sends a message to A to unlock, and M2 sends a message to B to unlock.

So there is no deadlock.

To summarize, the simplest protocol is:

  1. The user send a message to the master smart contract to initiate the transaction.

  2. The master smart contract sends a message to slave contracts on all shards to lock resources.

  3. Each slave contract either locks the resource and replies to the master “OK”, or fails to lock, and replies to master “Sorry, already locked”

  4. If the master gets all OKs, the master tells all slaves “proceed” and they proceed with transaction.

  5. If the master gets at least one “Sorry”. then it tells all slaves “abort”

In this procedure, the complexity is N where N is the number of slaves. The master can live on any of the shards or on main chain. Since all messages are reliable, the system will never deadlock unless one of the shards is Byzantine.

It’s worth noting that at this point I consider yanking to be a superior alternative to all of these locking techniques. So the process becomes:

  1. User sends a message to resource A to yank it to the same shard as resource B
  2. Resource A arrives at the same shard.
  3. (i) user does an intra-shard tx involving both A and B, or (ii) someone else uses resource A in some way, including possibly yanking it to another shard, whichever comes first

There’s no concept of “locked” and “unlocked”; every resource is at all times either “locked to” a particular shard (by being located inside of that shard) or “in transit” (unusable by any shard).

“As an example, a hotel booking contract might work by having a function, reserve(), that reserves a hotel room, instantiates a contract that represents the hotel room, and that contract then contains a move_to_shard(uint256 shard_id) function which allows anyone to yank the contract to another shard.”

Interesting … I think it can be reformulated in terms of locks.
What you mean by yanking is equivalent to creating a shared lock.

  1. Alice sends a message to the hotel booking contract to lock a room. The hotel booking contract creates a lock receipt.

2 The receipt is shared in a sense that anyone (not only Alice) can use it it

  1. Then both Alice and Bob can send a lock receipt to the train booking contract to book a train. The good thing is that even if Alice abandons after locking a room, Bob can use her lock receipt to either book a train or release the lock on the room back to the hotel contract. So there is no deadlock.

A problem potentially though is that since Alice does two actions, Eve can potentially interfere. For instance Alice books a room, and Eve immediately uses Alice’s receipt to buy a refundable train ticket, so Eve essentially does a DoS on Alice. Another problem is that some of the code needs to run in Alice’s client, so essentially developers will have an additional piece to code.

A modified version, which does not have this problem would work like that:

  1. Alice sends a message to the hotel contract with a pass-through request for a train ticket.

  2. Hotel contract locks the room, and sends a cross-shard message the train contract to buy a room.

  3. The train contract either issues a ticket or rejects. It then replies to the hotel contract.

  4. When the hotel contract receives the message from the train contract, it either completes the reservation, or rejects.

This is very much equivalent to yanking imho but does not involve users running any glue code. A good thing about this is that it only involves two messages.

In any case any locking scheme can be implemented as a library on top of asynchronous messaging … So EVM may just need to implement asynchronous message send/receive calls …

There’s no concept of “locked” and “unlocked”; every resource is at all times either “locked to” a particular shard (by being located inside of that shard) or “in transit” (unusable by any shard).

This is beginning to sound a lot like the concept of “ownership” in Rust, which I am quickly falling in love with. Given that ownership in Rust is a key feature that makes it concurrency friendly, is there an analogy to it in cross-shard resource management?

Yanking sounds like “borrowing” in the context of Rust.

Here’s a related comment in an EIP: Pessimistically locked SLOAD by homakov · Pull Request #718 · ethereum/EIPs · GitHub.

A change to gas accounting to only charge gas for the net change in storage at the end of a frame.
The latter would make mutexes and other concurrency primitives affordable, as well as other use cases that require temporary storage that persists across calls.

This would be particularly useful in the case of synchronous contract calls across shards. Research compendium - HackMD