On-chain scaling to potentially ~500 tx/sec through mass tx validation


#33

I would like to assist with this; is there a GitHub repo for this idea that includes the contract and off-chain code that maintains the branches of tx?

We are working towards a PoC for non-fungible tokens at:

But, there is a lot of interest in making it work with accounts and balances, although that comes with many more edge cases and complexities compared to non-fungible tokens. At the moment the aim seems to be ‘keep it simple, and deliverable’.


#34

Thanks for the answer, @vbuterin. :slight_smile:

I already read it before and now I’ve read it again.

Firstly, I want to say I fully support everything you wrote in the post :fist::slight_smile:, and I’m almost convinced it is the best (if not the only) way for Ethereum to achieve its full potential. :ok_hand: Btw, I see these future L2 execution engines as more or less the same thing as Plasma chains, am I right? :no_mouth:

However, my understanding is that, in this future setup, L1/main chain should have two functions/purposes:

  1. Process disputes/exits
  2. Guarantee L2 checkpoints/roots/headers availability and ordering (not the availability and ordering of all the data)

I fail to understand how L1 can ensure availability of all the L2 data (as proposed here, or in roll_up’s current implementation). One can imagine a future where we have thousands of L2 chains/engines with millions of tx/s cumulatively; can all of that be on the main chain?


#35

Btw, I see these future L2 execution engines as more or less the same thing as Plasma chains, am I right?

No, they’re quite different. An L2 execution engine is basically an alternative way of interpreting data that is published to the main chain, with two-way convertibility between assets inside and outside this alternate universe-inside-the-same-universe enforced by a smart contract. This is not quite an L2 execution engine but it’s fundamentally similar.

Plasma relies on the great majority of data being outside the chain, and also relies on liveness assumptions, exit mechanisms, etc., to ensure some notion of safety in the event that whoever publishes the data commitments to chain starts publishing bad commitments.

There are two very distinct classes of constructions here, with different properties; I think both are valuable.


#36

Oh, so those will coexist. Interesting, thanks for the clarification. :slight_smile:

I believe there is a typo somewhere here; it’s not clear what you are explaining (L2 engines or Plasma chains), and it’s really funny because in my head this description fits both. :slight_smile:

Oh, I see, the main difference is that an L2 engine publishes all the data (all transactions) on the chain (just like in this proposal and in roll_up), while Plasma keeps most of the data off-chain, right?


#37

Would be really cool to implement SC hashtags like this, no? How far are we from implementing / using this?


#38

Yep! L2 execution engines only put execution off-chain and keep data on-chain; Plasma, channels and related systems try to put both execution and data off-chain.


#39

Got it, thanks! :pray: :slight_smile:


#40

Let me rephrase what @vbuterin is proposing.

  1. A relayer reserves a slot of several blocks.

  2. The relayer takes the Merkle roots from the previous update, updates them using the transactions he accepted, and then posts the corresponding zk-SNARKs to the chain, as well as the updated Merkle roots.

  3. Another relayer reserves a slot and processes more transactions (slot reservation may be an auction).
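These three steps can be sketched in Python. Everything below is hypothetical (the `Chain` stand-in for the contract, the `account:balance` leaf encoding, the `b"ok"` proof placeholder); it only illustrates the control flow, with a dummy check in place of real zk-SNARK verification.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over the leaf hashes (last leaf
    duplicated when a layer has odd length)."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

class Chain:
    """Stand-in for the on-chain contract: stores the sequence of roots."""
    def __init__(self, genesis_root):
        self.roots = [genesis_root]
    def post(self, new_root, proof):
        assert proof == b"ok"  # placeholder for zk-SNARK verification
        self.roots.append(new_root)

def relay(chain, balances, txs):
    """One relayer turn: apply the accepted transactions to the previous
    state, recompute the Merkle root, and post it with a (dummy) proof."""
    for sender, recipient, amount in txs:
        assert balances.get(sender, 0) >= amount
        balances[sender] = balances.get(sender, 0) - amount
        balances[recipient] = balances.get(recipient, 0) + amount
    leaves = [f"{k}:{v}".encode() for k, v in sorted(balances.items())]
    chain.post(merkle_root(leaves), proof=b"ok")
```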

There are several problems:

A. An obvious problem that needs to be addressed is chain reorgs.

If I post after some other relayer, I rely on the Merkle root that was just posted by him. But someone with enough computational power can reorg the chain, making the Merkle root I based my computations on irrelevant.

B. The second problem is the time it takes to generate proofs. If I base my zk-SNARK proofs on some Merkle root and it takes 20 minutes for me to generate the proofs, the Merkle roots may be updated by someone else, making my calculations obsolete.

So, if it takes me 20 min to generate the proofs, the time slot needs to be at least 20 min, correct? I cannot let someone else (another relayer) update the Merkle roots while I am generating the proofs.

C. Another question which is not addressed: if there are multiple relayers for the same Merkle root, they have to somehow pass the Merkle tree to each other. If I am a relayer, where do I get the updated Merkle tree from? It looks like I need to get it (or at least the transactions) from the relayer that updated the tree before me. How does this happen?

D. Data availability is another problem. As a user, how do I get access to the tree? What if relayers withhold it from me?


#41

D. Data availability is another problem. As a user, how do I get access to the tree? What if relayers withhold it from me?

You can construct the entire tree purely based on information that has been published to chain. I believe this answers ( C ) as well.
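A minimal sketch of that reconstruction, assuming the on-chain log contains the raw transaction batches (names and the leaf encoding here are hypothetical): any observer replays the batches from genesis and arrives at the same balances and the same root, with no need to ask a relayer for the tree.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over the leaf hashes."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def rebuild_state(genesis_balances, onchain_batches):
    """Replay every transaction batch published on-chain to reconstruct
    the current balances and Merkle root from public data alone."""
    balances = dict(genesis_balances)
    for batch in onchain_batches:
        for sender, recipient, amount in batch:
            balances[sender] = balances.get(sender, 0) - amount
            balances[recipient] = balances.get(recipient, 0) + amount
    leaves = [f"{k}:{v}".encode() for k, v in sorted(balances.items())]
    return balances, merkle_root(leaves)
```

Because the function is deterministic in the published log, two independent observers always reconstruct identical trees.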

But someone with enough computational power can reorg the chain, making the Merkle root I based my computations on irrelevant.

Sure, but in practice even a single block reorg happens <10% of the time and a two block reorg happens extremely rarely, so I’m not sure this is an issue in practice.

B. The second problem is the time it takes to generate proofs. If I base my zk-SNARK proofs on some Merkle root and it takes 20 minutes for me to generate the proofs, the Merkle roots may be updated by someone else, making my calculations obsolete.

Agree that proof generation time needs to be fast enough and this could be the limiting factor! Though if this becomes a problem then the chain as a whole could still reach ~500 tx/sec with multiple transaction batching gadgets of this type (eg. for different tokens) running in parallel.


#42

Understood, I missed that part, sorry.

I am not sure this can be resolved so easily. If a transaction involves a $1B transfer, it will create an incentive for an attacker to do a reorg. I think you need to analyze this more carefully and show that a reorg is not a security problem in the case of a powerful attacker incentivized by a large payoff.


#43

If there is a reorg of the ETH blockchain, it can reorg this system. If the blockchain is finalized up to some point, then this system is also finalized up to that point. It piggybacks on the blockchain directly, so it just inherits its reversion and finality properties.


#44

In addition, if many blocks are reverted, the relayers can just resubmit the proofs they already constructed, since a reorg should technically only revert some of the committed roots. A reorg should not produce a new history root if we assume that a single relayer per epoch (i.e. the time window to aggregate signatures + build the proof) can commit a new root.


#45

If registering, deposits and withdrawals update the Merkle root, doesn’t this impact the relayers who are building a proof based on a previous Merkle root? They would have to build a new proof based on the new Merkle root. Do they need to start over, or is it possible to reuse part of the now-obsolete partial proof?


#46

Is it also possible to structure the L2 execution engine’s transaction logging on the base chain using zk-SNARKs? This also seems like a promising direction.


#47

Have you considered using aggregated signatures, like this: http://crypto.stanford.edu/~dabo/papers/aggreg.pdf

I have a feeling you could aggregate the signatures first and then do a SNARK on the aggregated-signature verification (in your scheme you are doing a SNARK over all the ECDSA signatures of all transactions; correct me if I am wrong). SNARK generation could be faster for verifying one aggregated signature than for verifying each ECDSA signature.

Another potential improvement is to combine relayers with miners. If miners are relayers, then it seems that you do not need to reserve slots, because mining already provides a way to resolve concurrent access: only the miner of a particular block can be a relayer for the transactions batched in this block.

Yet another potential improvement is to count zk-SNARK proof computation performed by the miner towards the PoW of the block. In other words, the PoW of a block would be the regular PoW + zk-SNARK PoW. In this way you essentially get the zk-SNARK computation for free, since the miner would correspondingly decrease the regular PoW part.

zk-SNARKs can be parallelized as described here

Since the miner in any case gets paid several thousand USD for each block, the miner could spend money on a highly parallel computation so that the proof would be generated much faster, in less than the block time.


#48

If I’m not mistaken, this requires L1 support. I’m wondering if there are other L2 solutions for the collision problem. Various L2 execution market protocols I’m aware of face the same challenge (Ethereum Alarm Clock, meta-txes, batched NFTs, event oracles, etc.).

In particular, I wonder whether the recent research into RANDAO and VDFs could offer insight here, for randomly selecting a batch-tx proposer for a short period.


#49

I have. However, an N-message N-signer aggregate signature takes N+1 pairings to verify, so it’s just as hard to verify as this (taking constant factors into account, even harder), and it also cannot prove Merkle tree updates.

The other low-tech solution here is to use an aggregate signature and just publish a new Merkle root with a security deposit, allow any relay transaction to specify any previous relay transaction as a parent, and have the contract maintain a kind of “internal blockchain” of relay transactions. The fork choice rule is “recursive first published”, i.e. if any parent has multiple children, the child that was published earlier wins; but the fork choice would only be “finalized” after two weeks. In the meantime, anyone can challenge any computation that was made and claim a reward, in which case the entire subtree of that computation would be thrown out. I think that’s worth looking into as well, though it does have a liveness assumption.
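The “recursive first published” rule can be sketched as a small traversal. The encoding here is hypothetical: each relay transaction records its parent and its publication order, and successfully challenged computations appear in `invalidated`, which prunes their whole subtree because the traversal never descends past them.

```python
def head(relays, invalidated=frozenset()):
    """'Recursive first published' fork choice over relay transactions.

    relays: dict id -> (parent_id, publish_order); the genesis parent is None.
    invalidated: ids whose computation was successfully challenged; their
    entire subtree is thrown out (traversal cannot reach descendants)."""
    children = {}
    for rid, (parent, order) in relays.items():
        children.setdefault(parent, []).append((order, rid))
    node = None  # start at the genesis sentinel
    while True:
        live = sorted((o, r) for o, r in children.get(node, [])
                      if r not in invalidated)
        if not live:
            return node          # no live children: this is the head
        node = live[0][1]        # earliest-published child wins
```

For example, with two children of the same parent, the earlier-published branch is followed until it is challenged, at which point the head jumps to the sibling branch.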


#50

In this case miners probably would not mind an L1 update, since it would potentially bring them additional money. Also, if miners are doing wasteful computations anyway and zk-SNARKs require lots of computation, it seems good to replace waste with zk-SNARKs. I think one should not be totally scared of changing the main chain …


#51

Interesting thread! I believe this direction has lots of merit for scaling blockchains. It could be made more efficient (2 orders of magnitude) if you use STARKs instead of SNARKs:

  1. The gas cost per public input of a STARK is ~100x smaller than that of a SNARK, meaning one can go with Vitalik’s initial, simpler solution, rather than using a more complex NP statement in which the inputs are treated as part of the private input of the SNARK. (Explanation: if there are N inputs, each input costs roughly 2 log N field operations for the STARK verifier, translating to less than 400 gas even when N = 1,000,000. A SNARK verifier costs an elliptic curve operation for each input, and that’s ~40,000 gas.)

  2. As pointed out, proving time is the main bottleneck. Our (StarkWare’s) work-in-progress STARK prover is at least an order of magnitude faster than the fastest SNARK prover, and we expect further improvements down the line. tl;dr: 500 tx/sec should be doable with STARKs.
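For what it’s worth, the arithmetic in point 1 checks out if one assumes a cost of roughly 10 gas per field operation; that per-operation constant is my assumption, chosen to be consistent with the quoted figures, not a number from the post.

```python
import math

GAS_PER_FIELD_OP = 10     # assumed constant, chosen to match the ~400 gas figure
SNARK_EC_OP_GAS = 40_000  # from the post: one elliptic curve op per public input

def stark_input_gas(n_inputs: int) -> float:
    """Gas per public input for the STARK verifier:
    about 2 * log2(N) field operations per input."""
    return 2 * math.log2(n_inputs) * GAS_PER_FIELD_OP

n = 1_000_000
per_input = stark_input_gas(n)           # ~398.6 gas, i.e. "less than 400"
advantage = SNARK_EC_OP_GAS / per_input  # ~100x, matching the claim
```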

We’ll provide more details at the zk-STARK workshop in DevCon4, including exciting updates on proof size.


#52

Here is an idea for how to fix it:

a) users submit transactions to the relayer
b) the relayer concatenates the transactions and sends the concatenated transaction CT back to all users
c) every user signs the CT and sends the signature back
d) the relayer combines the signatures into a Boldyreva BLS multisignature (note that Boldyreva multisignature verification takes constant time, because the multisignature is on a single message)
e) then you do a zk-SNARK on the Boldyreva multisignature verification, which will be constant time (not linear in n). Hopefully this zk-SNARK will be much faster, since it will not be linear in the number of users.

Note that if one of the users withholds the signature on the concatenation, it is not a problem: the relayer will wait for a timeout T and then create a subgroup multisignature for the users that responded.
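To illustrate the algebra of steps a)–e), here is a toy model in plain modular arithmetic. It is emphatically not secure (no real pairing groups, made-up constants), but it shows why verifying a same-message multisignature is constant time in the number of signers, and how the subgroup fallback works when a user withholds a signature.

```python
import hashlib

P = (1 << 127) - 1  # toy prime modulus standing in for the group order
G = 7               # toy "generator"

def H(msg: bytes) -> int:
    """Hash-to-scalar stand-in for hashing the message to a group element."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % P

def keygen(seed: int):
    sk = seed % P
    return sk, (sk * G) % P          # pk = sk * G

def sign(sk: int, msg: bytes) -> int:
    return (sk * H(msg)) % P         # sig_i = sk_i * H(m)

def aggregate(values) -> int:
    """Same rule for both signatures and public keys: just add them up."""
    return sum(values) % P

def verify(agg_pk: int, agg_sig: int, msg: bytes) -> bool:
    # The pairing check e(sig, G) == e(H(m), pk) collapses, in this toy
    # additive model, to one multiplication on each side:
    return (agg_sig * G) % P == (agg_pk * H(msg)) % P

keys = [keygen(s) for s in (11, 22, 33)]   # three users
ct = b"tx1|tx2|tx3"                        # concatenated transaction CT
sigs = [sign(sk, ct) for sk, _ in keys]
ok = verify(aggregate(pk for _, pk in keys), aggregate(sigs), ct)
```

Note that verification does the same constant amount of work whether there are 3 signers or 3 million; only the aggregation step (a simple sum) is linear, and the subgroup case is handled by aggregating only the responders’ keys and signatures.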