On-chain scaling to potentially ~500 tx/sec through mass tx validation

MihailoBjelic · September 23, 2018, 9:26pm

Oh, so those will coexist. Interesting, thank for the clarification.

I believe there is a typo somewhere here, it’s not clear what are you explaining (L2 engines or Plasma chains), and it’s really funny because in my head this description fits both.

Oh, I see, the main difference is that L2 engine publishes all the data (all transactions) on the chain (just like in this proposal and in roll_up), while Plasma keeps most of the data off-chain, right?

kingflurkel · September 23, 2018, 10:24pm

Would be really cool to implement SC hashtags like this, no? How far are we from implementing / using this?

vbuterin · September 23, 2018, 11:41pm

Yep! L2 execution engines only put execution off-chain and keep data on-chain, Plasma, channels and related systems try to put both execution and data off-chain.

MihailoBjelic · September 24, 2018, 12:07am

Got it, thanks!

kladkogex · September 24, 2018, 3:10pm

Let me rephrase what @vbuterin is proposing.

Relayer reserves a slot of several blocks.
Relayer takes the merkle roots from the previous update and updates them using the transaction he accepted, and then posts the corresponding ZKSNARKS to the chain, as well as the updated Merkle roots.
Another relayer reserves a slot and processes more transactions (slot reservation may be an auction).

There are several problems:

An obvious problem that needs to be addressed is chain reorgs.

If I post after some other guy, I rely on the Merkle root that was just posted by him. But someone with enough computational power can re-org the chain making the Merkle root I based my computations on irrelevant.

B. The second problem is time it takes to generate proofs. If I base my zksnarks proofs on some Merkle root and if it takes 20 minute for me to generate the proofs, the Merkle roots may be updated by someone else, making my calculations absolete.

So, if it takes me 20 min to generate the proofs, the time slot needs to be at least 20 min, correct? I can not let someone else (another relayer) update merkle roots why I am generating the proofs.

C. Another question which is not addressed is that if there are multiple relayers for the same merkle root, they have to somehow pass the Merkle tree to each other. If I am a relayer, where do I get the updated Merkle tree from? It looks like I need to get it (or at least the transactions) from the relayer that updated the tree before me. How does this happen?

D. Data availability is another problem. As a user how do I get access to the tree? What if relayers withold it from me?

vbuterin · September 24, 2018, 3:17pm

D. Data availability is another problem. As a user how do I get access to the tree? What if relayers withold it from me?

You can construct the entire tree purely based on information that has been published to chain. I believe this answers ( C ) as well.

But someone with enough computational power can re-org the chain making the Merkle root I based my computations on irrelevant.

Sure, but in practice even a single block reorg happens <10% of the time and a two block reorg happens extremely rarely, so I’m not sure this is an issue in practice.

B. The second problem is time it takes to generate proofs. If I base my zksnarks proofs on some Merkle root and if it takes 20 minute for me to generate the proofs, the Merkle roots may be updated by someone else, making my calculations absolete.

Agree that proof generation time needs to be fast enough and this could be the limiting factor! Though if this becomes a problem then the chain as a whole could still reach ~500 tx/sec with multiple transaction batching gadgets of this type (eg. for different tokens) running in parallel.

kladkogex · September 24, 2018, 3:24pm

Understood, I missed that part, sorry

I am not sure this can be resolved so easily … If a transaction involves a $1B transfer, it will make an incentive for an attacker to do an reorg. I think you need to analyze more carefully and show that a reorg is not a security problem in case of a powerful attacker incentivized by a large payoff

vbuterin · September 24, 2018, 3:30pm

If there is a reorg of the ETH blockchain, it can reorg this system. If the blockchain is finalized up to some point, then this system is also finalized up to that point. It piggybacks on the blockchain directly, so it just inherits its reversion and finality properties.

PhABC · September 24, 2018, 3:48pm

In addition, if many blocks are reverted, the relayers can just resubmit the proofs they already constructed since a reorg should technically only revert back some of the roots commited. Reorgs should not propose a new history root if we consider that a single relayer per epoch (i.e. time window to aggregate signature + build proof) can commit a new root.

denett · September 24, 2018, 5:04pm

If registering, deposits and withdrawals will update the Merkle root, doesn’t this impact the relayers who are building a proof based on a previous Merkle root? They have to build a new proof based on the new Merkle root. Do they need to start over or is is possible to reuse part of the now obsolete partial proof?

cchx0000 · September 24, 2018, 7:56pm

Is it possible to also structure layer2 exec engine’s transaction logging on base chain using zksnark? This also seems like a promising direction.

kladkogex · September 25, 2018, 7:04am

Have you considered using aggregated signatures like this http://crypto.stanford.edu/~dabo/papers/aggreg.pdf

I have a feeling you could aggregate signatures first, and then do a SNARK on the aggregated signature (in your scheme you are doing SNARK on all ECDSA signatures of all transactions, correct me if I am wrong). The SNARK generation time could be faster for the aggregated signature verification vs verification of each ECDSA.

Another potential improvement is to combine relayers with miners. If miners are relayers, than it seems that you do not need to reserve slots because mining already provides a way to resolve concurrent access.
In other words, only the miner of a particular block can be a relayer for the transactions batched in this block.

Yet another potential improvement is to count zksnark proof computation performed by the miner towards the PoW of the block. In other words the PoW of a block would be the regular PoW + ZKSnark PoW. In this way you essentially get the ZKSnark computation for free, since the miner would correspondingly decrease the regular PoW part.

ZKSnarks can be parallelized as described here

Since the miner anyway gets paid several thousand USD for each block, the miner could spend money to do a highly parallel computation so the proof would be generated much faster in time less than the block time.

sinamahmoodi · September 25, 2018, 8:01am

If I’m not mistaken, this requires L1 support. I’m wondering if there are other L2 solutions for the collision problem. Various L2 execution market protocols I’m aware of face the same challenge ( Ethereum Alarm Clock, meta-txes, batched NFTs, event oracle, etc.).

Particularly, if the recent research into RANDAO and VDFs could offer insight here, for randomly selecting a batch tx proposer for a short period.

vbuterin · September 25, 2018, 11:52am

I have. However, an N-message N-signer aggregate signature takes N+1 pairings to verify, so it’s just as hard (taking into account constant factors even harder) to verify than this, and also it cannot prove Merkle tree updates.

The other low-tech solution here is to use an aggregate signature and just publish a new Merkle root with a security deposit, allow any relay transaction to specify and previous relay transaction as a parent, and have the contract maintain a kind of “internal blockchain” of relay transactions; the fork choice rule is “recursive first published” ie. if any parent has multiple children the child that was published earlier wins, but the fork choice would only be “finalized” after two weeks; in the meantime, anyone can challenge any computation that was made and claim a reward, in which case the entire subtree of that computation would be thrown out. I think that’s worth looking into as well, though it does have a liveness assumption.

kladkogex · September 25, 2018, 11:57am

In this case miners probably would not mind a L1 update since it will potentially bring them additional money. Also it seems that if miners are doing wasteful computations anyway and zksnarks require lots of computations, it is good to replace waste with zksnarks. I think one should not be totally scared of changing the main chain …

EliBenSasson · September 25, 2018, 12:33pm

Interesting thread! I believe this direction has lots of merit for scaling blockchains. It could be made more efficient (2 orders of magnitude) if you use STARKs instead of SNARKs:

The gas-cost-per-public-input of STARK is ~ 100X smaller than that of SNARK, meaning one can go with Vitalik’s initial, simpler, solution, rather than using a more complex NP statement in which the inputs are treated as part of the private input of the SNARK. (explanation: If there are N inputs, each input costs roughly 2 log N field operations for the STARK verifier, translating to less than 400 gas even when N=1,000,000. A SNARK verifier costs an elliptic curve operation for each input, and that’s ~ 40,000 gas).
As pointed out, proving time is the main bottleneck. Our (StarkWare’s) work-in-progress STARK prover is at least an order of magnitude faster than the fastest SNARK prover and we expect further improvements down the line. tl;dr 500 tx/sec should be doable with STARKs.

We’ll provide more details at the zk-STARK workshop in DevCon4, including exciting updates on proof size.

kladkogex · September 25, 2018, 1:36pm

Here is an idea how to fix it:

a) users submit transactions to the relayer
b) the relayer concatenates the transactions and sends back the concatenated transaction CT to all users
c) every user signs the CT and sends the signature back
d) the relayer concatenates the signatures into a Boldyreva BLS multisignature - note that Boldyreva multisignature verification takes constant time because the multisignature is on the same message
e) Then you do ZKSNARK on the Boldyreva multisignature verification which will be constant time (not linear in n). Hopefully this ZKSnark will be much faster since it will not be linear in the number of users.

Note that if one of the users witholds the signature on the concatenation it is not a problem -the relayer will wait for a timeout T and then create a subgroup multisignature for the users that responded.

kladkogex · September 25, 2018, 2:20pm

Nice :-))) Can the be done in parallel?? I think SNARKS can be done in parallel

Is the zk-STARK workshop going to be inside or outside of devcon4 walls? We have some crypto developers from our company coming to Prague but they were not able to buy a ticket …

denett · September 25, 2018, 5:24pm

I don’t think it requires a internal blockchain with a fork choice rule, because the contract can prevent a split and incorrect blocks can be rolled back.

I was thinking about the following:
Relayers register with a deposit. A relay block consists of the transaction data, the hash of the transaction data, previous state Merkle root and the new state Merkle root. The contract checks the hash of the transaction data and if the previous Merkle root matches.

When the block is incorrect, anybody can submit a proof and rollback the incorrect block and all blocks following it, collecting the deposits of the relayers.

For withdrawals we will need a waiting period (a week?). If you are in a hurry, you can offer a small fee, so that somebody else can volunteer to do the waiting for you. Withdrawals on the queue will be rolled back as well when they are based on an incorrect block, so the volunteers need to make sure the chain is correct

For every-bodies money to be save, one honest user needs verify the chain, once a week.

gluk64 · September 26, 2018, 3:36am

What about proof sizes as bottleneck? In Vitalik’s proposal STARKs would have to be included in root chain transactions and they are orders of magnite larger than SNARKs. Does StarkWare address this?