On-chain scaling to potentially ~500 tx/sec through mass tx validation

vbuterin · September 22, 2018, 12:06pm

I know there has been work recently on generating large ZK-SNARKs in clusters. I think in the long run, proof generation will in many cases be done by a specialized class of “miners” running GPUs and more specialized hardware. I would definitely love to see data from someone trying to make a full scalable implementation! The nice thing about the scheme is that it’s not vulnerable to 51% attacks, so if bitmain ends up producing 90% of the proofs that’s totally fine, as long as there is someone to step in and take their place if they suddenly stop participating.

barryWhiteHat · September 22, 2018, 2:27pm

The cost of a ZK-SNARK verification with the latest protocols is ~600000 gas, and we can add ~50000 gas for overhead (base tx cost, log cost, storage slot modifications…). Otherwise, each transaction included costs 68 gas per byte, unless it is a zero byte in which case it goes down to 4 gas.

I think you need to pass the transactions to the sanrk as a public input?

If so I think this is quite expensive as during zksnark verification you need to do a pairing addition and pairing multiplication per public input. I found this out after we spoke last. With roll_up where we pass two 32 bytes input per tx we can only have 22 public inputs before we run out of gas.

So we have thought a bunch about how to overcome this at roll_up. We originally wanted to pass every leaf that is updated as a public input. But we can only do 22 transactions or 44 inputs, tho we were not using the latest snark verification method.

So we started thinking about this and thought about making a merkle tree of the public inputs and pass that. Pass all the other inputs privately (which is free) and then build the same merkle tree inside the snark and ensure that they match.

But the problem with this is that if you use sha256 which is cheap in the EVM it gets very expensive inside the snark in terms of proving time. If you use pedersen commitments it is very cheap in the snark but you pay a lot more in the evm, (we think atleast i have not gotten around to testing that yet :/) .

There are some very interesting proposals for hash functions that are cheap inside the snark and the evm see @HarryR mimic based construction but that needs some academic review before i think we can safely use it. So that could be useful for compressing the public input.

PhABC · September 22, 2018, 2:29pm

To register, a user needs to provide a Merkle branch showing some index i, where either i=0 and A[i]=0 , or i > 0 and A[i] = 0 and A[i-1] != 0 . The Merkle tree is updated so that A[i] now equals the msg.sender’s address,

This seems to only allow 1 registration per block in practice. If both Bob and Alice try to update the same Merkle root for slot i, then one will fail.

To deposit or withdraw, a user needs to provide a Merkle branch showing some index i, where A[i] equals the msg.sender’s address, along with the corresponding branch for B[i] , and the amount they want to deposit or withdraw, D… The contract then updates the Merkle root so that B[i] += D .

This also seems hard to achieve in practice, since users technically always need to know what the current root is in order to provide the proper path and the root changes with every transaction.

Note that anyone can be a relayer; there is no assumption of even an untrusted special “operator” existing. Because all the data needed to update the Merkle tree goes on chain, there are no data availability issues.

This also has the problem that having multiple relayers will lead to collisions and only one will succeed. This introduce a race condition to generate the proof, which can be fine, but now relayers might start to only include the highest fees to make sure they don’t lose the race, so higher cost for everyone.

I believe most of these concerns are addressed by having an operator or close set of operators and a deposit/withdraw queue.

kladkogex · September 22, 2018, 4:22pm

I think you need to provide analysis for double spend attacks - what if there are multiple relayers, and you try to double spend by going through two relayers concurrently?

When you deposit do you deposit to a single relayer or multiple relayers?

PhABC · September 22, 2018, 4:27pm

AFAIK, you can’t double spend since the root will update to reflect the new state and all txs need to be sequentially processed and be valid.

kladkogex · September 22, 2018, 4:48pm

I wonder if this is similar to what I proposed sometime ago ;-)) I did not have zksnarks in it though )

MrChico · September 22, 2018, 7:13pm

Wouldn’t transactors need to include a nonce as well?

kfichter · September 22, 2018, 7:41pm

There’s an edge case here for ERC20s where the value received via transferFrom is less than D. Not huge, but something to consider for implementation.

kfichter · September 22, 2018, 7:53pm

We might be able to get away with only having the transaction info in calldata, but otherwise logs would add another 8 gas per byte. Maybe worth having it in the marginal cost (instead of the 50k overhead) since it scales with the number of txs.

vbuterin · September 22, 2018, 7:58pm

I think you need to pass the transactions to the sanrk as a public input? If so I think this is quite expensive as during zksnark verification you need to do a pairing addition and pairing multiplication per public input. I found this out after we spoke last. With roll_up where we pass two 32 bytes input per tx we can only have 22 public inputs before we run out of gas.

This is why I suggested putting using hash of the submitted data as the public input, and then computing the hash on chain as that is only 6 gas per 32 bytes for SHA3 (for SHA256 it’s somewhat more but still tiny). FWIW I do agree this whole thing is expensive in terms of prover time, though given that I expect the relayers will be GPU farms so it’s less of an issue than it is in, say, zcash where regular nodes need to be able to make proofs in a few seconds.

This seems to only allow 1 registration per block in practice. If both Bob and Alice try to update the same Merkle root for slot i , then one will fail.

An alternative is to allow relayers to batch Merkle proofs, though they would not be able to be paid for registrations. Topups and withdrawals can definitely be done through relayers in a way where the relayers get paid though.

This also has the problem that having multiple relayers will lead to collisions and only one will succeed. This introduce a race condition to generate the proof, which can be fine, but now relayers might start to only include the highest fees to make sure they don’t lose the race, so higher cost for everyone.

One option would be to require relayers to reserve the chain for a period of 2 blocks (possibly <30000 gas cost) before they can submit a block.

I believe most of these concerns are addressed by having an operator or close set of operators and a deposit/withdraw queue.

By “queue” do you mean “everyone pre-submits what address they’ll register with, and then that gets put in a queue, and then people have to submit a second transaction to make the Merkle branch, but they’ll do that knowing in advance exactly what Merkle branch they’ll need”? If so I do agree that’s another elegant way to solve the problem.

I wonder if this is similar to what I proposed sometime ago ;-)) I did not have zksnarks in it though )

This scheme is definitely not Plasma. Plasma relies on liveness assumptions and an exit mechanism; there are none of those features here.

Wouldn’t transactors need to include a nonce as well?

Ah yes you’re right. Will update asap.

There’s an edge case here for ERC20s where the value received via transferFrom is less than D . Not huge, but something to consider for implementation.

If the value receives is less than D, then the entire transaction should fail. Actually, is it even legal according to the standard for a transferFrom request asking for D coins to return less than D coins?

We might be able to get away with only having the transaction info in calldata, but otherwise logs would add another 8 gas per byte. Maybe worth having it in the marginal cost (instead of the 50k overhead) since it scales with the number of txs.

Not need to log the data. Just have a log pointing out that a relay transaction was made, an assert to check that the relayer is an EOA, and then clients can scan through the block data.

nickjohnson · September 22, 2018, 8:06pm

Is it actually practical to do ECRecover in a snark? Wouldn’t that require an extremely complex snark circuit?

vbuterin · September 22, 2018, 8:07pm

With the baby jubjub curve, it might actually not be that bad.

PhABC · September 22, 2018, 11:57pm

Checking is a signature is valid for eddsa (with baby jubjub curve) takes about 200k constaints in roll_up, which is maybe 30 seconds to build a proof on a decent laptop. This can most likely be brought down quite a bit still however. Using secp2561curve will be more computationally demanding

vbuterin · September 23, 2018, 12:49am

400k seems incredibly high. Why is that? If we use schnorr verifying a signature is just two multiplications, an addition and a hash, so on average ~769 additions and a hash, so it seems like the hash would be the largest part of it.

PhABC · September 23, 2018, 2:02am

Sorry, I meant 200k, I just edited previous comment. Roll-up currently uses full-rounds of SHA256, so we could cut the number of contraints here almost by half for the hash function. Otherwise, there are a lot of checks done currently that might not be necessary.

Here’s where the contraints are built for the signature validation ;

github.com

barryWhiteHat/baby_jubjub_ecc/blob/620dbb661a8a24b29eb92fd488201b988609db9e/baby_jubjub_ecc/eddsa.cpp#L127


    jubjub_isOnCurve2.reset( new isOnCurve <FieldT> (pb, r_x_packed[0], r_y_packed[0], a, d, "Confirm r point is on the twiseted edwards curve"));


    jubjub_pointMultiplication_lhs.reset( new pointMultiplication <FieldT> (pb, a, d, b_x, b_y, S, lhs_x, lhs_y, " lhs check ", 256));
    jubjub_pointMultiplication_rhs.reset( new pointMultiplication <FieldT> (pb, a, d, pk_x_packed[0], pk_y_packed[0], h_bits, rhs_mul_x, rhs_mul_y, "rhs mul ", 253));
    jubjub_pointAddition.reset( new pointAddition <FieldT> (pb, a, d, rhs_mul_x[252], rhs_mul_y[252] , r_x_packed[0] , r_y_packed[0], rhs_x, rhs_y , "rhs addition"));
}






template<typename FieldT, typename HashT>
void eddsa<FieldT, HashT>::generate_r1cs_constraints()
{


    //constraint the inputs a,d , x_base, y_base
    //we make sure that the user passes the curve values
    this->pb.add_r1cs_constraint(r1cs_constraint<FieldT>({a} , {1}, {168700}),
                           FMT("a == 168700", "eddsa"));
    this->pb.add_r1cs_constraint(r1cs_constraint<FieldT>({d} , {1}, {168696}),
                           FMT("d == 168696", "eddsa"));


    this->pb.add_r1cs_constraint(r1cs_constraint<FieldT>({b_x} , {1}, {FieldT("17777552123799933955779906779655732241715742912184938656739573121738514868268")}),

barryWhiteHat · September 23, 2018, 2:06am

FWIW I do agree this whole thing is expensive in terms of prover time, though given that I expect the relayers will be GPU farms so it’s less of an issue than it is in, say, zcash where regular nodes need to be able to make proofs in a few seconds.

In the case where you use a GPU farm to perform the proving step do you also need a GPU farm to perform the trusted setup? How do the two operations differ in complexity?

400k seems incredibly high. Why is that? If we use schnorr verifying a signature is just two multiplications, an addition and a hash, so on average ~769 additions and a hash, so it seems like the hash would be the largest part of it.

I use sha256 in a bunch of places because i was unsure what was safe to combine points or remove discard data. But when i optimize it we should be able to come down quite a lot. Its POC at the moment.

gluk64 · September 23, 2018, 3:21am

The correct sum is slightly higher: 68 * 3 (from) + 68 * 3 (to) + 68 * 1 (fee) + 68 * 4 * 2 (amount) + 68 * 2 (nonce), or 1156 gas. A more realistic assumption for the foreseeable future is 1000 tx per SNARK (according to very optimistic calculations we had with the rollup team). So it adds roughly 650 overhead, bringing us to ~1.8k gas per tx.

A batched transfer of tokens with the same parameters (32 bit value, 24 bit address) will cost around 18k gas per transfer. So it’s 10x, not 50x.

10x gas reduction is an order of magintude improvement, of course, but it has a relatively strong cap, and the economic overhead imposed by the SNARK proof computation is also very signficant at the moment. Especially the need to compute hashes optimized for verification in EVM.

Solving the data availability problem through relying on the Ethereum root chain not as 100% data availability guarantor, but rather as the court of the final appeal seems a lot more promising direction, because: 1) it can scale almost indefintely, 2) it simplifies the SNARK circuit, making transfers cheaper and a working MVP a more attainable goal. See our discussion in the roll_up github.

vbuterin · September 23, 2018, 10:34am

Why? The amount is a single value.

A batched transfer of tokens with the same parameters (32 bit value, 24 bit address) will cost around 18k gas per transfer. So it’s 10x, not 50x.

10x relative to batched transfers. The status quo does not involve batched transfers in most cases; in the 50x I’m including efficiency gains both from SNARK validation and from compression and a batching relay protocol.

Solving the data availability problem through relying on the Ethereum root chain not as 100% data availability guarantor, but rather as the court of the final appeal seems a lot more promising direction, because: 1) it can scale almost indefintely, 2) it simplifies the SNARK circuit, making transfers cheaper and a working MVP a more attainable goal. See our discussion in the roll_up github

I think both directions are valuable; using the ethereum chain as a data availability guarantor makes many other things simpler, and in the long run scalable “sharded” ethereum is basically being designed as a data availability guarantor first and foremost so it’s a quite sustainable long-term direction (see layer 2 execution engines as described here; what I’m describing here is in some sense a candidate for the first layer 2 execution engine). My main short-term concern with Plasma-like constructions is that they depend on specific operators to stay online, and Plasma is inherently non-universal in some respects because of the extra game theoretic requirements.

loredanacirstea · September 23, 2018, 10:39am

What I did not understand is how is the correct order of transactions in a batch actually enforced?

vbuterin · September 23, 2018, 10:50am

What I did not understand is how is the correct order of transactions in a batch actually enforced?

The SNARK proves that each transaction is valid in its position in the batch.