A zk-evm specification

This is amazing work!
Are there any implementations of this, even draft ones?
How long would you estimate that it would take to prove EVM execution (in seconds per gas)?
Which operations would you expect to be the least efficient (in seconds per gas)?

We don’t have such an estimate yet - we are just starting the implementation. We think, however, that prover time will likely be dominated by commitment generation, of which there is a great deal; this also means there is a lot of room for parallelization. As for seconds-per-gas efficiency, we believe the least efficient operations will be those involving 256-bit modular arithmetic (MOD, ADDMOD, MULMOD), as they require a significant number of costly word comparison operations.
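To make the word-comparison cost concrete, here is a minimal Python sketch of my own (not the spec’s actual constraint system) of one standard way to handle MOD: supply the quotient q and remainder r as witnesses, check a == q*b + r, and check r < b, where the 256-bit comparison r < b is what decomposes into up to 32 byte-level comparisons.

```python
# Illustration only: a common way to constrain "a MOD b == r" (b != 0) is to
# supply the quotient q as a witness and check a == q*b + r together with r < b.
# The r < b check on 256-bit words is where the byte-level comparisons come in.

def lt_256(x: int, y: int) -> bool:
    """Compare two 256-bit words by scanning their 32 bytes from the most
    significant end, mirroring a byte-by-byte word-comparison module."""
    xb = x.to_bytes(32, "big")
    yb = y.to_bytes(32, "big")
    for bx, by in zip(xb, yb):   # up to 32 byte comparisons per word comparison
        if bx != by:
            return bx < by
    return False                 # the words are equal

def check_mod(a: int, b: int, q: int, r: int) -> bool:
    """Verify the MOD relation given the witness values q and r."""
    return a == q * b + r and lt_256(r, b)

a, b = 2**200 + 12345, 97
assert check_mod(a, b, a // b, a % b)
```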

Are you sure that all the operations (including calls and returning from calls) actually cost O(1) constraints (or at most O(log(n))) per operation?

That is an excellent question. While, for a given clock cycle, the total number of constraints to verify does not depend on the opcode being executed, for some opcodes (CALL/RETURN, for instance) the execution spans multiple clock cycles, which increases the total size of the commitments to generate. For a CALL opcode, for example, one has to prove/verify that the whole CALLDATA has been loaded correctly into the new execution environment, so the number of clock cycles scales linearly with the length of the CALLDATA. Besides, one has to initialize the new RAM (and in some cases the new storage) of the contract being called: the number of clock cycles scales linearly with the maximum total size of the RAM (in EVM words) at the end of execution of the called contract (for the storage it scales linearly with the number of storage cells accessed). A similar reasoning applies to the RETURN opcode, as the number of clock cycles needed to process it depends on the total length of the data returned to the caller contract.
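As a back-of-the-envelope illustration of this scaling, here is a toy cost model in Python. None of the coefficients come from the spec; the function names and constant overheads are placeholders of my own.

```python
# Toy cost model: the clock cycles spent on a CALL/RETURN grow linearly with
# the state that has to be copied or initialized. All coefficients are placeholders.

EVM_WORD = 32  # bytes per EVM word

def call_cycles(calldata_len_bytes: int,
                callee_max_ram_words: int,
                callee_storage_cells_accessed: int) -> int:
    """Hypothetical cycle count for a CALL: one cycle per calldata word copied,
    per RAM word initialized and per storage cell accessed, plus a constant."""
    calldata_words = (calldata_len_bytes + EVM_WORD - 1) // EVM_WORD
    overhead = 1  # placeholder constant for the CALL bookkeeping itself
    return overhead + calldata_words + callee_max_ram_words + callee_storage_cells_accessed

def return_cycles(returndata_len_bytes: int) -> int:
    """Hypothetical cycle count for RETURN: linear in the length of the returned data."""
    return 1 + (returndata_len_bytes + EVM_WORD - 1) // EVM_WORD

# Example: a CALL passing 100 bytes of calldata to a contract that touches
# 8 RAM words and 2 storage cells.
print(call_cycles(100, 8, 2))   # 15 with these placeholder coefficients
```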

Thanks a lot for your feedback. Please don’t hesitate to reach out if you have any other comments on the specification, or if you would like us to clarify some points!

Hello @tchapuischkaiban, I found what I suspect is an error while reading the paper: is there a mistake in the memory address here?

There is a small typo here indeed. The memory address range should be [0x13a2; 0x13ab] for the location of the returned values (instead of [0x13a2; 0x12ab]). Is that what you were thinking of? Thanks for pointing this out!

Besides, there is another typo in the second interval: the address range should be [0xaabe, 0xaac7] instead of [0xaace, 0xaac3].

As a matter of convention, shouldn’t the smaller number in the range come first?

Hi, I think the address range should be [0x13a2, 0x13aa], which includes nine bytes, the same as [0xaabe, 0xaac7].

Is there any link (for example, on arXiv/IACR) in case this paper gets revised?

@JiangXb-son, No, 0x13aa - 0x13a2 == 8, while 0xaac7 - 0xaabe == 9.

@HAOYUatHZ We haven’t put the document on arXiv/IACR or submitted it to a conference. We may do it in the future, though. Of course, we will publish the link here if it happens.

In the word comparison, I see you used a 256x256 table of all byte comparisons. Couldn’t you have a single table from 256 values to positive/negative, and plookup on B1 - B2 instead?
Maybe this would even let you do the comparison on 16-bit words.

Hi @spapinistarkware, you raise an interesting point! It seems to me that your solution works and indeed allows for a smaller lookup table, but it would also require slightly restructuring the constraints of the Word Comparison module. We would have to do one of the following:

  • 2 separate plookup checks (one to verify the “Bytehood” of B1 and B2, a second one to verify the sign of their difference),
  • or verify the “Bytehood” of B1 and B2 in the execution trace itself and perform 1 plookup check to get the sign of the difference,
  • or do everything directly in the execution trace.

Currently, our single plookup proof takes care of both the “Bytehood” check and the comparison bit B1 < B2. I’m not sure which solution is best.
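For readers following along, here is a small Python sketch of my own, purely for intuition, contrasting the two table shapes being discussed: the 256x256 table indexed by the pair (B1, B2), and a sign table indexed by the difference B1 - B2 (which ranges over [-255, 255], so 511 entries). Note that the difference-based table by itself says nothing about B1 and B2 being genuine bytes, which is exactly the separate “Bytehood” check mentioned above.

```python
# Sketch for intuition only: two ways to obtain the comparison bit B1 < B2 for bytes.

# (a) The 256x256 table: one entry per (B1, B2) pair, storing the comparison bit.
cmp_table = {(b1, b2): int(b1 < b2) for b1 in range(256) for b2 in range(256)}

# (b) A sign table over the possible differences B1 - B2 in [-255, 255]
#     (511 entries), storing whether the difference is negative.
sign_table = {d: int(d < 0) for d in range(-255, 256)}

# The two agree on every pair of bytes.
for b1 in range(256):
    for b2 in range(256):
        assert cmp_table[(b1, b2)] == sign_table[b1 - b2]
```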

This is supposed to be a closed interval, isn’t it?
Looking forward to your reply.

Hi @OlivierBBB, I have a question: on page 17, shouldn’t the Prev_PC at step 104 be 92?

Judging from the numerical example, this interval is closed on its left bound and open on its right bound (indeed, we should correct the right range symbol).
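To make the arithmetic of the exchange above explicit, here is a quick throwaway check (not part of the spec) of the byte counts: read as half-open, both corrected ranges cover nine bytes, which is the same nine bytes the closed reading [0x13a2, 0x13aa] suggested earlier describes.

```python
# Byte counts of the ranges discussed above under the two interval conventions.
def closed_len(lo, hi):     # [lo, hi]: both ends included
    return hi - lo + 1

def half_open_len(lo, hi):  # [lo, hi): right end excluded
    return hi - lo

print(half_open_len(0x13a2, 0x13ab))  # 9  -> corrected first range, read half-open
print(half_open_len(0xaabe, 0xaac7))  # 9  -> corrected second range, read half-open
print(closed_len(0x13a2, 0x13aa))     # 9  -> the closed reading suggested earlier
```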

No, 93 is the right value here: when we return from the smart contract being called, we have to execute the instruction that comes immediately after the CALL (which is at PC 93).
Following the same logic, the previous PC for steps 355-360 should be 355 (that’s another typo).
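A toy illustration (my own, not the spec’s trace layout) of the reasoning: a CALL at PC 92 records 92 + 1 = 93 as the program counter to come back to, so the first step executed after returning has Prev_PC equal to 93.

```python
# Minimal call/return model: the caller resumes at the instruction after CALL.
return_pcs = []                          # stack of return program counters

def do_call(pc_of_call: int) -> None:
    return_pcs.append(pc_of_call + 1)    # resume right after the CALL instruction

def do_return() -> int:
    return return_pcs.pop()              # PC at which the caller resumes

do_call(92)
assert do_return() == 93
```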

OK, I see the idea now. Thanks @tchapuischkaiban for your reply.

@tchapuischkaiban Hi, I have some questions about the RAM:

  1. As described in example 2.1.2, if we read/write a word whose interior offset is not 0, the parent dispatches two tasks and has the child read/write one word address after another, which means the child reads/writes only one word address at a time. Why, then, does section 6.1 say “Store to/load from the memory at most 32 bytes at two consecutive word addresses”? If the child reads/writes only one word address at a time, shouldn’t CRAM_BWd_Offset always be equal to Curr_Wd_Offset?
  2. In the child RAM, is data read/written word by word or byte by byte within a row?

This is a great and very detailed description of the operational mode.

If there is an operational semantics for this zk-extension, it could be fed into existing models of the EVM, e.g. the K-framework. https://github.com/kframework/evm-semantics
This may help in designing a prototype implementation/simulator quickly, and, as the K-tools generate the simulator, you may be able to debug/tune the semantics.

Nice spec :slight_smile:

To improve the readability of this paper, we have chosen to provide full constraint systems for a few (representative) modules only, which we strive to describe as comprehensively as possible. Other modules, like the main execution trace and the storage module, have been fully designed - however, given the complexity of these components, we chose to postpone slightly their publication. If interested, you may contact us directly for more information on these modules.

Could you please see if you can share the other modules you designed? My curiosity is piqued. Thanks!

@Softcloud:

  • First note that, in any case (even if the interior offset is 0), the parent RAM treats the RETURN operation (the one of example 2.1.2, and any operation that requires interaction between two different memories) as a succession of READ/WRITE requests to the child RAM. Besides, you are right: what is described in example 2.1.2 is an optimization. If the starting interior offset of the read is non-zero, we deliberately choose to first read the last bytes of the first word of the called contract’s returned memory, so that the subsequent READ operations (which then have a zero starting interior offset) are performed more efficiently; hence, we deliberately do not exploit the fact that one can read/write at most two consecutive word addresses. The ability to read/write at two consecutive word addresses is, however, very useful for operations like MSTORE/MLOAD, as they can then be sent as a single request to the child RAM (see the sketch after this list).
  • In the child RAM we can read/write single bytes of the 32-byte words contained at word multiple addresses; hence, in a single row we can either read/modify a single byte of a 32-byte word, or read/replace the full 32-byte word contained at a given word multiple address (an optimization for requests that have a zero interior offset).
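A small sketch for intuition, assuming the usual 32-byte word granularity of EVM memory (the function and variable names here are mine, not the spec’s): an MSTORE whose interior offset is non-zero necessarily spills into the next word, which is why serving it as a single request over two consecutive word addresses is convenient.

```python
# Which word addresses does a 32-byte MSTORE touch, as a function of its byte offset?
EVM_WORD = 32

def mstore_touched_words(byte_offset: int):
    """Return (word address, interior offset, list of touched word addresses)."""
    word_addr = byte_offset // EVM_WORD
    interior_offset = byte_offset % EVM_WORD
    if interior_offset == 0:
        touched = [word_addr]                 # aligned: a single word is replaced
    else:
        touched = [word_addr, word_addr + 1]  # unaligned: spills into the next word
    return word_addr, interior_offset, touched

print(mstore_touched_words(0x40))  # (2, 0, [2])     aligned store
print(mstore_touched_words(0x45))  # (2, 5, [2, 3])  spans two consecutive word addresses
```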

@franck44, thanks a lot for your positive feedback; we will have a look at the K-framework!

@spartucus, we will provide the full low-level module design in another publication, which will be accompanied by some performance metrics from our implementation.

Sin7y Labs, the research team behind Ola, has made a Chinese translation of this paper. I’ll attach it here for anyone who is interested. Please let us know if you have any questions.

This is interesting, thank you for this effort. Have you had a look at our updated spec? Besides some general ideas, nothing of the document from this post/thread will make it into our zk-evm. BTW: the updated spec I linked to previously is also out of date (e.g., we redid the arithmetic from the ground up).

Thanks for the feedback! We’ve been studying the latest specifications recently, which are amazing. The Chinese translation attached above was done last year; we hadn’t planned to publish it until we were asked by many people interested in zkEVM :sweat_smile:. We’ll continue our work in this field and hopefully help more people understand the history, the status quo, and the future of zkEVM.
