opML is All You Need: Run a 13B ML Model on Ethereum

Written by Conway (0x1cc) and Suning Yao (msfew)

TL;DR

Hyper Oracle’s zkOracle enables trustless arbitrary compute, unlocking onchain AI and ML. We present opML, a flexible and performant approach until we launch zkML.

  • opML (Optimistic Machine Learning) ports AI model inference and training/fine-tuning onchain using an optimistic verification mechanism.
  • opML provides ML proofs with low cost and high efficiency compared to zkML. The hardware requirements for opML are low: opML can run a large language model, e.g. a 7B-LLaMA (the model size is around 26GB), on a common PC without a GPU, whereas zkML currently takes around 80 minutes just to generate a proof for a 1M-parameter nanoGPT model.
  • opML adopts a verification game (similar to Truebit and optimistic rollup systems) to guarantee decentralization and verifiability of the ML compute.
  • The conclusion paragraph for this post was generated by Llama 2 in opML, taking 10 seconds to load the model into memory and 2 minutes to generate the output on a CPU without quantization (unoptimized).
  • Hyper Oracle provides the first open-source implementation of opML. You can use Stable Diffusion and LLaMA 2 in opML live here: https://ai.hyperoracle.io/. Try running a 13B LLaMA in opML now!

0. Onchain ML

Ethereum brings the possibility of onchain, verifiable computing. However, due to the network’s resource overhead and performance limitations, it is not yet possible to support fully onchain ML or AI computing on mainnet.

Implementing onchain ML or AI can give machine learning computation all the advantages of current onchain computing, including:

  • Fairness
  • Verifiability
  • Computational Validity
  • Transparency
  • Decentralization

Both opML and zkML are methods for verifiably proving that a specific model has been used to generate a particular output. This is useful for the following reasons:

  • Many ML models are closed-source, meaning there is a lack of transparency regarding aspects like parameters and model size.
  • Even if a model is open-source, there’s no concrete evidence to confirm that the specified model was actually used for a given output, rather than, for instance, arbitrarily generating the results.
  • With the increase in artificially generated content, a method to prove with certainty that an output is valid can be useful. Consider proving that an image is the work of an AI model as opposed to an artist.

1. zkML

a) Introduction

Just as a ZK Rollup’s offchain computation is written as arithmetic circuits to generate a succinct zk proof of the computation, and thereafter the proof is verified onchain, machine learning models can be “circuitized” and proved on Ethereum.

In this way, we can use zkML to provide onchain inference of machine learning models.

b) Implementation

For the implementation of zkML, there are usually two approaches:

  • Specific ZK circuit: Implementing different specific ZK circuits for different models (thus meeting the different needs of each use case, such as floating point precision)
  • General zkML Runtime: Adding ZK to ML models with a common runtime (just like zkEVM for EVM programs)

However, the two solutions above still have to overcome some difficulties due to the nature of ZK.

c) Challenge

The biggest challenge of any ZK system is the proving overhead (roughly 1,000x compute overhead).

This means that no matter how quickly ZK proof systems evolve, or how much hardware-, software-, or model-specific optimization is applied, zkML performance will always lag behind normal ML inference, with unavoidable proving overhead in terms of latency and cost.

In addition, there are several other issues that need to be addressed and optimized in zkML: quantization, memory usage, circuit size limit, etc.

Currently, only small models can be implemented as zkML. For larger models, such as GPT-3.5, zkML is currently not feasible due to the above limitations. Therefore, we need another solution to implement practical onchain ML for large models now.

2. opML

a) Introduction

ZK Rollups use zk proofs to prove the validity of computation, while Optimistic Rollups use fraud proofs to secure the computation.

An optimistic verification approach can be applied to onchain ML as well as to rollup computation.

By comparison, zkML and opML secure onchain ML with different approaches:

  • zkML: performs ML inference computation + generates ZK proofs for each ML inference
  • opML: performs ML inference computation + re-executes a segment of ML inference computation onchain when the computation is challenged

opML solves the computationally expensive problem of zkML by generating proofs only when necessary.

b) Implementation Overview

For now, opML implementations are rare; the only open-source project is OPML-Labs/opml (OPtimistic Machine Learning on Blockchain) on GitHub. An opML implementation needs two main components:

  • Compiler: allows ML models to be interpreted onchain for fault proof challenges
  • Mechanism design: sets a challenge period that keeps the whole optimistic system economically sound and keeps challenges censorship resistant

Compared to financial applications (e.g. rollup transfers or cross-chain applications), the security requirements for ML computation are less demanding: ML computation can be inherently stochastic and, under some conditions, non-deterministic. For these reasons, we can trade some security margin for performance in onchain ML, while noting that opML remains verifiable and trustless.

For opML the challenge period can be shorter, because it is not a rollup like Optimism that carries financial operations and maintains a public ledger. When optimized, the challenge period can resemble that of a sovereign rollup: a few minutes or even a few seconds. Overall, opML's challenge period is likely to be under 10 minutes, rather than the typical 7 days of an OP Rollup.

c) Single-Phase Verification Game: Onchain ML Model in EVM

The one-phase pinpoint protocol works similarly to refereed delegation of computation (RDoC), where two or more parties (at least one of them honest) are assumed to execute the same program. The parties then challenge each other, pinpointing the single disputed step, which is sent to a judge with weak computational power (a smart contract on the blockchain) for arbitration.

In one-phase opML:

  • We build a virtual machine (VM) for offchain execution and onchain arbitration, ensuring equivalence between the offchain VM and the onchain VM implemented in a smart contract.
  • To ensure the efficiency of AI model inference in the VM, we implemented a lightweight DNN library specifically designed for this purpose, instead of relying on popular ML frameworks like TensorFlow or PyTorch. A script is also provided to convert TensorFlow and PyTorch models to this lightweight library.
  • Cross-compilation is used to compile the AI model inference code into VM program instructions.
  • The VM image is managed with a Merkle tree; only the Merkle root, which represents the VM state, is uploaded to the onchain smart contract.

https://github.com/OPML-Labs/opml/raw/main/docs/assets/image.png

  • The bisection protocol locates the disputed step, which is then sent to the arbitration contract on the blockchain (a minimal sketch of this game follows below).

    https://github.com/OPML-Labs/opml/raw/main/docs/assets/dispute.png
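To make the dispute game concrete, here is a minimal, hypothetical Python sketch (the function names are illustrative, not the opml repository's API): both parties hold a trace of VM state roots, bisection narrows the disagreement to a single micro-instruction, and the onchain judge re-executes only that one step.

```python
import hashlib

def state_root(vm_state: bytes) -> bytes:
    """Stand-in for the Merkle root of a serialized VM state."""
    return hashlib.sha256(vm_state).digest()

def bisect_dispute(submitter_trace, challenger_trace):
    """
    Locate the first VM step on which the two parties disagree.
    Both traces are lists of state roots of equal length (one root per step),
    and the parties are assumed to agree on the initial state.
    """
    lo, hi = 0, len(submitter_trace) - 1              # agree at lo, disagree at hi
    assert submitter_trace[lo] == challenger_trace[lo]
    assert submitter_trace[hi] != challenger_trace[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if submitter_trace[mid] == challenger_trace[mid]:
            lo = mid                                  # disagreement lies in the upper half
        else:
            hi = mid                                  # disagreement lies in the lower half
    # Step `lo` is the single disputed micro-instruction; its pre-state (plus
    # Merkle proofs against submitter_trace[lo]) is what the contract needs.
    return lo

def onchain_arbitrate(pre_state: bytes, claimed_post_root: bytes, execute_one_step):
    """The weak 'judge' (the smart contract) re-executes exactly one step."""
    return state_root(execute_one_step(pre_state)) == claimed_post_root
```

Because bisection takes O(\log_2 N) rounds and only one micro-instruction is ever executed onchain, arbitration costs stay bounded regardless of model size.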

Performance: We have tested a basic AI model (a DNN model for MNIST classification) on a PC. With our unoptimized implementation, we are able to complete the DNN inference within one second in the VM, and the entire challenge process can be completed within one minute in a local Ethereum test environment.

d) Multi-Phase Verification Game: Unlock Full Potential with Acceleration

Limitations of One-Phase Pinpoint Protocol

The one-phase verification game has a critical drawback: all computation must be executed within the virtual machine (VM), preventing us from leveraging the full potential of GPU/TPU acceleration or parallel processing. Consequently, this restriction severely hampers the efficiency of large-model inference, which mirrors the current limitation of the refereed delegation of computation (RDoC) protocol.

Transitioning to a Multi-Phase Protocol

To address the constraints imposed by the one-phase protocol and to ensure that opML can achieve performance comparable to the native environment, we propose an extension to a multi-phase protocol. With this approach, only the final phase needs to be conducted in the VM, resembling the single-phase protocol. In the other phases, we have the flexibility to perform the computations that lead to state transitions in the native environment, leveraging CPU, GPU, TPU, or even parallel processing. By reducing the reliance on the VM, we significantly minimize overhead, bringing the execution performance of opML close to that of the native environment.

The following figure demonstrates a verification game consisting of two phases (k = 2). In phase-2, the process resembles that of a single-phase verification game, where each state transition corresponds to a single VM micro-instruction that changes the VM state. In phase-1, the state transition corresponds to a “Large Instruction” encompassing multiple micro-instructions that change the computation context.

The submitter and verifier first start the verification game in phase-1, using a bisection protocol to locate the disputed step on a “Large Instruction”. This step is sent to the next phase, phase-2, which works like the single-phase verification game: the bisection protocol in phase-2 locates the disputed step on a VM micro-instruction, and that step is sent to the arbitration contract on the blockchain.

To ensure the integrity and security of the transition to the next phase, we rely on Merkle trees: a Merkle sub-tree is extracted from the higher-level phase, guaranteeing a seamless continuation of the verification process.
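As a rough illustration of that phase transition (hypothetical helpers, not the actual opml code), the phase-1 state can be committed to as a Merkle tree over the computation-graph node states, and only the disputed leaf plus its inclusion proof is carried into phase-2:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    """Merkle root over a list of leaf hashes (odd levels pad with the last leaf)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling path proving that leaves[index] is included under the root."""
    level, idx, path = list(leaves), index, []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append(level[idx ^ 1])                      # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return path

# Phase-1 commits to the whole computation graph: one leaf per node state.
node_states = [h(f"node-{i}".encode()) for i in range(8)]
phase1_root = merkle_root(node_states)

# If node 5 is disputed, only its leaf and inclusion proof are carried into
# phase-2, where that node's computation is replayed inside the VM.
disputed_leaf, proof = node_states[5], merkle_proof(node_states, 5)
```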

Multi-Phase opML

In this demonstration, we present a two-phase opML approach, as utilized in the LLaMA model:

  • The computation process of machine learning, specifically Deep Neural Networks (DNN), can be represented as a computation graph denoted as G. This graph consists of various computation nodes, capable of storing intermediate computation results.
  • DNN inference is essentially a computation process on the aforementioned computation graph. The entire graph can be considered as the inference state (computation context in phase-1). As each node is computed, the results are stored within that node, thereby advancing the computation graph to its next state.

https://github.com/OPML-Labs/opml/raw/main/docs/assets/computation_graph.png

  • Therefore, we first conduct the verification game on the computation graph (phase-1). In the phase-1 verification game, the computation on the graph's nodes can be conducted in a native environment using a multi-threaded CPU or GPU. The bisection protocol locates the disputed node, and the computation of this node is sent to the next phase's (phase-2) bisection protocol.
  • In phase-2 bisection, we transform the computation of a single node into Virtual Machine (VM) instructions, similar to what is done in the single-phase protocol.
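Putting the two phases together, a hypothetical driver (reusing the bisect_dispute helper sketched earlier; all names are illustrative) might look like the following: phase-1 bisects over the computation-graph states produced natively, and phase-2 bisects over the VM micro-instructions of the single disputed node.

```python
def two_phase_dispute(graph_roots_submitter, graph_roots_challenger,
                      vm_trace_submitter, vm_trace_challenger):
    """
    Phase 1: each party commits to the Merkle root of the computation graph
    after every node is evaluated (natively, with CPU/GPU acceleration).
    Phase 2: only the single disputed node is replayed as VM micro-instructions.
    """
    # Phase 1: find the first graph node whose output the parties dispute.
    disputed_node = bisect_dispute(graph_roots_submitter, graph_roots_challenger)

    # Phase 2: both parties re-run that node inside the VM, producing one
    # state root per micro-instruction; bisection pinpoints a single step.
    disputed_step = bisect_dispute(vm_trace_submitter(disputed_node),
                                   vm_trace_challenger(disputed_node))

    # Only (disputed_node, disputed_step) is ever sent to the onchain arbitrator.
    return disputed_node, disputed_step
```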

It is worth noting that we anticipate introducing a multi-phase opML approach (comprising more than two phases) when the computation on a single node within the computation graph remains computationally complex. This extension will further enhance the overall efficiency and effectiveness of the verification process.

Performance Improvement

Here, we present a concise discussion and analysis of our proposed multi-phase verification framework.

Suppose there are n nodes in the DNN computation graph, and each node takes m VM micro-instructions to compute. Let \alpha be the speedup ratio of computing each node with GPU or parallel processing rather than in the VM; this ratio can reach significant values, often tens or even hundreds of times faster than VM execution.

Based on these considerations, we draw the following conclusions:

  1. Two-phase opML outperforms single-phase opML, achieving a computation speedup of \alpha times. The utilization of multi-phase verification enables us to take advantage of the accelerated computation capabilities offered by GPU or parallel processing, leading to substantial gains in overall performance.
  2. Two-phase opML reduces the space complexity of the Merkle tree. When comparing the space complexity of the Merkle trees, we find that in two-phase opML, the size is O(m + n), whereas in single-phase opML, the space complexity is significantly larger at O(mn). The reduction in space complexity of the Merkle tree further highlights the efficiency and scalability of the multi-phase design.
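For intuition, here is a back-of-the-envelope calculation with purely illustrative values of n, m, and \alpha (these are not measurements from the opML implementation):

```python
import math

n = 10_000       # nodes in the DNN computation graph (illustrative)
m = 1_000_000    # VM micro-instructions per node (illustrative)
alpha = 100      # native (GPU/parallel) speedup over VM execution (illustrative)

# Execution cost, in VM-instruction time units:
single_phase_cost = n * m               # every instruction runs inside the VM
two_phase_cost = (n * m) / alpha + m    # native inference, plus one disputed
                                        # node replayed inside the VM

# Merkle tree size, in number of committed leaves:
single_phase_leaves = n * m             # one leaf per micro-instruction: O(mn)
two_phase_leaves = n + m                # n graph nodes + m steps of one node: O(m + n)

# Bisection rounds to pinpoint the disputed step (roughly equal in both cases,
# since log2(n*m) = log2(n) + log2(m); the gains come from execution time and
# Merkle tree size, not from the number of interaction rounds):
single_phase_rounds = math.ceil(math.log2(n * m))                     # 34
two_phase_rounds = math.ceil(math.log2(n)) + math.ceil(math.log2(m))  # 14 + 20
```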

In summary, the multi-phase verification framework presents a remarkable performance improvement, ensuring more efficient and expedited computations, particularly when leveraging the speedup capabilities of GPU or parallel processing. Additionally, the reduced Merkle tree size adds to the system’s effectiveness and scalability, making multi-phase opML a compelling choice for various applications.

Performance: We have tested a 7B-LLaMA (the model size is around 26GB) on a common PC without GPU. Using our unoptimized implementation, the entire challenge process can be completed within 20 minutes in a local Ethereum test environment. (We can definitely make it faster using GPU and some optimization technologies of the fraud proof system!)

e) Consistency and Determinism

In opML, ensuring the consistency of ML results is of paramount importance.

During the native execution of DNN computations, especially across various hardware platforms, differences in execution results may arise due to the characteristics of floating-point numbers. For instance, parallel computations involving floating-point numbers, such as (a + b) + c versus a + (b + c), often yield non-identical outcomes due to rounding errors. Factors like programming language, compiler version, and operating system can also influence floating-point computations, causing further inconsistencies in ML results.
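A two-line illustration of this non-associativity (standard IEEE-754 double-precision behaviour, not anything specific to opML):

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0, because b + c rounds back to -1e16 and the 1.0 is lost
```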

To tackle these challenges and guarantee the consistency of opML, we employ two key approaches:

  1. Fixed-point arithmetic, also known as quantization technology, is adopted. This technique enables us to represent and perform computations using fixed precision rather than floating-point numbers. By doing so, we mitigate the effects of floating-point rounding errors, leading to more reliable and consistent results.
  2. We leverage software-based floating-point libraries that are designed to function consistently across different platforms. These libraries ensure cross-platform consistency and determinism of the ML results, regardless of the underlying hardware or software configurations.
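As a minimal sketch of the first approach (a simple symmetric fixed-point scheme chosen purely for illustration, not necessarily the quantization used in the opml codebase), every intermediate value becomes an exact integer, so the result is bit-identical on any platform:

```python
SCALE = 1 << 16   # Q16.16 fixed point: 16 fractional bits

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def from_fixed(v: int) -> float:
    return v / SCALE

def fixed_dot(ws, xs) -> int:
    """Dot product in fixed point: all intermediates are integers, so the
    outcome does not depend on hardware, compiler, or summation order."""
    acc = 0
    for w, x in zip(ws, xs):
        acc += (to_fixed(w) * to_fixed(x)) // SCALE
    return acc

weights = [0.25, -1.5, 3.125]
inputs  = [2.0, 0.5, -0.75]
print(from_fixed(fixed_dot(weights, inputs)))   # -2.59375, deterministic everywhere
```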

By combining fixed-point arithmetic and software-based floating-point libraries, we establish a robust foundation for achieving consistent and reliable ML results within the opML framework. This harmonization of techniques enables us to overcome the inherent challenges posed by floating-point variations and platform disparities, ultimately enhancing the integrity and dependability of opML computations.

3. opML vs. zkML

a) Overview

|  | opML | zkML |
| --- | --- | --- |
| Model size | any size (even extremely large models) | small/limited (due to the cost of ZKP generation) |
| Validity proof | fraud proof | zero-knowledge proof |
| Requirement | any PC with CPU/GPU | large memory for the ZK circuit |
| Cost | low (inference and training* can be conducted in a native environment) | extremely high (the cost of generating a ZKP for ML inference is extremely high) |
| Finality | delayed by the challenge period | no challenge delay (but must wait for ZKP generation) |
| Security | crypto-economic incentives for security | cryptographic security |

*: Within the current opML framework, our primary focus lies on the inference of ML models, allowing for efficient and secure model computations. However, it is essential to highlight that our framework also supports the training process, making it a versatile solution for various machine learning tasks.

b) Performance & Security

Both zkML and opML present distinct advantages and disadvantages. The main differences lie in flexibility (performance) and security.

  • zkML allows for onchain ML execution with the highest level of security, but it is limited by proof generation time and other constraints.
  • opML facilitates onchain ML execution with enhanced flexibility and performance, but it is not secured by cryptography in the way ZK is.

The security model of opML is similar to the current optimistic rollup systems. With the fraud proof system, opML can provide an any-trust guarantee: any one honest validator can force opML to behave correctly.

Suppose there is only one honest node and all other nodes are malicious. When a malicious node posts an incorrect result onchain, the honest node will check the result, find it incorrect, and challenge the malicious node. The dispute game (verification game) forces the two parties to pinpoint one concrete erroneous step, which is sent to the arbitration contract on the blockchain; the contract then determines that the malicious node's result is incorrect. Finally, the malicious node is penalized and the honest node receives a corresponding reward.

c) Cost of Onchain ML

In terms of cost, opML has a great advantage over zkML due to the huge cost of generating ZKPs.


Proof Generation Time

The proof time of zkML is long and grows significantly with the number of model parameters. For opML, by contrast, computing the fraud proof can be done in a nearly native environment in a short time.

Proof Time for zkML from ModulusLabs

Here is an example of a significantly long proving time of zkML. It runs a nanoGPT model inference with 1 million parameters inside a zkML framework and takes around 80 minutes of proving time.

Memory Consumption

The memory consumption of zkML is enormous compared to native execution, whereas opML can run in a nearly native environment with nearly native memory consumption. Specifically, opML can handle a 7B-LLaMA model (around 26GB) within 32GB of memory, while the memory required for the zkML circuit of a 7B-LLaMA model may be on the order of terabytes or even petabytes, making it totally impractical. Indeed, no one can currently handle a 7B-LLaMA model in zkML.

Memory consumption for zkML from ModulusLabs

d) Finality

We can define the finalized point of zkML and opML as follows:

  • zkML: zk proof of ML inference is generated (and verified)
  • opML: challenge period of ML inference has ended

Comparing the two: although opML needs to wait out the challenge period, zkML also needs to wait for the ZKP to be generated.

When the proving time of zkML is longer than the challenge period of opML (which happens when the ML model is large), opML can reach finality even faster than zkML!

When the ML model is large, opML is all you need!

Further Optimization with ZKFP (ZK Fraud Proof)

Furthermore, we can utilize multiple checkpoints and a multi-step ZK verifier to provide a ZK fraud proof that speeds up the finality of opML. Assume there are N steps in the opML fraud-proof VM; the number of interactions is then O(\log_2(N)). One way to reduce the number of interactions is to use a multi-step onchain arbitrator instead of a one-step onchain arbitrator.

This can be done with a zkVM (e.g., zkWASM, RISC Zero, Cairo VM) that can produce a ZK proof of multi-step execution. If the zkVM can generate a ZK proof for m steps, the number of interactions drops from O(\log_2(N)) to O(\log_2(N/m)). Additionally, instead of a bisection protocol with a single checkpoint, we can use k > 1 checkpoints in the dispute game (verification game), further reducing the number of interactions from O(\log_2(N/m)) to O(\log_{k+1}(N/m)). With these optimizations, we can greatly speed up the finality of opML and potentially achieve almost instant confirmation.
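To put rough numbers on this (the values of N, m, and k below are purely illustrative), the interaction count can be computed exactly as follows:

```python
def interactions(N, m=1, k=1):
    """Rounds of onchain interaction needed to narrow down N/m segments,
    splitting the remaining range into k + 1 pieces per round,
    i.e. ceil(log_{k+1}(N / m))."""
    segments, rounds = -(-N // m), 0        # ceil(N / m) segments to search
    while segments > 1:
        segments = -(-segments // (k + 1))  # each round divides the range by k + 1
        rounds += 1
    return rounds

N = 2**40                                   # total VM micro-instructions (illustrative)
print(interactions(N))                      # plain bisection:               40 rounds
print(interactions(N, m=2**20))             # zkVM proving 2^20 steps:       20 rounds
print(interactions(N, m=2**20, k=15))       # ...plus 15 checkpoints/round:   5 rounds
```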

In previous work, we collaborated with EthStorage and Delphinus Lab on zkGo (a minimally modified Go compiler that produces Wasm code compatible with a ZK prover). With these modifications, the compiled L2 Geth Wasm code can now be executed in zkWASM, which can serve as a solid foundation for ZKFP.

4. Conclusion

The following is a conclusion generated by LLaMA 2 in opML.
Loading the model into memory took 10 seconds, and generating the output took 2 minutes on a CPU without quantization (unoptimized).

In conclusion, the optimistic approach to onchain AI and machine learning offers significant advantages over existing methods. By embracing this novel approach, we can harness the power of AI while ensuring the integrity and security of the underlying blockchain system. opML, in particular, has shown impressive results in terms of efficiency, scalability, and decentralization, making it an exciting development in the field of AI and blockchain technology. As the use cases for onchain AI expand, it is essential that we continue to explore innovative solutions like opML, which can help us unlock the full potential of these technologies while maintaining the highest standards of transparency, security, and ethical responsibility.

5. Acknowledgement

Thanks to Will Villanueva and Luke Pearson for their insightful feedback and review!

About Hyper Oracle

Hyper Oracle is a programmable zkOracle protocol that makes smart contracts smarter with arbitrary compute and richer data sources. Hyper Oracle offers full security and decentralization for trustless automation and onchain AI/ML so builders can easily create next-gen dApps.

hyperoracle.io | x.com/hyperoracle | github.com/hyperoracle


It’s great to showcase the work of opML, and here are some links for open-source contribution and testing:


Hi, thanks for the interesting post.

Something you didn’t touch on that might be worth mentioning is data availability of the input.

The security relies on any honest validator being able to replicate the computation and prove fraud, but they can only do that if they have access to everything needed to repeat the computation: the program and the input data. It is explicitly stated that for this system only the Merkle root of the VM containing the input data is posted on-chain.

A viable attack would be to post the VM Merkle root and result of a (possibly fraudulent) inference computation but keep the input data hidden. No one would be able to replicate the computation to prove it fraudulent and the attacker could safely wait for the challenge period to elapse.

Rollups avoid this problem by ensuring all inputs are made available through calldata on the L1. A solution here might be to do the same or perhaps introduce a ‘data availability challenge’ mechanism whereby watchers can force the submitter to post the input data only if they believe it is being withheld.


You bring up a crucial aspect of security - data availability. To achieve the any-trust security level of opML (any one honest validator can force opML to behave correctly), we assume data availability, i.e., the input data and the AI model should be accessible to all verifiers, which necessitates the use of a Data Availability (DA) layer. We are actively exploring various options for this aspect of opML. Introducing a ‘data availability challenge’ mechanism sounds interesting! :grinning:

When it comes to ensuring data availability for input data, using calldata on L1 appears to be a practical solution, particularly for small input. However, when dealing with larger input data or accommodating user-uploaded AI models (if we allow users to upload their own trained models), we need to explore more efficient methods to make the data readily available to validators.

Data availability is a critical component of opML’s security model, and we appreciate your valuable input in this regard.


I am glad to see this kind of work, but there are two points that need clarification in this article:

  1. There is a significant difference in security between a sovereign OP scaling chain and an Ethereum OP scaling chain. Further explanation is needed regarding the experimental data and proof methods for the class of sovereign OP scaling proposals.

  2. The article compares the proof efficiency of the two, but it does not mention a core capability of zkML: privacy-preserving ML. Can opML replace zkML’s privacy capabilities? Please provide examples and parameters for comparison.


Thank you for your interest and questions.

  1. Currently, our primary focus is on Ethereum adoption and implementation. You can refer to the opML repo for the implementation details. Regarding sovereign OP scaling chains, we may consider them in future work.
  2. Regarding privacy, opML, in its current form, does not offer the same level of privacy guarantees as zkML. In opML, all verifiers need access to the data, which inherently reduces privacy. If strong privacy guarantees are a priority, zkML remains a better choice.

If we are to run arbitrary inference tasks on an ML model and use an optimistic approach to achieve trustless compute, then part of the trustlessness comes from some kind of economic incentive to prove fault.

How would we approach that here? What would the asserter stake/bond? How much should they stake/bond?


You’re absolutely correct. In an optimistic approach like opML, incentivizing validators to check and ensure the correctness of results is also important. Much like the current optimistic rollup systems, validators/submitters/challengers would be required to stake a certain amount of funds. If the submitter provides incorrect results and this is discovered by a challenger, the submitter will face a penalty in the form of a slashing, and the challenger will be eligible to claim the corresponding rewards.

The exact details of the incentive mechanism, including the specific amounts to be staked, are still under design and development. The aim is to create a mechanism that effectively encourages participants to act honestly and to challenge potentially incorrect results. You might find it helpful to refer to a paper published by Offchain Labs, the team behind Arbitrum, for insights into how such incentive mechanisms can be structured: Incentive Schemes for Rollup Validators.

Once our incentive mechanism design for opML is finalized, we will certainly share the details. :grinning:


Empowering Ethereum smart contracts with LLaMA2 and Stable Diffusion heralds a new era for blockchain technology.


This looks really interesting!
An architecture that aligns the DA mechanism and the incentive mechanism for the optimistic approach is necessary to achieve a sustainable economy.

We published another post on how opML can solve the challenges of onchain AI raised by Vitalik. zkML doesn’t work (for now) because:

Here’s the full post if you want to read more:

Is there anywhere you describe the bisection protocol for neural networks more formally, or have you implemented it?

We get a sequence by topologically sorting the nodes of the computation graph, and the bisection protocol runs over that sequence. You can check the implementation in the opml GitHub repository.
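For illustration only, a generic Kahn's-algorithm sketch (not the opml repository's actual code) of how the computation graph yields the linear sequence that the phase-1 bisection runs over:

```python
from collections import deque

def topo_order(graph):
    """graph: dict mapping each node to the list of nodes it feeds into.
    Returns one valid evaluation order (Kahn's algorithm)."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    queue = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order

# Tiny DNN-style graph: input and weights feed a matmul, then bias add, then ReLU.
graph = {"input": ["matmul"], "weights": ["matmul"], "matmul": ["bias"],
         "bias": ["relu"], "relu": []}
print(topo_order(graph))   # ['input', 'weights', 'matmul', 'bias', 'relu']
```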


Feel free to delve into the technical intricacies of opML outlined in our whitepaper: opML: Optimistic Machine Learning on Blockchain

Moreover, with opML, we introduce the Onchain AI Oracle (OAO), offering users direct access to AI models on-chain.

For an in-depth exploration of OAO, check out our blog: OAO: Onchain AI Oracle — ORA

You can also find additional insights on our Twitter post: Onchain AI Oracle