TL;DR
Hyper Oracle’s zkOracle enables trustless arbitrary compute, unlocking onchain AI and ML. We present opML, a flexible and performant approach until we launch zkML.
 opML (Optimistic Machine Learning) ports AI model inference and training/finetuning onchain using an optimistic verification mechanism.
 opML provides ML proofs with low cost and high efficiency compared to zkML. The hardware requirements for opML are low: opML can run a large language model, e.g. a 7BLLaMA (the model size is around 26GB) on a common PC without GPU, while zkML takes around 80 minutes to generate a proof of a 1MnanoGPT model.
 opML adopts a verification game (similar to Truebit and optimistic rollup systems) to guarantee decentralization and verifiability of the ML compute.
 The conclusion paragraph for this post was generated by Llama 2 in opML with 10 seconds of time for loading the model into memory and 2 minutes of time for output generation in the CPU without quantization (unoptimized).
 Hyper Oracle is the first opensource implementation of opML. You can use Stable Diffusion and LLaMA 2 in opML live here: https://ai.hyperoracle.io/. Try running a 13B Llama in opML now!
0. Onchain ML
Ethereum brings the possibility of onchain, verifiable computing. However, due to the network’s resource overhead and performance limitations, it is not yet possible to support fully onchain ML or AI computing on mainnet.
Implementing onchain ML or AI can give machine learning computation all the advantages of current onchain computing, including:
 Fairness
 Verifiability
 Computational Validity
 Transparency
 Decentralization
Both opML and zkML are methods for verifiably proving that a specific model has been used to generate a particular output. This is useful for the following reasons:
 Many ML models are closedsource, meaning there is a lack of transparency regarding aspects like parameters and model size.
 Even if a model is opensource, there’s no concrete evidence to confirm that the specified model was actually used for a given output, rather than, for instance, arbitrarily generating the results.
 With an increase in artificially generated content, a method to 100% prove that an output is valid can be useful. Consider proving that an image is the work of an AI model as opposed to an artist.
1. zkML
a) Introduction
Just as a ZK Rollup’s offchain computation is written as arithmetic circuits to generate a succinct zk proof of the computation, and thereafter the proof is verified onchain, machine learning models can be “circuitized” and proved on Ethereum.
In this way, we can use zkML to provide onchain inference of machine learning models.
b) Implementation
For the implementation of zkML, there are usually two approaches:
 Specific ZK circuit: Implementing different specific ZK circuits for different models (thus meeting the different needs of each use case, such as floating point precision)
 General zkML Runtime: Adding ZK to ML models with a common runtime (just like zkEVM for EVM programs)
However, the two solutions above still have to overcome some difficulties due to the nature of ZK.
c) Challenge
The biggest challenge of any zk system is the proving overhead (1,000x compute overhead).
This means that no matter how fast the evolution of zk proof systems or hardware/software/modelspecific optimizations are, zkML performance will always be worse than normal ML inference, with inevitable proving overhead in terms of latency and cost.
In addition, there are several other issues that need to be addressed and optimized in zkML: quantization, memory usage, circuit size limit, etc.
Currently, only small models can be implemented as zkML. For larger models, such as GPT3.5, it is currently not possible due to the above limitations. Therefore, we need another solution to implement practical onchain ML for large models now.
2. opML
a) Introduction
ZK Rollups use zk proofs to prove the validity of computation, while Optimistic Rollups use fraud proofs to secure the computation.
An optimistic verification approach can be used for onchain ML technology in addition to the verification of rollup computation.
By comparison, zkML and opML secure onchain ML with different approaches:
 zkML: performs ML inference computation + generates ZK proofs for each ML inference
 opML: performs ML inference computation + reexecutes a segment of ML inference computation onchain when the computation is challenged
opML solves the computationally expensive problem of zkML by generating proofs only when necessary.
b) Implementation Overview
For now, opML is very rare on the market (the only open source project is OPMLLabs/opml: OPML: OPtimistic Machine Learning on Blockchain (github.com)). For an opML implementation, two main components are needed:
 Compiler: allows ML models to be interpreted onchain for fault proof challenges
 Mechanism design: ensures a reasonable challenge period to ensure that the whole optimistic system is economically reasonable and the challenge can be censorship resistant
Compared to financial applications (e.g. rollup transfers or crosschain applications), security for the computation of an ML model is less demanding. Such computation in ML may be inherently stochastic and not deterministic under some conditions. For a combination of reasons, we can reconsider the security requirements for the performance of onchain ML, but note that opML is still verifiable and trustless.
For opML the challenge period can be shorter, because it is not a rollup like Optimism, which involves a lot of financial operations and maintains a public ledger. When optimized, the challenge period can be like a sovereign rollup of a few minutes or even a few seconds. Overall, opML’s challenge period is likely to be less than 10 minutes, rather than the typical 7 days of OP Rollup.
c) SinglePhase Verification Game: Onchain ML Model in EVM
The onephase pinpoint protocol works similarly to referred delegation of computation (RDoC), where two or more parties (with at least one honest party) are assumed to execute the same program. Then, the parties can challenge each other with a pinpoint style to locate the disputable step. The step is sent to a judge with weak computation power (smart contract on blockchain) for arbitration.
In onephase opML:
 We build a virtual machine (VM) for offchain execution and onchain arbitration, ensuring its equivalence between the offchain VM and the onchain VM implemented in a smart contract.
 To ensure the efficiency of AI model inference in the VM, we have implemented a lightweight DNN library specifically designed for this purpose instead of relying on popular ML frameworks like Tensorflow or PyTorch. Additionally, a script is provided that can convert Tensorflow and PyTorch models to this lightweight library.
 The crosscompilation technology has been applied to compile the AI model inference code into the VM program instructions.
 The VM image is managed with a Merkle tree. Only the Merkle root stands for the VM state will be uploaded to the onchain smart contract.

The bisection protocol will help to locate the dispute step, then the step will be sent to the arbitration contract on the blockchain.
Performance: We have tested a basic AI model (a DNN model for MNIST classification) on a PC. With our unoptimized implementation, we are able to complete the DNN inference within one second in the VM, and the entire challenge process can be completed within one minute in a local Ethereum test environment.
d) MultiPhase Verification Game: Unlock Full Potential with Acceleration
Limitations of OnePhase Pinpoint Protocol
The onephase verification game has a critical drawback: all computation must be executed within the Virtual Machine (VM), preventing us from leveraging the full potential of GPU/TPU acceleration or parallel processing. Consequently, this restriction severely hampers the efficiency of large model inference, which also aligns with the current limitation of the referred delegation of the computation (RDoC) protocol.
Transitioning to a MultiPhase Protocol
To address the constraints imposed by the onephase protocol and ensure that opML can achieve performance levels comparable to the native environment, we propose an extension to a multiphase protocol. With this approach, we only need to conduct the computation in the VM in the final phase, resembling the singlephase protocol. For other phases, we have the flexibility to perform computations that lead to state transitions in the native environment, leveraging the capabilities of CPU, GPU, TPU, or even parallel processing. By reducing the reliance on the VM, we significantly minimize overhead. This results in a remarkable enhancement in the execution performance of opML, almost akin to that of the native environment.
The following figure demonstrates a verification game consisting of two phases (k = 2). In phase2, the process resembles that of a singlephase verification game, where each state transition corresponds to a single VM microinstruction that changes the VM state. In phase1, the state transition corresponds to a “Large Instruction” encompassing multiple microinstructions that change the computation context.
The submitter and verifier will first start the verification game in phase1 using a bisection protocol to locate the dispute step on a “Large Instruction”. This step will be sent to the next phase, phase2. Phase2 works like the singlephase verification game. The bisection protocol in phase2 will help to locate the dispute step on a VM microinstruction. This step will be sent to the arbitration contract on the blockchain.
To ensure the integrity and security of the transition to the next phase, we rely on the Merkle tree. This operation involves extracting a Merkle subtree from a higherlevel phase, thus guaranteeing the seamless continuation of the verification process.
MultiPhase opML
In this demonstration, we present a twophase opML approach, as utilized in the LLaMA model:
 The computation process of machine learning, specifically Deep Neural Networks (DNN), can be represented as a computation graph denoted as G. This graph consists of various computation nodes, capable of storing intermediate computation results.
 DNN inference is essentially a computation process on the aforementioned computation graph. The entire graph can be considered as the inference state (computation context in phase1). As each node is computed, the results are stored within that node, thereby advancing the computation graph to its next state.
 Therefore, we can first conduct the verification game on the computation graph (at phase1). On the phase1 verification game, the computation on nodes of the graph can be conducted in a native environment using multithread CPU or GPU. The bisection protocol will help to locate the dispute node, and the computation of this node will be sent to the next phase (phase2) bisection protocol.
 In phase2 bisection, we transform the computation of a single node into Virtual Machine (VM) instructions, similar to what is done in the singlephase protocol.
It is worth noting that we anticipate introducing a multiphase opML approach (comprising more than two phases) when the computation on a single node within the computation graph remains computationally complex. This extension will further enhance the overall efficiency and effectiveness of the verification process.
Performance Improvement
Here, we present a concise discussion and analysis of our proposed multiphase verification framework.
Suppose there are n nodes in the DNN computation graph, and each node needs to take m VM microinstructions to complete the calculation. Assuming the speedup ratio on the calculation on each node using GPU or parallel computing is \alpha. This ratio represents the acceleration achieved through GPU or parallel computing and can reach significant values, often ranging from tens to even hundreds of times faster than VM execution.
Based on these considerations, we draw the following conclusions:
 Twophase opML outperforms singlephase opML, achieving a computation speedup of \alpha times. The utilization of multiphase verification enables us to take advantage of the accelerated computation capabilities offered by GPU or parallel processing, leading to substantial gains in overall performance.
 Twophase opML reduces the space complexity of the Merkle tree. When comparing the space complexity of the Merkle trees, we find that in twophase opML, the size is O(m + n), whereas in singlephase opML, the space complexity is significantly larger at O(mn). The reduction in space complexity of the Merkle tree further highlights the efficiency and scalability of the multiphase design.
In summary, the multiphase verification framework presents a remarkable performance improvement, ensuring more efficient and expedited computations, particularly when leveraging the speedup capabilities of GPU or parallel processing. Additionally, the reduced Merkle tree size adds to the system’s effectiveness and scalability, making multiphase opML a compelling choice for various applications.
Performance: We have tested a 7BLLaMA (the model size is around 26GB) on a common PC without GPU. Using our unoptimized implementation, the entire challenge process can be completed within 20 minutes in a local Ethereum test environment. (We can definitely make it faster using GPU and some optimization technologies of the fraud proof system!)
e) Consistency and Determinism
In opML, ensuring the consistency of ML results is of paramount importance.
During the native execution of DNN computations, especially across various hardware platforms, differences in execution results may arise due to the characteristics of floatingpoint numbers. For instance, parallel computations involving floatingpoint numbers, such as (a + b) + c versus a + (b + c), often yield nonidentical outcomes due to rounding errors. Factors like programming language, compiler version, and operating system can also influence floatingpoint computations, causing further inconsistencies in ML results.
To tackle these challenges and guarantee the consistency of opML, we employ two key approaches:
 Fixedpoint arithmetic, also known as quantization technology, is adopted. This technique enables us to represent and perform computations using fixed precision rather than floatingpoint numbers. By doing so, we mitigate the effects of floatingpoint rounding errors, leading to more reliable and consistent results.
 We leverage softwarebased floatingpoint libraries that are designed to function consistently across different platforms. These libraries ensure crossplatform consistency and determinism of the ML results, regardless of the underlying hardware or software configurations.
By combining fixedpoint arithmetic and softwarebased floatingpoint libraries, we establish a robust foundation for achieving consistent and reliable ML results within the opML framework. This harmonization of techniques enables us to overcome the inherent challenges posed by floatingpoint variations and platform disparities, ultimately enhancing the integrity and dependability of opML computations.
3. opML v.s. zkML
a) Overview
opML  zkML  

model size  any size (available for extremely large models)  small/limited (due to the cost of ZKP generation) 
validity proof  fraud proof  zeroknowledge proof 
requirement  Any PC with CPU/GPU  large memory for ZK circuit 
cost  low (inference and training can be conducted in native environment)  extremely high (generating a ZKP for ML inference is extremely high) 
Finality  delay for challenge period  No delays (wait for ZKP generation) 
security  cryptoeconomic incentives for security  cryptographic security 
*: Within the current opML framework, our primary focus lies on the inference of ML models, allowing for efficient and secure model computations. However, it is essential to highlight that our framework also supports the training process, making it a versatile solution for various machine learning tasks.
b) Performance & Security
Both zkML and opML present distinct advantages and disadvant The main difference is flexibility (performance) and security.
 zkML allows for onchain ML execution with the level of highest security, but it is limited by proof generation time and other constraints.
 opML facilitates onchain ML execution with enhanced flexibility and performance, but is not secured by mathematics and cryptography like ZK.
The security model of opML is similar to the current optimistic rollup systems. With the fraud proof system, opML can provide an anytrust guarantee: any one honest validator can force opML to behave correctly.
Suppose there exists only one honest node, and all other nodes are malicious. When a malicious node provides an incorrect result on chain, the honest node will always check the result and find it incorrect. At that time, the honest node can challenge the malicious one. The dispute game (verification game) can force the honest node and the malicious node to pinpoint one concrete erroneous step. This step will be sent to the arbitration contract on the blockchain, and will find out the result provided by the malicious node is incorrect. Finally, the malicious node will be penalized and the honest node can get a corresponding reward.
c) Cost of Onchain ML
In terms of cost, opML has a great advantage over zkML due to the huge cost for generating ZKPs.
Proof Generation Time
The proof time of zkML is long, and significantly grows with increasing model parameters. But for opML, computing the fraud proof can be conducted in the nearlynative environment in a short time.
Proof Time for zkML from ModulusLabs
Here is an example of a significantly long proving time of zkML. It runs a nanoGPT model inference with 1 million parameters inside a zkML framework and takes around 80 minutes of proving time.
Memory Consumption
The memory consumption of zkML is extremely huge compared to the native execution. But opML can be conducted in a nearlynative environment with nearlynative memory consumption. Specifically, opML can handle a 7BLLaMA model (the model size is around 26GB) within 32GB memory. But the memory consumption for the circuit in zkML for a 7BLLaMA model may be on the order of TB or even PB levels, making it totally impractical. Indeed, currently, no one can handle a 7BLLaMA model in zkML.
Memory consumption for zkML from ModulusLabs
d) Finality
We can define the finalized point of zkML and opML as follows:
 zkML: zk proof of ML inference is generated (and verified)
 opML: challenge period of ML inference has ended
For the finality of the two, we can compare and conclude that: though opML needs to wait for the challenge period, zkML also needs to wait for the generation of the ZKP.
When the proving time of zkML is longer than the challenge period of opML (it happens when a ML model is large), opML can reach finality even faster than zkML!
When the ML model is large, opML is all you need!
Further Optimization with ZKFP (ZK Fraud Proof)
Furthermore, we can utilize multiple checkpoints and multistep ZK verifier to provide a zkfraud proof to speed up the finality of opML. Assuming that there are N steps in the opML fraud proof VM. Then the number of interactions would be O(\log_2(N) ). One optimization approach to reduce the number of interactions is to use a multistep onchain arbitrator, instead of only onestep onchain arbitrator.
This can be done by using a zkVM, e.g., zkWASM, RISC Zero, Cairo VM, which can provide zkproof for the multistep execution. Suppose the zkVM can generate a zkproof for m steps, then the number of interactions can be reduced from O(\log_2(N) ) to O(\log_2(N/m)). Besides, instead of using a bisection protocol with only 1 checkpoint, we can provide multiple checkpoints (for example k (k >1) checkpoints) in the dispute game (verification game). At this time, we can further reduce the number of interactions from O(\log_2(N/m) ) to O( \log_{k+1}(N/m) ). With these optimizations, we can greatly speed up the finality of opML, and it is possible to achieve almost instant confirmation!
For our previous work, we collaborated with EthStorage and Delphinus Lab on zkGo (a minimum modified Go compiler to produce Wasm code compatible with ZK prover). As a result of these modifications, the execution of the compiled L2 Geth Wasm code can now be run using zkWASM. This can be a solid foundation for ZKFP.
5. Conclusion
The following is a conclusion generated by LLaMA 2 in opML.
Loading the model into memory took 10 seconds, and generating output 2 minutes with CPU without quantization (unoptimized).
In conclusion, the optimistic approach to onchain AI and machine learning offers significant advantages over existing methods. By embracing this novel approach, we can harness the power of AI while ensuring the integrity and security of the underlying blockchain system. opML, in particular, has shown impressive results in terms of efficiency, scalability, and decentralization, making it an exciting development in the field of AI and blockchain technology. As the use cases for onchain AI expand, it is essential that we continue to explore innovative solutions like opML, which can help us unlock the full potential of these technologies while maintaining the highest standards of transparency, security, and ethical responsibility.
6. Acknowledgement
Thanks to Will Villanueva and Luke Pearson for their insightful feedback and review!
About Hyper Oracle
Hyper Oracle is a programmable zkOracle protocol that makes smart contracts smarter with arbitrary compute and richer data sources. Hyper Oracle offers full security and decentralization for trustless automation and onchain AI/ML so builders can easily create nextgen dApps.