CuEVM - Achieving Millions of TPS on GPUs for fuzzing and beyond

minhhn2910 · May 14, 2026, 2:26am

Introducing CuEVM V2. We set out in 2024 aiming to harness GPU power for the Ethereum ecosystem with a GPU-native EVM that runs transactions in parallel, with fuzzing (massively testing slightly mutated transactions) as the first use case. Further use cases include transaction simulations, or even parallel execution for an L2 (with proper concurrency control). Since the CuEVM V1 I presented at Devcon SEA (40-80k TPS, link to the talk), we have redesigned and reimplemented the entire codebase to unlock unprecedented throughput (8M+ TPS).

All source code and a Docker container are now released for everyone to tinker with and build on. I will share more in-depth design choices and research paper preprints in follow-up posts over the next few weeks. Contributions, discussions, and questions are welcome.

Highlights

8M+ TPS for ERC20 transfers (no state conflicts) on RTX 5000 Ada
1M+ fuzzing TPS for end-to-end smart contract fuzzing with medusa-cuevm
Fuzzing integration: medusa-cuevm, built on top of Crytic’s Medusa v1.2.1
96%+ traces identical to go-ethereum (on eth-tests Shanghai)
Easily reproducible Docker container

GitHub Links

CuEVM sbip-sg/CuEVM
Docker container with fuzzer integration and benchmark minhhn2910/CuEVM-container

Ecosystem Integration

medusa-cuevm - built on top of Crytic’s Medusa
go-evmlab - a fork from holiman/goevmlab

Configuration and Resource Consumption

32,768–65,536 CUDA threads ↔ ~20GB GPU memory
2–8 CPU threads to prepare transaction data and process execution results for the CUDA threads between batches of transactions. The bottleneck shifts between the CPU and GPU sides depending on how these parameters are configured.

Main Optimizations for High Throughput

Preallocated buffers with coalesced access (the critical CUDA optimization)
Data structure interleaving across different EVM instances (SoA, structure-of-arrays design)
Thread scheduling (grouping similar transactions at warp granularity)
Low-level optimization of GPU resources (e.g., register usage)
Optimistic revert mechanism (copy-on-write with a log of all writes: commits are cheap because writes go directly to state, while reverts are expensive)
Minimized CPU↔GPU transaction data transfer (persistent world state between transactions, cached initial state for state reset)
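
To make the optimistic revert idea above concrete, here is a minimal CPU-side sketch in Python. It is not CuEVM code: the class and method names are hypothetical, and it assumes the "log of all writes" is an undo log that stores the previous value of each slot. Writes land in the live state immediately, so a commit only drops the log, while a revert has to walk the log backwards.

```python
class JournaledStorage:
    """Toy storage that writes in place and keeps an undo log.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, initial=None):
        self.state = dict(initial or {})
        self.log = []  # (key, had_key, old_value) entries in write order

    def write(self, key, value):
        # Record the previous value, then write directly to the live state.
        had_key = key in self.state
        self.log.append((key, had_key, self.state.get(key)))
        self.state[key] = value

    def checkpoint(self):
        # A checkpoint is simply the current length of the log.
        return len(self.log)

    def commit(self):
        # Cheap path: the state is already up to date, so only drop the log.
        self.log.clear()

    def revert(self, checkpoint=0):
        # Expensive path: undo writes newest-first back to the checkpoint.
        while len(self.log) > checkpoint:
            key, had_key, old_value = self.log.pop()
            if had_key:
                self.state[key] = old_value
            else:
                del self.state[key]


storage = JournaledStorage({"balance": 100})
mark = storage.checkpoint()
storage.write("balance", 40)
storage.write("allowance", 5)
storage.revert(mark)  # failed call: roll both writes back
reverted_state = dict(storage.state)

storage.write("balance", 70)
storage.commit()  # successful call: nothing to copy, just forget the log
committed_state = dict(storage.state)
```

The trade-off only pays off when most transactions succeed, which is why the mechanism is described as optimistic.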

Result Replication

A fully reproducible environment is available via the Docker container link above.

Interfacing

CuEVM binary — functionality equivalent to ./cmd/evm in go-ethereum. Takes input as a JSON file.
CuEVM library (libcuevm_go.so) — a fuzzer on its own, interfacing with a host program to fuzz smart contracts and return program counters that trigger bugs. Supported oracles: assertions from Medusa, integer bugs, leaking Ether, and reentrancy.

Design

Execution model: Initial state + N sequences of M transactions each → a GPU kernel executes N CUDA threads = N transactions (the first transaction of all N sequences) → state persists in GPU memory → execute the second transaction of all N sequences → … → execute the Mᵗʰ transaction of all N sequences → copy result → reset state.
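
The execution model above can be sketched as a plain Python simulation. This is only an illustration of the control flow under stated assumptions: `apply_tx` is a toy balance transfer standing in for a full EVM transaction, the inner loop stands in for N CUDA threads running in parallel, and all names are hypothetical.

```python
import copy


def apply_tx(state, tx):
    """Toy stand-in for one EVM transaction: a simple balance transfer."""
    sender, receiver, amount = tx
    if state.get(sender, 0) < amount:
        return False  # "revert": leave this instance's state untouched
    state[sender] -= amount
    state[receiver] = state.get(receiver, 0) + amount
    return True


def run_batch(initial_state, sequences):
    """Run N sequences of M transactions, one step at a time."""
    n = len(sequences)
    m = len(sequences[0])
    assert all(len(seq) == m for seq in sequences)

    # One private copy of the world state per instance.
    states = [copy.deepcopy(initial_state) for _ in range(n)]
    results = [[] for _ in range(n)]

    for step in range(m):
        # One "kernel launch": every instance runs its step-th transaction.
        # On a GPU these N iterations would run as N parallel threads.
        for i in range(n):
            results[i].append(apply_tx(states[i], sequences[i][step]))
        # State persists in `states` between steps; nothing is copied back.

    # Results are copied out once at the end. The next batch would start
    # again from `initial_state`, which is never modified here.
    return results, states


initial = {"a": 10, "b": 0}
sequences = [
    [("a", "b", 4), ("a", "b", 4)],
    [("a", "b", 8), ("a", "b", 8)],
    [("b", "a", 1), ("a", "b", 10)],
]
results, final_states = run_batch(initial, sequences)
```

Each sequence sees only its own earlier transactions, so the second sequence's second transfer fails while the first sequence's succeeds.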

Communication:

Raw transaction data is passed as an object mimicking the eth-tests JSON format, to keep it compatible between the Go library and the binary EVM mode.
A single world state is cloned to all N EVM instances; each instance executes one transaction.
Execution result: fuzzing-specific results (bug locations, and interesting inputs that trigger bugs or increase code coverage), plus EIP-3155 traces (currently limited to printing the trace of one thread). You can instrument CuEVM to emit any other custom result.
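
As a rough illustration of the input side, the sketch below builds an object in the general shape of a public eth-tests state test (`env`, `pre`, `transaction` with array-valued `data`, `gasLimit` and `value`). The exact fields CuEVM reads are not specified in this post, so treat the layout, the helper names and all addresses as assumptions. The calldata variants are ERC20 `transfer(address,uint256)` calls that differ only in the amount, mimicking slightly mutated fuzzing inputs.

```python
import json


def encode_transfer(recipient, amount):
    """ABI-encode transfer(address,uint256); selector is 0xa9059cbb."""
    addr = recipient[2:] if recipient.startswith("0x") else recipient
    return "0x" + "a9059cbb" + addr.lower().rjust(64, "0") + format(amount, "064x")


def build_state_test(name, pre, sender, to, calldata_variants):
    """Assemble an eth-tests-style object (hypothetical helper)."""
    return {
        name: {
            "env": {
                "currentCoinbase": "0x" + "00" * 20,
                "currentGasLimit": "0x1c9c380",
                "currentNumber": "0x01",
                "currentTimestamp": "0x03e8",
            },
            # One shared pre-state; each variant starts from a copy of it.
            "pre": pre,
            "transaction": {
                "sender": sender,
                "to": to,
                "nonce": "0x00",
                "gasPrice": "0x0a",
                "gasLimit": ["0x0186a0"],
                "value": ["0x00"],
                # One calldata entry per transaction variant.
                "data": list(calldata_variants),
            },
        }
    }


# Placeholder addresses, not real accounts.
SENDER = "0x" + "aa" * 20
TOKEN = "0x" + "bb" * 20
RECIPIENT = "0x" + "cc" * 20

pre_state = {
    SENDER: {"balance": "0x0de0b6b3a7640000", "code": "0x", "nonce": "0x00", "storage": {}},
    # Real contract bytecode would replace the empty "code" below.
    TOKEN: {"balance": "0x00", "code": "0x", "nonce": "0x01", "storage": {}},
}

variants = [encode_transfer(RECIPIENT, amount) for amount in (1, 2, 255, 2**64)]
test_object = build_state_test("erc20_transfer_fuzz", pre_state, SENDER, TOKEN, variants)
serialized = json.dumps(test_object)
```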

What Could Be Done Differently

I experimentally developed a Python library and a fuzzer in Python from scratch, but abandoned that direction due to the difficulty of parallel data preparation on the CPU side (the CPU side becomes the biggest bottleneck). With true multithreading (GIL disabled) arriving in Python 3.13+, a Python library could be a viable direction for future work.

I’m happy to discuss, follow up, and collaborate on new use cases and opportunities. Feel free to DM me at @nminh_ho.

The project was developed under the Singapore Blockchain Innovation Programme at the National University of Singapore, and funded by the Ethereum Foundation. I’d like to thank Fredrik Svantes for his support and mentorship in integrating CuEVM with the ecosystem.

FateX.exe · May 20, 2026, 6:16am

Sweet. I built something that uses the NVIDIA CuOPT API kind of “off label.” I’ll have to check this out.