Introduction
Decentralized exchanges (DEXs) rely on token reserves to determine swap prices, enabling permissionless trading, but also unintentionally introducing MEV opportunities. Because attackers can reorder, insert, or remove transactions around DEX trades, they can exploit temporary price discrepancies to extract profit via arbitrage or sandwich attacks.
Why Is MEV Search So Slow Today?
MEV search is an optimization problem: given a mempool and a DEX environment, we want to evaluate thousands, sometimes millions, of candidate transaction bundles and choose the most profitable one.
To diagnose the bottlenecks, we replayed seven historical arbitrage MEV examples (see the Appendix) using two state-of-the-art tools:
- Foundry, the fastest CPU-side EVM in Rust
- Lanturn, an MEV searcher in TypeScript
Environment:
- Intel i7
- NVIDIA RTX 3090 Ti (24 GB)
- 1 TB SSD
- Ubuntu 22.04
We ran 1,000 replays for each tool and measured how time is divided across the three main stages:
- Forking from mainnet state
- Generating input bundles
- Executing smart contracts
Our findings matched previous research: input generation is trivial (<1%), but EVM execution and forking dominate end-to-end latency. Foundry spends:
- 80.89% of its time on forking (27.87 ms)
- 18.82% on executing swaps (6.49 ms)
Lanturn performs even worse: its EVM execution is 17.8× slower and forking 1.98× slower than Foundry, despite multi-threading.
In short: MEV search is limited by CPU execution speed, and CPUs simply cannot explore the search space fast enough.
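The stage breakdown can be cross-checked with a little arithmetic. The Python sketch below takes only the figures reported above and derives the end-to-end Foundry time they imply, plus the per-stage Lanturn times implied by the slowdown factors. The function names are hypothetical, and the derived values are estimates rather than additional measurements.

```python
# Figures reported above for Foundry (milliseconds per replay).
FOUNDRY_FORK_MS = 27.87
FOUNDRY_EXEC_MS = 6.49
FOUNDRY_FORK_SHARE = 0.8089  # forking is 80.89% of end-to-end time

# Slowdown factors reported for Lanturn relative to Foundry.
LANTURN_FORK_SLOWDOWN = 1.98
LANTURN_EXEC_SLOWDOWN = 17.8


def foundry_total_ms() -> float:
    """End-to-end time implied by the forking time and its reported share."""
    return FOUNDRY_FORK_MS / FOUNDRY_FORK_SHARE


def lanturn_stage_ms() -> dict:
    """Per-stage Lanturn times implied by the slowdown factors (derived, not measured)."""
    return {
        "fork": FOUNDRY_FORK_MS * LANTURN_FORK_SLOWDOWN,
        "exec": FOUNDRY_EXEC_MS * LANTURN_EXEC_SLOWDOWN,
    }


if __name__ == "__main__":
    total = foundry_total_ms()
    print(f"Foundry end-to-end: {total:.2f} ms")
    print(f"Foundry exec share: {FOUNDRY_EXEC_MS / total:.2%}")
    for stage, ms in lanturn_stage_ms().items():
        print(f"Lanturn {stage}: {ms:.2f} ms")
```

Under these figures a single CPU replay takes tens of milliseconds, which is why evaluating bundles one at a time cannot cover a search space of thousands to millions of candidates.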
Compiling the MEV Bot to GPU Code
To break out of this CPU bottleneck, we move execution to the GPU:
- Write the MEV bot in Solidity (e.g., a router performing multi-hop arbitrage)
- Compile the smart contract to LLVM IR using mau
- Generate PTX code (CUDA assembly) using the LLVM backend
- Execute it natively on GPU threads
Here’s the Solidity bot we use as a running example. Its fallback handler parses a compact byte format describing the swaps, determines the DEX type of each hop, and calls the corresponding swap() function:
// SPDX-License-Identifier: UNLICENSED
pragma solidity 0.7.6;

import "./libraries/uniswapv2/UniswapV2Pair.sol";
import "./libraries/uniswapv3/UniswapV3Pair.sol";
import "./libraries/sushiswap/SushiswapPair.sol";
import "./libraries/pancakeswap/PancakeswapPair.sol";

contract MEV {
    // Calldata layout:
    //   [amount: 32 bytes][swap_len: 4 bytes][swaps buffer][tokens buffer]
    // The two buffers split the remaining calldata evenly.
    fallback(bytes calldata data) external payable returns (bytes memory) {
        uint256 amount = abi.decode(data[:32], (uint256));
        uint32 swap_len = uint32(abi.decode(data[32:], (bytes4)));
        uint32[] memory swap_vec = new uint32[](swap_len);
        uint32[] memory token_vec = new uint32[](swap_len);
        uint32 calldata_size = uint32(data.length);
        uint32 swaps_buf_size = (calldata_size - 32 - 4) / 2;

        // Unpack one 4-byte swap descriptor and one 4-byte token id per hop.
        // Token ids are packed eight to a 32-byte word, most significant first.
        for (uint32 i = 0; i < swap_len; i++) {
            swap_vec[i] = uint32(abi.decode(data[36 + i * 4:], (bytes4)));
            uint256 token_word = abi.decode(data[36 + swaps_buf_size + 32 * (i / 8):], (uint256));
            token_vec[i] = uint32(token_word >> (256 - 32 * (i % 8 + 1)));
        }

        // Execute the hops in order, feeding each output amount into the next swap.
        for (uint32 i = 0; i < swap_len; i++) {
            // Swap direction follows from comparing this hop's token id with the next one's.
            bool reversed = token_vec[i] > token_vec[(i + 1) % swap_len];
            uint32 swap_u32 = swap_vec[i];
            uint8 dex_type = uint8(swap_u32 >> 24);      // top byte: DEX type
            address dex = address(swap_u32 & 0xFFFFFF);  // low 24 bits: pool address
            if (dex_type == 1) {
                amount = UniswapV2Pair(dex).swap(reversed, amount);
            } else if (dex_type == 2) {
                amount = UniswapV3Pair(dex).swap(reversed, amount);
            } else if (dex_type == 3) {
                amount = SushiswapPair(dex).swap(reversed, amount);
            } else if (dex_type == 4) {
                amount = PancakeswapPair(dex).swap(reversed, amount);
            } else if (dex_type == 5) {
                continue; // type 5 is skipped
            }
        }
        return abi.encodePacked(amount);
    }
}
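To make the calldata layout parsed by the fallback handler concrete, here is a minimal Python sketch of an off-chain encoder together with a decoder that mirrors the contract's unpacking logic. The names `encode_bundle`, `decode_bundle`, and `reversed_flags` are hypothetical. The zero-padding of each buffer to whole 32-byte words is an assumption, made so that every 32-byte read in the contract stays in bounds and both buffers have equal size. The actual tooling may pack bundles differently.

```python
WORD = 32  # EVM word size in bytes


def _pad(buf: bytes) -> bytes:
    """Zero-pad to a whole number of 32-byte words, at least one (assumption)."""
    size = max(WORD, -(-len(buf) // WORD) * WORD)
    return buf.ljust(size, b"\x00")


def encode_bundle(amount: int, swaps: list, tokens: list) -> bytes:
    """Pack a bundle as [amount][swap_len][swaps buffer][tokens buffer].

    swaps  -- list of (dex_type, pool_id): dex_type fits 8 bits, pool_id 24 bits
    tokens -- list of uint32 token ids, one per swap
    """
    if len(swaps) != len(tokens):
        raise ValueError("need exactly one token id per swap")
    for dex_type, pool_id in swaps:
        if not (0 <= dex_type < 2**8 and 0 <= pool_id < 2**24):
            raise ValueError("swap descriptor out of range")
    swap_buf = b"".join(((t << 24) | p).to_bytes(4, "big") for t, p in swaps)
    token_buf = b"".join(tok.to_bytes(4, "big") for tok in tokens)
    header = amount.to_bytes(32, "big") + len(swaps).to_bytes(4, "big")
    return header + _pad(swap_buf) + _pad(token_buf)


def decode_bundle(data: bytes):
    """Mirror the unpacking loop of the Solidity fallback handler."""
    amount = int.from_bytes(data[:32], "big")
    swap_len = int.from_bytes(data[32:36], "big")
    swaps_buf_size = (len(data) - 36) // 2
    swaps, tokens = [], []
    for i in range(swap_len):
        raw = int.from_bytes(data[36 + 4 * i:40 + 4 * i], "big")
        swaps.append((raw >> 24, raw & 0xFFFFFF))
        start = 36 + swaps_buf_size + 32 * (i // 8)
        word = int.from_bytes(data[start:start + 32], "big")
        tokens.append((word >> (256 - 32 * (i % 8 + 1))) & 0xFFFFFFFF)
    return amount, swaps, tokens


def reversed_flags(tokens: list) -> list:
    """Per-hop swap direction, as computed in the contract's second loop."""
    n = len(tokens)
    return [tokens[i] > tokens[(i + 1) % n] for i in range(n)]
```

A round trip such as `decode_bundle(encode_bundle(10**18, [(1, 100), (2, 101)], [7, 3]))` should return the original amount, swaps, and token ids, which is a cheap sanity check before handing bundles to the compiled bot.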
Once compiled to PTX, this bot becomes executable GPU code. Instead of evaluating one bundle at a time, as CPUs do, we launch tens of thousands of threads, each testing a different candidate bundle.
Parallel Genetic Algorithm: Searching Thousands of Bundles at Once
A parallel genetic algorithm (GA) is used to search bundles on GPU. Each GPU thread represents one individual (i.e., one bundle) in a population. In each generation, the algorithm performs:
- Selection – choose the top-profit bundles
- Crossover – recombine them into new bundles as offspring
- Mutation – tweak DEXs, tokens, or ordering to produce new bundle variants
- Evaluation – run the MEV bot on the variant bundles on the GPU
This workflow allows us to explore a vast search space extremely quickly. Multiple populations can be evaluated concurrently, enabling broad and deep search coverage.
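The generation loop described above can be sketched on the CPU in a few lines of Python. This is an illustration under stated assumptions, not the system's implementation: `toy_profit` is a made-up stand-in for the profit returned by the compiled bot, `POOLS` and `MAX_LEN` are hypothetical, and the population is evaluated sequentially here, whereas on the GPU each individual would map to one thread.

```python
import random

POOLS = list(range(16))  # hypothetical pool ids
MAX_LEN = 4              # hypothetical maximum bundle length


def toy_profit(bundle):
    """Made-up fitness function standing in for the bot's simulated profit."""
    return sum((p * 7 + i * 3) % 11 for i, p in enumerate(bundle)) - 2 * len(bundle)


def crossover(a, b, rng):
    """Join a prefix of one parent to a suffix of the other."""
    cut_a = rng.randint(1, len(a))  # keep at least one element of a
    cut_b = rng.randint(0, len(b))
    return (a[:cut_a] + b[cut_b:])[:MAX_LEN]


def mutate(bundle, rng, rate=0.2):
    """Randomly replace pools, then occasionally swap two positions."""
    out = [rng.choice(POOLS) if rng.random() < rate else p for p in bundle]
    if len(out) > 1 and rng.random() < rate:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def evolve(pop_size=64, generations=20, elite=8, seed=0):
    """Run the GA and return (best bundle, best profit per generation)."""
    rng = random.Random(seed)
    population = [
        [rng.choice(POOLS) for _ in range(rng.randint(2, MAX_LEN))]
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        # Evaluation: rank every individual by profit.
        ranked = sorted(population, key=toy_profit, reverse=True)
        history.append(toy_profit(ranked[0]))
        # Selection: keep the top-profit bundles unchanged (elitism).
        parents = ranked[:elite]
        # Crossover and mutation: refill the population with offspring.
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=toy_profit), history
```

Because the elite bundles are carried over unchanged, the best profit never decreases from one generation to the next. Running several independent populations would correspond to calling `evolve` with different seeds.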
Performance Results
Table: New MEV Opportunities Identified in 2025 Q1 Blocks (ETH)
| Type | Profit | Revenue | Gas |
|---|---|---|---|
| Arbitrage | 335.53 | 336.40 | 0.87 |
| Sandwich | 90.90 | 103.60 | 12.70 |
| Total | 426.43 | 440.00 | 13.57 |
Across experiments, we achieve 3.3M–5.1M transactions per second, outperforming Lanturn by ~100,000×. On real data from Q1 2025, the system estimated MEV opportunities spanning 2 to 14 transactions, amounting to 426.43 ETH in extractable profit.
Note that the gas price depends on market conditions at that time.
Runtime Overhead: Lightweight and Stable
MEVISOR introduces two types of overhead: cold startup and per-search overhead.
Cold Start (one-time)
Before searching, the GPU must load the PTX bot and initialize memory.
This costs 7.03 seconds, but only happens once and is amortized across all future searches.
Runtime Overhead
The main runtime cost is loading DEX state snapshots into GPU memory.
As we scale from 64 to 540 DEX pools:
- Loading latency stays stable at ~0.30 seconds
- GPU memory usage increases linearly (774 MB → 1.58 GB)
Even with hundreds of DEXs, MEVISOR stays comfortably below the VRAM limits of commodity GPUs and maintains low, predictable overhead.
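As a rough worked example, the two reported memory data points define a straight line, and the one-time startup cost can be spread over a number of searches. The Python sketch below does both. It assumes 1 GB = 1024 MB, assumes that the linear trend holds between the two measured points, and uses hypothetical helper names. Values outside the measured range of 64 to 540 pools would be extrapolation.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes

# Reported data points: (number of DEX pools, GPU memory in MB).
POINT_SMALL = (64, 774.0)
POINT_LARGE = (540, 1.58 * MB_PER_GB)

COLD_START_S = 7.03   # one-time startup cost reported above
STATE_LOAD_S = 0.30   # per-search state loading latency reported above


def fit_line(p0, p1):
    """Return (slope, intercept) of the line through two points."""
    slope = (p1[1] - p0[1]) / (p1[0] - p0[0])
    return slope, p0[1] - slope * p0[0]


def vram_mb(pools: int) -> float:
    """Estimated GPU memory for a given pool count under the linear model."""
    slope, intercept = fit_line(POINT_SMALL, POINT_LARGE)
    return intercept + slope * pools


def amortized_overhead_s(n_searches: int) -> float:
    """Average overhead per search once the cold start is spread over n searches."""
    return (COLD_START_S + n_searches * STATE_LOAD_S) / n_searches


if __name__ == "__main__":
    slope, _ = fit_line(POINT_SMALL, POINT_LARGE)
    print(f"about {slope:.2f} MB per additional pool")
    print(f"estimated memory at 300 pools: {vram_mb(300):.0f} MB")
    print(f"overhead per search over 100 searches: {amortized_overhead_s(100):.4f} s")
```

The amortized figure approaches the per-search loading latency as the number of searches grows, which is the sense in which the cold start is a one-time cost.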
Appendix
Motivating Examples
| Length | Block | Capital (ETH) | Revenue (ETH) | DEX Chain | Asset Chain |
|---|---|---|---|---|---|
| 2 | 21536202 | 0.05 | 0.0002 | 0x1DC698b3d2995aFB66F96e1B19941f990A6b2662, 0x9081B50BaD8bEefaC48CC616694C26B027c559bb | 0x4c11249814f11b9346808179Cf06e71ac328c1b5, 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 |
| 3 | 14079213 | 0.12 | 0.0009 | 0xc926990039045611eb1de520c1e249fd0d20a8ea, 0x62ccb80f72cc5c975c5bc7fb4433d3c336ce5ceb, 0x77bd0e7ec0de000eea4ec88d51f57f1780e0dfb2 | 0x557B933a7C2c45672B610F8954A3deB39a51A8Ca, 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2, 0xe53EC727dbDEB9E2d5456c3be40cFF031AB40A55 |
| 4 | 12195667 | 0.64 | 0.0196 | 0x1f44e67eb4b8438efe62847affb5b8e528e3f465, 0x41ca2d9cf874af557b0d75fa9c78f0131c7f345c, 0x088ee5007c98a9677165d78dd2109ae4a3d04d0c, 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11 | 0x6b175474e89094c44da98b954eedeac495271d0f, 0xbcda9e0658f4eecf56a0bd099e6dbc0c91f6a8c2, 0x0bc529c00c6401aef6d220be8c6ea1667f6ad93e, 0x6b175474e89094c44da98b954eedeac495271d0f |
| 5 | 13734406 | 2.04 | 0.0452 | 0x60594a405d53811d3bc4766596efd80fd545a270, 0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801, 0x9f178e86e42ddf2379cb3d2acf9ed67a1ed2550a, 0xfad57d2039c21811c8f2b5d5b65308aa99d31559, 0x5777d92f208679db4b9778590fa3cab3ac9e2168 | 0x6b175474e89094c44da98b954eedeac495271d0f, 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2, 0x1f9840a85d5af5bf1d1762f925bdaddc4201f984, 0x514910771af9ca656af840dff83e8264ecf986ca, 0x6b175474e89094c44da98b954eedeac495271d0f |
| 6 | 14530401 | 0.41 | 0.0182 | 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11, 0x5ab53ee1d50eef2c1dd3d5402789cd27bb52c1bb, 0x59c38b6775ded821f010dbd30ecabdcf84e04756, 0x9f178e86e42ddf2379cb3d2acf9ed67a1ed2550a, 0x153b4c29e692faf10255fe435e290e9cfb2351b5, 0x1f44e67eb4b8438efe62847affb5b8e528e3f465 | 0x6b175474e89094c44da98b954eedeac495271d0f, 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2, 0x7fc66500c84a76ad7e9c93437bfc5ac33e2ddae9, 0x1f9840a85d5af5bf1d1762f925bdaddc4201f984, 0x514910771af9ca656af840dff83e8264ecf986ca, 0x6b175474e89094c44da98b954eedeac495271d0f |
| 7 | 13731798 | 0.24 | 0.0202 | 0x1f44e67eb4b8438efe62847affb5b8e528e3f465, 0x153b4c29e692faf10255fe435e290e9cfb2351b5, 0x14243ea6bb3d64c8d54a1f47b077e23394d6528a, 0xd75ea151a61d06868e31f8988d28dfe5e9df57b4, 0x088ee5007c98a9677165d78dd2109ae4a3d04d0c, 0x41ca2d9cf874af557b0d75fa9c78f0131c7f345c, 0x1f44e67eb4b8438efe62847affb5b8e528e3f465 | 0x6b175474e89094c44da98b954eedeac495271d0f, 0xbcda9e0658f4eecf56a0bd099e6dbc0c91f6a8c2, 0x514910771af9ca656af840dff83e8264ecf986ca, 0x7fc66500c84a76ad7e9c93437bfc5ac33e2ddae9, 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2, 0x0bc529c00c6401aef6d220be8c6ea1667f6ad93e, 0x6b175474e89094c44da98b954eedeac495271d0f |
| 8 | 13754757 | 2.48 | 0.0680 | 0xa478c2975ab1ea89e8196811f51a7b7ade33eb11, 0x5ab53ee1d50eef2c1dd3d5402789cd27bb52c1bb, 0x59c38b6775ded821f010dbd30ecabdcf84e04756, 0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801, 0x11b815efb8f581194ae79006d24e0d814b7697f6, 0xfcd13ea0b906f2f87229650b8d93a51b2e839ebd, 0xc0067d751fb1172dbab1fa003efe214ee8f419b6, 0x60594a405d53811d3bc4766596efd80fd545a270 | 0x6b175474e89094c44da98b954eedeac495271d0f, 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2, 0x7fc66500c84a76ad7e9c93437bfc5ac33e2ddae9, 0x1f9840a85d5af5bf1d1762f925bdaddc4201f984, 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2, 0xdac17f958d2ee523a2206206994597c13d831ec7, 0x4206931337dc273a630d328da6441786bfad668f, 0x6b175474e89094c44da98b954eedeac495271d0f |


