The return of Torus Based Cryptography: Whisk and Curdleproof in the target group

mratsim · December 1, 2024, 3:41pm

Since this summer, the Ethereum Foundation has financed a collaboration between Robert Granger, @asanso and me to further accelerate GT multiexp, so that if validator privacy becomes a priority again, the performance bottlenecks are already cleared.

I finished the work this week (Torus-acceleration for multiexponentiation on GT, Pull Request #485 in mratsim/constantine) and I’m happy to announce that for the sizes of interest (128 and 256 points), multi-exponentiation on Constantine GT (Fp12, 6x bigger than G1) is only:

3x slower than BLST G1 for 128 points
3.28x slower than BLST G1 for 256 points

I use blst as reference as every consensus client uses it.
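To make the "6x bigger" figure concrete, here is a back-of-the-envelope sketch. It assumes BLS12-381 (381-bit base field stored in 48 bytes) and compares an uncompressed affine G1 point with a full Fp12 element; these representation choices are my assumption, not something stated above.

```python
# Assumption: BLS12-381, each Fp element stored in 6 x 64-bit limbs.
FP_BYTES = 48

# Assumption: G1 point in affine coordinates (x, y), i.e. 2 Fp elements.
g1_affine_bytes = 2 * FP_BYTES

# A GT element is an Fp12 element, i.e. 12 Fp coefficients.
gt_bytes = 12 * FP_BYTES

size_ratio = gt_bytes // g1_affine_bytes
print(g1_affine_bytes, gt_bytes, size_ratio)  # 96 576 6
```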

BLST MSM G1

Constantine MultiExp GT

The new work combines Torus-based acceleration with a 4-way endomorphism decomposition and projective Torus coordinates to delay/aggregate expensive operations.
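As a rough illustration of why projective torus coordinates help, here is a toy sketch on the smallest case: the torus T2 over a tiny prime field, with Fp2 = Fp[i]/(i² + 1). It is not the code from the PR, which works in the much larger Fp12 setting; the prime `P` and all function names are hypothetical. A norm-1 element of Fp2 is written as (a + i)/(a − i) and stored as the single value `a`. Multiplying two compressed values in affine form costs an inversion each time, while the projective form `(A, Z)` with `a = A/Z` needs none until the very end.

```python
P = 103  # toy prime with P % 4 == 3, so i^2 = -1 defines Fp2 (assumption)

def inv(x):
    # Modular inverse via Fermat's little theorem (the "expensive" operation).
    return pow(x, P - 2, P)

def fp2_mul(u, v):
    # (a + b*i) * (c + d*i) with i^2 = -1
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def decompress(a):
    # (a + i)/(a - i) = ((a^2 - 1) + 2a*i) / (a^2 + 1), a norm-1 element of Fp2
    d = inv((a * a + 1) % P)
    return ((a * a - 1) * d % P, 2 * a * d % P)

def torus_mul_affine(a, b):
    # Compressed multiplication: c = (a*b - 1)/(a + b), one inversion per product.
    return (a * b - 1) * inv((a + b) % P) % P

def torus_mul_proj(u, v):
    # Same product on (A, Z) pairs representing a = A/Z: no inversion at all.
    (a1, z1), (a2, z2) = u, v
    return ((a1 * a2 - z1 * z2) % P, (a1 * z2 + a2 * z1) % P)

def to_affine(u):
    # The single inversion, delayed until the whole product is accumulated.
    a, z = u
    return a * inv(z) % P

# Multiply three torus elements with one inversion in total.
values = [5, 7, 11]
acc = (values[0], 1)
for a in values[1:]:
    acc = torus_mul_proj(acc, (a, 1))
compressed_product = to_affine(acc)

# Reference: the same product computed directly in Fp2.
reference = decompress(values[0])
for a in values[1:]:
    reference = fp2_mul(reference, decompress(a))
```

In this sketch `decompress(compressed_product)` equals `reference`, and `torus_mul_affine` gives the same value one step at a time at the cost of one inversion per step. The identity element corresponds to `Z = 0` and is not handled here.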

There are further optimizations down the line, which are unfortunately blocked by a Constantine performance bug: despite a raw speed advantage of up to 1.7x on Fp, it dwindles to a 1x advantage or worse when building higher-level constructs like G1 or GT. (Constantine is still the fastest on x86 for BN254_Snarks / BLS12-381 thanks to state-of-the-art algorithms at each abstraction level.)

Another avenue for a 2x~3x perf improvement is SIMD, which would allow computing on 4x uint64 (AVX2) or 8x uint64 (AVX512) per instruction instead of 1.