Since this summer, the Ethereum Foundation has financed a collaboration between Robert Granger, @asanso and me to further accelerate GT multiexp, so that if validator privacy becomes a priority again, the performance bottlenecks are already cleared.
I finished the work this week ("Torus-acceleration for multiexponentiation on GT", Pull Request #485 in mratsim/constantine on GitHub), and I’m happy to announce that for the sizes of interest (128 and 256 points), multi-exponentiation on Constantine GT (Fp12, 6x bigger than G1) is only:
- 3x slower than BLST G1 for 128 points
- 3.28x slower than BLST G1 for 256 points
I use BLST as the reference since every consensus client uses it.
[Benchmark screenshots: BLST MSM G1 and Constantine MultiExp GT]
The new work combines torus-based acceleration with 4-way endomorphism decomposition and projective torus coordinates to delay and aggregate expensive operations.
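To make the 4-way endomorphism decomposition more concrete, here is a minimal toy sketch in Python. It is not Constantine's code, and the post does not spell out the exact decomposition used. The sketch assumes a BLS12-style setup where the subgroup order has the shape r = u^4 - u^2 + 1 and a cheap endomorphism acts as raising to the power u (in GT this role is played by the Frobenius map). Integers modulo a small prime stand in for GT, and every name here (`U`, `R`, `Q`, `G`, `decompose`, `exp_4way`) is hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division, enough for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

U = 10                 # toy parameter (assumption; real curves use a ~64-bit u)
R = U**4 - U**2 + 1    # toy subgroup order with the BLS12-style shape
assert is_prime(R)

# Smallest prime Q = c*R + 1, so Z_Q^* has a subgroup of order R (GT stand-in).
Q = next(c * R + 1 for c in range(2, 10_000, 2) if is_prime(c * R + 1))
# An element of order R: cofactor-clear small values until we leave the identity.
G = next(x for x in (pow(h, (Q - 1) // R, Q) for h in range(2, 100)) if x != 1)

def decompose(k: int) -> list[int]:
    """Split k mod R into 4 base-U digits: k = k0 + k1*U + k2*U^2 + k3*U^3."""
    k %= R
    digits = []
    for _ in range(4):
        k, d = divmod(k, U)
        digits.append(d)
    return digits

def exp_4way(g: int, k: int) -> int:
    """Compute g^k with 4 quarter-length scalars and one shared squaring chain."""
    digits = decompose(k)
    # bases[i] = g^(U^i). Here computed with pow; in the real setting each step
    # would be one cheap endomorphism application instead of an exponentiation.
    bases = [g]
    for _ in range(3):
        bases.append(pow(bases[-1], U, Q))
    # table[mask] = product of bases[i] over the bits i set in mask.
    table = [1] * 16
    for mask in range(1, 16):
        low = mask & -mask
        table[mask] = table[mask ^ low] * bases[low.bit_length() - 1] % Q
    acc = 1
    for bit in reversed(range(U.bit_length())):
        acc = acc * acc % Q
        mask = sum(((d >> bit) & 1) << i for i, d in enumerate(digits))
        acc = acc * table[mask] % Q
    return acc
```

In this toy, the main loop performs `U.bit_length()` = 4 squarings instead of the `R.bit_length()` = 14 a plain square-and-multiply would need. Real implementations handle signed digits and a negative u, which this sketch ignores.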
There are further optimizations down the line, but they are unfortunately blocked by a Constantine performance bug: despite a raw speed advantage of up to 1.7x on Fp, the lead dwindles to 1x or worse when building higher-level constructs like G1 or GT. (Constantine is still the fastest on x86 for BN254_Snarks / BLS12-381 thanks to state-of-the-art algorithms at each abstraction level.)
Another avenue for a 2x~3x perf improvement is SIMD, which would allow computing on 4x uint64 (AVX2) or 8x uint64 (AVX512) per instruction instead of 1.