Revisiting Falcon signature aggregation for PQ mempools

b-wagn · March 18, 2026, 8:49am

Authors: Antonio Sanso @asanso, Thomas Thiery @soispoke, Benedikt Wagner @b-wagn

tl;dr: This post compares the size of signatures and aggregates for transactions using Falcon signatures (as defined in the Falcon specification), under different assumptions about key recovery and aggregation.

Introduction

Ethereum’s post-quantum (PQ) work has recently shifted from long-term research to more applied research and engineering efforts. As PQ transaction signatures move closer to deployment, one constraint has become evident: post-quantum signatures are large, and their cost is felt most acutely in network bandwidth (e.g., mempool propagation) and node storage requirements.

The Falcon signature scheme remains a compelling reference due to its selection for standardization by NIST and its relatively compact signatures, compared to other post-quantum alternatives.

However, even Falcon signatures (666 bytes) impose very different mempool and storage costs than ECDSA signatures (< 100 bytes) when many transactions coexist. This has motivated renewed interest in aggregation, particularly at the mempool or block-construction level, as a hypothesis worth examining under realistic assumptions.

In this post, we present a quantitative analysis of Falcon signature aggregation using the standard Falcon scheme. We compare the total size of signatures and aggregates under different assumptions about public key recovery and aggregation, with the goal of clarifying the trade-offs relevant to post-quantum Ethereum transactions.

The central question examined here is when, and under which conditions, aggregation of post-quantum signatures in the mempool meaningfully reduces the total size of transaction data.

Prior Work

While pre-quantum signatures often have the algebraic structure to be aggregated natively (e.g., BLS signatures), aggregation for post-quantum schemes is often more complex.

Aggregation of Falcon and related lattice-based signatures has been explored in prior cryptographic and Ethereum-focused work, including a paper that studies aggregation using the LaBRADOR proof system and achieves somewhat compact aggregated signatures at the cost of non-trivial aggregation and verification overheads.

A related ethresearch post examines lattice-based signature aggregation in blockchain settings, highlighting practical trade-offs around proof size, verification time, and implementability.

As a disclaimer, we do not consider SPHINCS+ or Dilithium signatures here (which are larger than Falcon signatures). This document focuses exclusively on Falcon and its aggregation properties under Ethereum-relevant assumptions.

For broader context on post-quantum Ethereum transaction signatures, including prior discussion of Falcon and account abstraction, see this earlier three-part series: Part 1, Part 2, and Part 3.

Why Falcon?

Falcon has been selected by NIST for standardization. It has the smallest signatures among the selected schemes, and its aggregation has been studied in the literature.

As the figure above highlights, Falcon is especially attractive in this discussion because, among the selected NIST PQ signature schemes, it combines relatively small signature size with comparatively favorable aggregation properties. In particular, Falcon is one of the few practical candidates for which aggregation has been explored in a way that is plausibly relevant to Ethereum-style transaction pipelines.

We emphasize that this post is not claiming that Ethereum must use Falcon. It just studies what happens if we were to use Falcon.

Quick Recap of Falcon

Roughly, the verification equation for Falcon has the form H(m,r) = s_1 + s_2 h, and additionally a norm constraint on the vector (s_1,s_2) is checked. Here h is the public key, H is a hash function. The signature can have one of two forms:

Standard Version. It is (s_2,r), where r is a random salt. The value s_1 is recomputed during verification as s_1 = H(m,r) - s_2 h.
Version for Key Recovery Mode. The signature is (s_1,s_2,r). In this case, one can recompute the public key (and its hash) from the signature and the message, see Section 3.12 of the Falcon specification.
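To make the verification relation concrete, here is a minimal toy sketch of how s_1 is recomputed in the standard version. It assumes the arithmetic happens in Z_q[x]/(x^n + 1) with q = 12289, uses a tiny degree n = 8 instead of Falcon-512's n = 512, and replaces H(m,r), h, and s_2 with random stand-ins. The helper names are ours, and the norm check is deliberately left out, so this is not a verifier.

```python
# Toy illustration of the relation H(m, r) = s_1 + s_2 * h in Z_q[x]/(x^n + 1).
# NOT a secure or complete verifier: no real hash-to-point, no norm check,
# no trapdoor sampling. All inputs are random stand-ins.
import random

Q = 12289  # Falcon modulus
N = 8      # toy ring degree (assumption; Falcon-512 uses 512)


def negacyclic_mul(a, b):
    """Multiply two polynomials modulo (x^N + 1) and modulo Q."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:  # x^N = -1, so wrap around with a sign flip
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out


def add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]


def sub(a, b):
    return [(x - y) % Q for x, y in zip(a, b)]


def recompute_s1(c, s2, h):
    """Standard version: derive s_1 = c - s_2 * h from the signature (s_2, r)."""
    return sub(c, negacyclic_mul(s2, h))


rng = random.Random(0)
h = [rng.randrange(Q) for _ in range(N)]           # stand-in public key
s2 = [rng.randrange(-3, 4) % Q for _ in range(N)]  # stand-in small-coefficient s_2
c = [rng.randrange(Q) for _ in range(N)]           # stand-in for H(m, r)

s1 = recompute_s1(c, s2, h)
# The relation holds by construction. A real verifier would additionally
# reject unless (s_1, s_2) is short, which random stand-ins will not satisfy.
assert add(s1, negacyclic_mul(s2, h)) == c
```

In key recovery mode the same relation is solved for h instead, which as far as we understand additionally requires s_2 to be invertible in the ring; we do not sketch that here.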

We do not go into detail on how these signatures are generated or in which domains these objects live. Neither is relevant for our discussion.

Throughout, we make the assumption that for aggregation, the hash function is not changed (e.g., we don’t replace it with Poseidon2). Further, we assume that the salts are not aggregated. That is, in the aggregated signature we have an aggregated version of all s_2's (or s_1,s_2's) and all individual salts. This is reasonable to assume (see discussions here) and makes the aggregation using succinct proofs more efficient, as the statement to be proven is purely algebraic, and no hash is involved.

We use the following notation throughout. When the public key is known, a Falcon signature has size S+R, where R denotes the size of the salt component and S denotes the size of s_2 above. When the public key is not known and only its hash is available (i.e., the address), the signature size increases to \tilde{S} + R for \tilde{S} > S, following the key-recovery mechanism described above.

Addresses are represented as a hash of the public key and have size h, while the full public key has size p. We denote by N the number of transactions considered.

We denote by a_N the aggregated signature size, which depends on the number of aggregated signatures N and includes the salts. Concrete values for a_N can be computed using a script provided alongside prior work on Falcon aggregation (paper). For our analysis, we used a forked version of the repository, which also contains the code used to generate the plots in this document.

Variants of how to use Falcon

We now compare the total storage costs, considering different variants of how Falcon could be used. Note that while we talk about storage, the same applies to bandwidth.

Case 1: With Key Recovery, No Aggregation

In this case, we use the Falcon version for key recovery mode (see above), meaning that from a signature one can rederive the public key and consequently its hash (the address). Then, for each transaction, we only need to store the signature, including the salt. Namely, each transaction stores:

Salt: R
Signature (excluding salt): \tilde{S}
Total N(\tilde{S}+R)

The sender address is not stored explicitly: it is recovered from the signature during verification, analogously to how Ethereum currently recovers the sender from an ECDSA signature.

Case 2: Without Key Recovery, No Aggregation

If we use the standard Falcon version instead (without key recovery, see above), then we save on signature size, but additionally need to explicitly store the public keys. We do not need to store addresses, as they can be derived from the public keys. Concretely, each transaction stores:

Public key: p
Salt: R
Signature (excluding salt): S
Total N(p+S+R)

Case 3: Without Key Recovery, With Aggregation

Now, assume we aggregate signatures, and an aggregate for N signatures has size a_N. Then, we no longer need to store the individual signatures. Namely, each transaction stores:

Public key: p
Salt: R

In addition, we have one aggregated signature of size a_N.

Total N(p+R) + a_N
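The three totals can be written down directly as functions of N. The sketch below is not the script from the repository; the function names are ours, and the byte sizes are illustrative assumptions for Falcon-512 (897-byte public key, 40-byte salt, 666-byte signature). In particular, \tilde{S} = 2S and the value used for a_N are placeholders; real values of a_N should come from the aggregation script.

```python
# Illustrative byte sizes (assumptions, not taken from the post's scripts).
P = 897          # public key size p
R = 40           # salt size R
S = 666 - R      # s_2 part of a 666-byte signature, header byte included
S_TILDE = 2 * S  # placeholder: key recovery mode also carries s_1


def total_case1(n, s_tilde=S_TILDE, r=R):
    """Case 1, key recovery, no aggregation: N * (S~ + R)."""
    return n * (s_tilde + r)


def total_case2(n, p=P, s=S, r=R):
    """Case 2, no key recovery, no aggregation: N * (p + S + R)."""
    return n * (p + s + r)


def total_case3(n, a_n, p=P, r=R):
    """Case 3, no key recovery, with aggregation: N * (p + R) + a_N."""
    return n * (p + r) + a_n


def to_kib(num_bytes):
    return num_bytes / 1024


if __name__ == "__main__":
    n = 250
    a_n = 60_000  # hypothetical aggregate size in bytes, for illustration only
    for name, total in [
        ("case 1", total_case1(n)),
        ("case 2", total_case2(n)),
        ("case 3", total_case3(n, a_n)),
    ]:
        print(f"{name}: {to_kib(total):.1f} KiB")
```

Passing a_N in as an argument keeps the formulas independent of the proof system used for aggregation.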

Note 1. When computing a_N in our scripts below, we call the scripts from Falcon signature aggregation with LaBRADOR in a black-box way. Of course, not only LaBRADOR can be used for aggregation, but any SNARK. We note that hash-based SNARKs (which seem to be favored in Ethereum these days) produce larger proofs than LaBRADOR, and so using LaBRADOR gives a very optimistic estimate for the size of aggregated signatures.

Note 2. We do not consider the case of key recovery with aggregation, as this is not covered by the work on Falcon signature aggregation. This is primarily because the statement to be proven here would involve hashes and would have less algebraic structure, which means it is less efficient to prove than the statement that is considered for the standard variant.

Results

We computed (code here) the total size (in KiB) for each of the three cases as a function of the number of transactions N.

Case 2 is the most expensive across the entire range. Each transaction carries a full public key (p) and a standard signature (S + R), so total size grows steeply.

Case 1 is cheaper because key recovery eliminates the need to store either the full public key or the address, since both are recoverable from the signature itself.

Case 3 replaces N individual signatures with a single aggregated signature of size a_N. But each transaction still needs a full public key and an individual salt, so the fixed proof overhead makes Case 3 more expensive than Case 1 for small N (~50 KiB). As N grows, the lower marginal cost dominates, and Case 3 crosses below Case 1 around N \approx 200 (dashed line in the figure).
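The crossover condition follows directly from the totals of Case 1 and Case 3. As a short derivation, using only the notation introduced above:

```latex
\begin{align*}
N(p + R) + a_N &< N(\tilde{S} + R) \\
\iff\quad a_N &< N(\tilde{S} - p) \\
\iff\quad \frac{a_N}{N} &< \tilde{S} - p
\end{align*}
```

The salt terms cancel, so Case 3 beats Case 1 exactly when the per-transaction share of the aggregate is smaller than the gap between the key-recovery signature and the public key. This can only happen if \tilde{S} > p. The same manipulation for Case 3 against Case 2 gives a_N < N S.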

Even past this crossover, the savings from aggregation are modest. At today’s typical Ethereum block sizes (~250 transactions), Case 3 saves only slightly over Case 1. The gap widens slowly, and aggregation pays off substantially only for signature counts well beyond current per-block levels.

Conclusion

Our results suggest that Case 1 (Key Recovery, No Aggregation) could be a practical sweet spot for Falcon-based PQ transactions on Ethereum. It results in the lowest storage cost at current block sizes without requiring aggregation infrastructure, and its simplicity makes it the most straightforward path to deployment.

Aggregation (Case 3) does eventually win on storage, but only for large N, and the savings remain modest near today’s typical block sizes.

Beyond proof sizes, aggregation introduces substantial system complexity. Someone must perform the aggregation, and producing proofs over hundreds of signatures within block times can be computationally expensive. Realizing bandwidth savings at the mempool level also likely requires architectures like recursive STARK-based mempools, where every node acts as a prover and proofs are folded incrementally during gossip.

That said, aggregation techniques are improving. Schemes such as Hatchi may reduce proof sizes in the future, and recent work on faster aggregation verification reduces the computational cost of checking aggregated proofs. If these improvements materialize, the trade-offs may shift.

Finally, we note that this analysis focuses on Falcon as specified today. Future modifications to the scheme itself (e.g., derandomization) could change the per-signature costs and alter the relative comparison. We leave exploration of such modifications to future work.

manel1874 · March 18, 2026, 11:41am

Very interesting breakdown and thanks @b-wagn for drawing my attention to it!

My question from the Lattice-based signatures aggregation post had the consensus setting in mind, where we don’t need to keep the public key around (as the validator’s public key can be looked up by its index). In that case, assuming the proof size stays more or less constant at roughly 70 KB, the break-even point should be around 100 signatures.

For reference, I am adding below a similar comparison you did but with the consensus setting in mind. I forked your code here for the estimation, adapted the formulas and considered the same three cases.

Case 1: Key Recovery, No Aggregation

This case only seems interesting on the EL (where public keys are not stored), since on the CL the public keys are already known. Included for comparison:

\text{Total} = N \cdot (\tilde{S} + R)

Case 2: Standard Falcon, No Aggregation

Similar to your case 2, but includes the validator indexes instead of the public key p:

\text{Total} = N \cdot (S + R + \text{idx})

Case 3: Standard Falcon, With Aggregation

Similar to your case 3, but includes the validator indexes instead of the public key p:

\text{Total} = a_N + N \cdot (R + \text{idx})
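As a rough sanity check on the break-even between case 2 and case 3, here is a small sketch assuming a constant proof size a (so a_N = a for all N). Under that assumption R and idx cancel, and case 3 wins once N > a / S. The function name is hypothetical, 70,000 bytes stands in for the ~70 KB above, and S = 626 bytes is an assumed size for the s_2 part of a 666-byte Falcon-512 signature with a 40-byte salt.

```python
def break_even_constant_proof(proof_bytes, s_bytes):
    """Smallest N with a + N*(R + idx) < N*(S + R + idx), i.e. N > a / S.

    Assumes the proof size is constant in N (a simplification).
    """
    return proof_bytes // s_bytes + 1


# Assumed illustrative sizes in bytes, not taken from the estimation code.
PROOF_BYTES = 70_000
S_BYTES = 666 - 40

n_star = break_even_constant_proof(PROOF_BYTES, S_BYTES)
print(f"aggregation wins from N = {n_star} signatures")
```

Since a_N actually grows with N, the true break-even sits somewhat further right than this constant-proof estimate.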

As expected, the break even shifts to the left now and it is indeed around 100 signatures (instead of 200 as in EL).

latifkasuli · March 18, 2026, 1:09pm

Nice analysis. One thing I think is worth making explicit: the N ≈ 200 crossover assumes a stable batch, but mempool churn (replacements, expiries, per-peer differences) means the effective crossover for propagation bandwidth is probably further to the right than the storage-only plot suggests. Aggregation over a moving set is a harder problem than aggregation over a fixed set.

b-wagn · March 18, 2026, 1:11pm

Thanks, this is a valid point!