Because each individual chunk needs to be signed so that it cannot be used to DoS nodes. As I noted above, only one signature is needed (over the commitment to all of the chunks), but this signature has to travel with the chunks, so the current signature scheme has to change, which is what makes the CL changes deep.
Exciting work! Thanks for testing this
A couple of questions:
- What are the network specs of the machines?
- Could you link the code that distributes the RLNC chunks to the mesh?
- How does parallelization speed up the commitment computation?
I’ll defer the first question to @parithosh; all I remember of the spec is s-8vcpu-16gb-amd. The branch that was tested is [DO NOT MERGE] Use RLNC for block propagation by potuz · Pull Request #14813 · OffchainLabs/prysm · GitHub, without the last commits for parallelization. The call to distribute the chunks starts at this line.
The last benchmarks I can find are:

```
cpu: Intel(R) Core(TM) i9-14900
BenchmarkChunkMSM_6MB-32       1   1383648871 ns/op   317930016 B/op   197009 allocs/op
BenchmarkChunkMSM_2MB-32       3    355434164 ns/op   106150704 B/op    65944 allocs/op
BenchmarkChunkMSM_200KB-32    33     34271750 ns/op    10362574 B/op     6798 allocs/op
```

Parallelized:

```
BenchmarkChunkMSM_6MB-32       4    267416300 ns/op   317934364 B/op   197033 allocs/op
BenchmarkChunkMSM_2MB-32      14     82303238 ns/op   106154188 B/op    65969 allocs/op
BenchmarkChunkMSM_200KB-32   139      8521646 ns/op    10363791 B/op     6823 allocs/op
```
But this is a fast machine.
EDIT: I want to stress that, in any case, if this goes into production we would most probably want to use a curve defined over a binary extension field instead of Ristretto.
Hey, this is great to see! I work with a team called Optimum: we’re building RLNC-based technology for Web3, and our first product is a general-purpose gossip library built on Gossipsub.
We integrated our library with Shadow, and ran the same Ethereum mainnet-like experiments as @ppopth did in the “Doubling the blob count” experiments (thank you for the great work there, was very easy to build on). We’ve seen similarly positive results – the time for 99% of nodes to receive a message is generally about twice as fast. Note that this is without tuning parameters, I suspect we can get significantly better results by playing around with various numbers.
We also ran this on some real-world infra, and got similar results. It seems to scale really well with larger messages; we haven’t found its breaking point yet because my desktop runs out of memory before that.
A few points of difference to note:
- We’re using \mathbb{F}_{256} to represent both coefficients and elements
- We aren’t yet handling the possibility of bad chunks (assuming honest behaviour of all nodes). We have a few ideas of how we can do this, the Pedersen commitment idea is really neat!
- Unlike the approach here, we modified the Gossipsub protocol itself to work with chunks. I think this is important, because that way you have full control over how chunks are propagated. I’m not sure how the approach in this thread works:
- Does a node always forward a chunk it receives to its mesh peers (in addition to potentially creating a new chunk at the application level that is published separately?)
- Is gossip (`IHAVE`/`IWANT`/`IDONTWANT`) emitted for each chunk?
- Modifying Gossipsub to be aware of chunks also allows you to do some nice optimizations. As an example, when sending `IWANT` control messages, nodes can specify how many more chunks they need (which the receiving node may or may not respect).
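As a point of reference for the \mathbb{F}_{256} choice above, here is a minimal self-contained Go sketch of RLNC over GF(2^8), showing the recoding property that makes the scheme attractive for gossip: a relay can combine two coded packets into a fresh valid coded packet without decoding. This is an illustrative toy, not Optimum's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// gfMul multiplies two elements of GF(2^8) using the AES reduction
// polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).
func gfMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 == 1 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1b
		}
		b >>= 1
	}
	return p
}

// encode returns the GF(2^8) linear combination sum_i coefs[i]*chunks[i].
func encode(chunks [][]byte, coefs []byte) []byte {
	out := make([]byte, len(chunks[0]))
	for i, c := range coefs {
		for j, x := range chunks[i] {
			out[j] ^= gfMul(c, x)
		}
	}
	return out
}

func main() {
	rng := rand.New(rand.NewSource(1))
	// Split a message into 4 source chunks of 8 bytes each.
	chunks := make([][]byte, 4)
	for i := range chunks {
		chunks[i] = make([]byte, 8)
		rng.Read(chunks[i])
	}
	// Two coded packets with random coefficient vectors.
	c1, c2 := make([]byte, 4), make([]byte, 4)
	rng.Read(c1)
	rng.Read(c2)
	p1, p2 := encode(chunks, c1), encode(chunks, c2)

	// Recode: a relay combines p1 and p2 with fresh scalars alpha, beta,
	// updating the coefficient vector alongside the payload.
	alpha, beta := byte(0x53), byte(0xCA)
	p3, c3 := make([]byte, 8), make([]byte, 4)
	for j := range p3 {
		p3[j] = gfMul(alpha, p1[j]) ^ gfMul(beta, p2[j])
	}
	for i := range c3 {
		c3[i] = gfMul(alpha, c1[i]) ^ gfMul(beta, c2[i])
	}

	// The recoded packet equals a direct encoding with the combined vector.
	fmt.Println(bytes.Equal(p3, encode(chunks, c3)))
}
```

The check at the end is the whole point: relays never need the source chunks to produce new useful packets, only previously received coded packets.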
I was wondering what (if anything) we could do that would be useful for your efforts. I’m happy to run more simulations on Shadow if there are any other numbers you’d like to see captured (currently limited by my 256 GiB RAM desktop, but we’re looking into a bigger machine). Also happy to talk through any open questions, or write any code that would be useful. Very excited to see RLNC being experimented with!
Hi @arajasek, nice to see you join the discussion here!
Note that Shadow does not account for CPU time. Depending on how you have integrated your code, this can make quite a difference compared to runs on testbeds. We are investigating how to get around this (it was part of Shadow to some extent, but removed to favor reproducibility).
Unlike the approach here, we modified the Gossipsub protocol itself to work with chunks. I think this is important, because that way you have full control over how chunks are propagated. I’m not sure how the approach in this thread works …
There have been various studies on doing large message propagation over GossipSub with chunking. In the FullDAS work (where the simulation part also uses nim-libp2p + Shadow), I’ve used a naive approach in the implementation, simply using small messages and handling anything related to the large message context at a higher level in the stack. But the linked post also contains proposals for structured message IDs, bitmap based IHAVE/IWANT, etc.
It would be really interesting to see what modifications you did in your version.
But this is the whole point of the problem! We can’t use these small fields like F_{256} because we can’t have compatible commitments. We can’t assume that all nodes are honest: notice that a single malicious node can inject an arbitrarily large number of bad chunks, which will propagate through the network, poisoning every single message that its peers send.
We haven’t yet figured out a good way of using a small field for scalars while at the same time having a good homomorphic signature or hash that makes this viable for something like blobs.
For blocks and payloads, Ristretto is good enough anyway.
After some experimentation with different fields & crates, I was able to get significant speedups using BLS12-381 scalars with the blstrs crate, on everything but committing to small blocks.
Opened a draft showcase PR here with the results: show: use bls12-381 with blstrs by mempirate · Pull Request #1 · potuz/rlnc_poc · GitHub
We have a way to deal with bad chunks currently. I mentioned earlier the references on dealing with Byzantine nodes that introduce bad chunks (see my post from Feb 3). I would recommend NOT using Pedersen commitments, because of modularity, as neat as I find them from a purely technical perspective. What you are doing is tying the representation to the validation, which is fine, but it closes doors. For example, you start with a binary extension field, then you decide to go homomorphic and need to move to a large prime field… not impossible, but not something to be happy about, either. The approach we use does not tie the two together.
We are now handling the possibility of bad chunks at Optimum.
Well, we haven’t yet figured out a better way than Pedersen commitments. And yes, we need to go to a large prime: a binary field does not work, and we do not know how to make this work with smaller fields. The literature we’ve seen on the topic (including the references in your message from Feb 3) does not seem to work for our application on Ethereum L1. The only approach that would perhaps work is homomorphic signatures, but that is impracticable on the running chain.
If you know a better way to deal with malicious nodes, it would be useful to point out a specific method that you think works in this setting. We can then discuss it.
To be concrete, I briefly looked at the paper “Resilient Network Coding in the Presence of Byzantine Adversaries”. Among other results, the paper shows that if an adversary can inject z packets per unit time and the network capacity is C, then the code achieves a rate of C − 2z (e.g., with C = 10 and z = 3 the achievable rate is 4). This means it is assumed that a receiver obtains more honest packets than malicious ones. That does not seem applicable in the blockchain setting, where honest nodes try to minimize their network traffic (instead of exhausting the network capacity) and adversaries can spam the network. Note that the Pedersen commitment approach does not suffer from this (but instead assumes a computationally bounded adversary).
In general, any mechanism we settle on to propagate blocks faster should aim to: (a) saturate all mesh paths in parallel, (b) utilize all available bandwidth, (c) maximize efficiency by transmitting unique information, (d) ensure we only handle authenticated chunks.
At a network level, we should favour packets sized below path MTU to prevent IP fragmentation. Because we now deal with smaller propagation units and built-in data erasure coding redundancy, we can use QUIC (unreliable) datagrams with larger fanouts experimenting with various routing strategies (e.g. chaotic/random, latency-centric, etc.) to enable burstier parallel transmission (up to the bandwidth limits in EIP-7870). We can also leverage QUIC session resumption to “warm up” connections with a large set of peers ahead of time, to later dispatch packets rapidly with 0-RTT under a reestablished secure channel.
Unfortunately, using large fields with RLNC to enable commitments narrows down the parameter space significantly, to the extent that none of the above is possible due to coefficient overhead.
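To see where the coefficient overhead bites, here is a back-of-the-envelope sketch (illustrative numbers, not the PoC's actual parameters): for a message of M bytes split into n chunks, each coded packet carries M/n payload bytes plus n coefficients. With 32-byte scalars the packet size M/n + 32n is minimized at n = sqrt(M/32), so it can never drop below 2·sqrt(32·M) (about 4 KiB for a 128 KiB message), putting sub-MTU packets out of reach, while 1-byte GF(256) coefficients fit comfortably.

```go
package main

import "fmt"

// pktSize returns the size of one coded packet: the chunk payload plus
// the coefficient vector (one coefficient per source chunk).
func pktSize(message, n, coefBytes int) int {
	return message/n + coefBytes*n
}

func main() {
	const message = 128 * 1024 // 128 KiB payload; illustrative only
	for _, n := range []int{32, 128, 512} {
		fmt.Printf("n=%3d  GF(256) packet=%5dB  32-byte-scalar packet=%6dB\n",
			n, pktSize(message, n, 1), pktSize(message, n, 32))
	}
	// With 32-byte scalars, message/n + 32n >= 2*sqrt(32*message) = 4096B
	// for a 128 KiB message, so no parameter choice fits under a ~1200B
	// datagram budget; with 1-byte coefficients, n=128 already gives 1152B.
}
```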
There are other directions in the design space I’m keen to explore. Here are a few that seem conceptually promising and worthwhile.
1. Rateless/fountain codes like RaptorQ (RFC 6330), with source-authenticated packets carrying signatures over the index + packet. The key problem is knowing when producers should stop seeding packets. We have a built-in feedback loop: attestation arrival. Valid attestations indicate peers successfully reconstructed the original payload, and can assist others in converging. We could set dynamic % thresholds over the expected attestations on subscribed subnets. Though CL clients attest at different times, efforts exist to normalize this to “as soon as a valid block is seen”.
2. Traditional/systematic Reed-Solomon with source-authenticated packets, opportunistically piggybacking availability bitmaps for pair-wise set reconciliation. This prevents duplicate transfers at the cost of bitmap overhead (reducible with RLE/Roaring compression). The main challenge is parameter optimization, concretely balancing RS redundancy, bitmap overhead, peer parallelism, and set reconciliation behavior.
3. Rateless IBLTs for mempool-aware block propagation. This reconciles incoming blocks against local mempool contents, transmitting only missing transactions. Given ~60% public transactions in blocks, this method could achieve 1.35x communication overhead relative to private/missing transactions (40%), potentially reducing block propagation bandwidth considerably. A peer could likely reuse the local symbol set across all its peers, though we need extra research on parallelization across peers. That said, there is a privacy consideration: while devp2p announces txs that may be dropped later, this algorithm could reveal current mempool state, which adversaries could theoretically exploit. However, the bandwidth and latency improvements may justify this tradeoff.
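For the IBLT option, the bandwidth arithmetic works out roughly as follows (a sketch with a hypothetical 1 MB transaction payload and the 60% public / 1.35x overhead figures quoted above):

```go
package main

import "fmt"

// sentBytes estimates bytes on the wire for IBLT-based reconciliation:
// only the missing (non-public) transactions are transferred, scaled by
// the rateless-IBLT communication overhead factor.
func sentBytes(blockTxBytes, publicShare, overhead float64) float64 {
	return overhead * (1 - publicShare) * blockTxBytes
}

func main() {
	const block = 1_000_000.0 // hypothetical 1 MB of transactions in a block
	sent := sentBytes(block, 0.60, 1.35)
	fmt.Printf("sent=%.0fB instead of %.0fB (%.0f%% saved)\n",
		sent, block, 100*(1-sent/block))
}
```

So even with the 1.35x reconciliation overhead, the wire cost is roughly half of shipping the full transaction payload, which is where the "considerable" bandwidth reduction comes from.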
We’re working on prototyping (1) and (2) in the p2p networking team at the EF – we’ll have more to share in the coming weeks!
Indeed, end-to-end error correction is wasteful and should only be undertaken in the special case where only the sink nodes are capable of decoding.
As a side note, the fact that errors should be decoded locally to achieve capacity is at the core of network equivalence:
R. Koetter, M. Effros and M. Médard, “A Theory of Network Equivalence— Part I: Point-to-Point Channels,” in IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 972-995, Feb. 2011
This is yet another example of why errors and erasures are so altogether different - propagating erasures is not wasteful.
Returning to the matter at hand, that of managing Byzantine errors in Gossip, as I mentioned before it is generally judicious to maintain modularity - tying the protocol to a specific representation of the data is not necessary and can hinder future upgrades, like deciding to go for larger prime fields rather than remaining in binary extension fields.
What we propose to do leverages the power of a code to be a hash, to verify correctness, but is agnostic as to the method for doing so. It immediately detects, flags and manages an error. We have internally nicknamed this algorithm the Rugby Protocol. For my fellow past or present ruggers reading this, it will be very natural.
The idea is the following. Rugby is played by two teams, each of which must stay on its side. The side is defined by the line of scrimmage, which is set by the player who has the ball (or the position of the player who kicked it). In regular rugby, if you are offside, you need to raise your hand and get back onside, i.e. behind the line of scrimmage for the ball. During that time, you take yourself out of the game: you cannot play and cannot interfere. Failure to do so opens you up to getting beaten up by the opposing team, on whose side you have trespassed.
Let us now play a game called Byzantine Rugby. In Byzantine Rugby, you might have gone offside because somebody put an extra ball into the game, so you were offside through no fault of your own. You still need to get onside, but to be allowed back onside, you need to explain who had the ball that you took to be the new line of scrimmage. It is that person who misled you, and you need to prove that you were indeed misled; you cannot simply accuse. You therefore need to point to the source of the confusion. That person might also have been misled, and will in turn need to explain the confusion.
This is detailed in Section V of the following paper on the Byzantine fault tolerance of the OptimumP2P approach:
1/ The coefficient overhead when implemented correctly for RLNC is only a few percent. There are multiple references to this effect:
J. K. Sundararajan, D. Shah, M. Médard, S. Jakubczak, M. Mitzenmacher and J. Barros, “Network Coding Meets TCP: Theory and Implementation,” in Proceedings of the IEEE, vol. 99, no. 3, pp. 490-512, March 2011
J. Heide, M. V. Pedersen, F. H. P. Fitzek and M. Médard, “On Code Parameters and Coding Vector Representation for Practical RLNC,” 2011 IEEE International Conference on Communications (ICC), Kyoto, Japan, 2011
J. Krigslund, J. Hansen, D. E. Lucani, F. H. P. Fitzek and M. Médard, “Network Coded Software Defined Networking: Design and Implementation,” Proceedings of European Wireless 2015; 21st European Wireless Conference, Budapest, Hungary, 2015
2/ Regarding RS or Raptor: those do not scale. Their throughput goes to 0 exponentially with the depth of the network, while with RLNC it remains constant with the depth.
This is the topic of Myth #6 in
M. Médard, F. H. P. Fitzek, M.-J. Montpetit and C. Rosenberg, “Network coding mythbusting: why it is not about butterflies anymore,” in IEEE Communications Magazine, vol. 52, no. 7, pp. 177-183, July 2014
Basically, RS or Raptor will have throughput that goes to 0 with the size of the network, while RLNC will maintain a constant throughput.
Section 1.5 of our book Network Coding for Engineers gives a high level understanding, with more nuanced exposition in Chapter 5 of that book.