Because each individual chunk needs to be signed so that it cannot be used to DoS nodes. As I noted above, only one signature is needed (over the commitment to all of the chunks), but this signature has to travel with the chunks, so the current signature scheme has to change, which is what makes the CL changes deep.
Exciting work! Thanks for testing this
A couple of questions:
- What are the network specs of the machines?
- Could you link the code that distributes the RLNC chunks to the mesh?
- How does parallelization speed up the commitment computation?
I’ll defer the first question to @parithosh; all I remember of the spec is s-8vcpu-16gb-amd. The branch that was tested is [DO NOT MERGE] Use RLNC for block propagation by potuz · Pull Request #14813 · OffchainLabs/prysm · GitHub, without the last commits for parallelization. The call to distribute the chunks starts at this line.
The last benchmarks I can find are:

```
cpu: Intel(R) Core(TM) i9-14900
BenchmarkChunkMSM_6MB-32       1   1383648871 ns/op   317930016 B/op   197009 allocs/op
BenchmarkChunkMSM_2MB-32       3    355434164 ns/op   106150704 B/op    65944 allocs/op
BenchmarkChunkMSM_200KB-32    33     34271750 ns/op    10362574 B/op     6798 allocs/op
```

Parallelized:

```
BenchmarkChunkMSM_6MB-32       4    267416300 ns/op   317934364 B/op   197033 allocs/op
BenchmarkChunkMSM_2MB-32      14     82303238 ns/op   106154188 B/op    65969 allocs/op
BenchmarkChunkMSM_200KB-32   139      8521646 ns/op    10363791 B/op     6823 allocs/op
```
But this is a fast machine.
EDIT: I want to stress that, in any case, if this goes into production we would most probably want to use a curve defined over a binary extension field instead of Ristretto.
Hey, this is great to see! I work with a team called Optimum: we’re building RLNC-based technology for Web3, and our first product is a general-purpose gossip library built on Gossipsub.
We integrated our library with Shadow, and ran the same Ethereum mainnet-like experiments as @ppopth did in the “Doubling the blob count” experiments (thank you for the great work there, was very easy to build on). We’ve seen similarly positive results – the time for 99% of nodes to receive a message is generally about twice as fast. Note that this is without tuning parameters, I suspect we can get significantly better results by playing around with various numbers.
We also ran this on some real-world infra, and got similar results. It seems to scale really well with larger messages; we haven’t found its breaking point yet because my desktop runs out of memory before that.
A few points of difference to note:
- We’re using \mathbb{F}_{256} to represent both coefficients and elements
- We aren’t yet handling the possibility of bad chunks (assuming honest behaviour of all nodes). We have a few ideas of how we can do this, the Pedersen commitment idea is really neat!
- Unlike the approach here, we modified the Gossipsub protocol itself to work with chunks. I think this is important, because that way you have full control over how chunks are propagated. I’m not sure how the approach in this thread works:
- Does a node always forward a chunk it receives to its mesh peers (in addition to potentially creating a new chunk at the application level that is published separately?)
- Is gossip (`IHAVE`/`IWANT`/`IDONTWANT`) emitted for each chunk?
- Modifying Gossipsub to be aware of chunks also allows you to do some nice optimizations. As an example, when sending `IWANT` control messages, nodes can specify how many more chunks they need (which the receiving node may or may not respect).
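As a point of reference for the \mathbb{F}_{256} choice above, here is a minimal self-contained Go sketch of RLNC over GF(2^8), showing the recoding property that makes the scheme attractive for gossip: a relay can combine two coded packets into a fresh valid coded packet without decoding. This is an illustrative toy, not Optimum's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// gfMul multiplies two elements of GF(2^8) using the AES reduction
// polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).
func gfMul(a, b byte) byte {
	var p byte
	for i := 0; i < 8; i++ {
		if b&1 == 1 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1b
		}
		b >>= 1
	}
	return p
}

// encode returns the GF(2^8) linear combination sum_i coefs[i]*chunks[i].
func encode(chunks [][]byte, coefs []byte) []byte {
	out := make([]byte, len(chunks[0]))
	for i, c := range coefs {
		for j, x := range chunks[i] {
			out[j] ^= gfMul(c, x)
		}
	}
	return out
}

func main() {
	rng := rand.New(rand.NewSource(1))
	// Split a message into 4 source chunks of 8 bytes each.
	chunks := make([][]byte, 4)
	for i := range chunks {
		chunks[i] = make([]byte, 8)
		rng.Read(chunks[i])
	}
	// Two coded packets with random coefficient vectors.
	c1, c2 := make([]byte, 4), make([]byte, 4)
	rng.Read(c1)
	rng.Read(c2)
	p1, p2 := encode(chunks, c1), encode(chunks, c2)

	// Recode: a relay combines p1 and p2 with fresh scalars alpha, beta,
	// updating the coefficient vector alongside the payload.
	alpha, beta := byte(0x53), byte(0xCA)
	p3, c3 := make([]byte, 8), make([]byte, 4)
	for j := range p3 {
		p3[j] = gfMul(alpha, p1[j]) ^ gfMul(beta, p2[j])
	}
	for i := range c3 {
		c3[i] = gfMul(alpha, c1[i]) ^ gfMul(beta, c2[i])
	}

	// The recoded packet equals a direct encoding with the combined vector.
	fmt.Println(bytes.Equal(p3, encode(chunks, c3)))
}
```

The check at the end is the whole point: relays never need the source chunks to produce new useful packets, only previously received coded packets.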
I was wondering what (if anything) we could do that would be useful for your efforts. I’m happy to run more simulations on Shadow if there are any other numbers you’d like to see captured (currently limited by my 256 GiB RAM desktop, but we’re looking into a bigger machine). Also happy to talk through any open questions, or write any code that would be useful. Very excited to see RLNC being experimented with!
Hi @arajasek, nice to see you join the discussion here!
Note that Shadow does not account for CPU time. Depending on how you have integrated your code, this can make quite a difference compared to runs on testbeds. We are investigating how to get around this (it was part of Shadow to some extent, but removed to favor reproducibility).
Unlike the approach here, we modified the Gossipsub protocol itself to work with chunks. I think this is important, because that way you have full control over how chunks are propagated. I’m not sure how the approach in this thread works …
There have been various studies on doing large message propagation over GossipSub with chunking. In the FullDAS work (where the simulation part also uses nim-libp2p + Shadow), I’ve used a naive approach in the implementation, simply using small messages and handling anything related to the large message context at a higher level in the stack. But the linked post also contains proposals for structured message IDs, bitmap based IHAVE/IWANT, etc.
It would be really interesting to see what modifications you did in your version.
But this is the whole point of the problem! We can’t use these small fields like F_{256} because we can’t have compatible commitments. We can’t assume that all nodes are honest: notice that a single malicious node can inject an arbitrarily large number of bad chunks, which will propagate through the network, poisoning every single message that its peers send.
We haven’t yet figured out a good way of using a small field for scalars while at the same time having a good homomorphic signature or hash that makes this viable for something like blobs.
For blocks and payloads, Ristretto is good enough anyway.
After some experimentation with different fields & crates, I was able to get significant speedups using BLS12-381 scalars with the blstrs crate, on everything but committing to small blocks.
Opened a draft showcase PR here with the results: show: use bls12-381 with blstrs by mempirate · Pull Request #1 · potuz/rlnc_poc · GitHub
We have a way to deal with bad chunks currently. I mentioned earlier the references on dealing with Byzantine nodes that introduce bad chunks (see my post from Feb 3). I would recommend NOT using Pedersen commitments, because of modularity, as neat as I find them from a purely technical perspective. What you are doing is tying the representation to the validation, which is fine, but it closes doors. For example, you start with a binary extension field, then you decide to go homomorphic and need to move to a large prime field… not impossible, but not something to be happy about, either. The approach we use does not tie the two together.
We are now handling the possibility of bad chunks at Optimum.
Well, we haven’t yet figured out a better way than Pedersen commitments. And yes, we need to go to a large prime: a binary field does not work, and we do not know how to make this work with smaller fields. The literature we’ve seen on the topic (including the references in your message from Feb 3) does not seem to work for our application on Ethereum L1. The only approach that would perhaps work is homomorphic signatures, but that is impracticable on the running chain.
If you know a better way to deal with malicious nodes, it would be useful to point out a specific method that you think works in this setting. We can then discuss it.
To be concrete, I briefly looked at the paper “Resilient Network Coding in the Presence of Byzantine Adversaries”. Among other results, the paper shows that if an adversary can inject z packets per unit time and the network capacity is C, then the code achieves a rate of C − 2z (e.g., with C = 10 and z = 3 the achievable rate is 4). This means it is assumed that a receiver obtains more honest packets than malicious ones. That does not seem applicable in the blockchain setting, where honest nodes try to minimize their network traffic (instead of exhausting the network capacity) and adversaries can spam the network. Note that the Pedersen commitment approach does not suffer from this (but instead assumes a computationally bounded adversary).
In general, any mechanism we settle on to propagate blocks faster should aim to: (a) saturate all mesh paths in parallel, (b) utilize all available bandwidth, (c) maximize efficiency by transmitting unique information, (d) ensure we only handle authenticated chunks.
At a network level, we should favour packets sized below path MTU to prevent IP fragmentation. Because we now deal with smaller propagation units and built-in data erasure coding redundancy, we can use QUIC (unreliable) datagrams with larger fanouts experimenting with various routing strategies (e.g. chaotic/random, latency-centric, etc.) to enable burstier parallel transmission (up to the bandwidth limits in EIP-7870). We can also leverage QUIC session resumption to “warm up” connections with a large set of peers ahead of time, to later dispatch packets rapidly with 0-RTT under a reestablished secure channel.
Unfortunately, using large fields with RLNC to enable commitments narrows down the parameter space significantly, to the extent that none of the above is possible due to coefficient overhead.
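To see where the coefficient overhead bites, here is a back-of-the-envelope sketch (illustrative numbers, not the PoC's actual parameters): for a message of M bytes split into n chunks, each coded packet carries M/n payload bytes plus n coefficients. With 32-byte scalars the packet size M/n + 32n is minimized at n = sqrt(M/32), so it can never drop below 2·sqrt(32·M) (about 4 KiB for a 128 KiB message), putting sub-MTU packets out of reach, while 1-byte GF(256) coefficients fit comfortably.

```go
package main

import "fmt"

// pktSize returns the size of one coded packet: the chunk payload plus
// the coefficient vector (one coefficient per source chunk).
func pktSize(message, n, coefBytes int) int {
	return message/n + coefBytes*n
}

func main() {
	const message = 128 * 1024 // 128 KiB payload; illustrative only
	for _, n := range []int{32, 128, 512} {
		fmt.Printf("n=%3d  GF(256) packet=%5dB  32-byte-scalar packet=%6dB\n",
			n, pktSize(message, n, 1), pktSize(message, n, 32))
	}
	// With 32-byte scalars, message/n + 32n >= 2*sqrt(32*message) = 4096B
	// for a 128 KiB message, so no parameter choice fits under a ~1200B
	// datagram budget; with 1-byte coefficients, n=128 already gives 1152B.
}
```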
There are other directions in the design space I’m keen to explore. Here are a few that seem conceptually promising and worthwhile.
1. Rateless/fountain codes like RaptorQ (RFC 6330), with source-authenticated packets carrying signatures over the index + packet. The key problem is knowing when producers should stop seeding packets. We have a built-in feedback loop: attestation arrival. Valid attestations indicate peers successfully reconstructed the original payload, and can assist others in converging. We could set dynamic % thresholds over the expected attestations on subscribed subnets. Though CL clients attest at different times, efforts exist to normalize this to “as soon as a valid block is seen”.
2. Traditional/systematic Reed-Solomon with source-authenticated packets, opportunistically piggybacking availability bitmaps for pair-wise set reconciliation. This prevents duplicate transfers at the cost of bitmap overhead (reducible with RLE/Roaring compression). The main challenge is parameter optimization, concretely balancing RS redundancy, bitmap overhead, peer parallelism, and set reconciliation behavior.
3. Rateless IBLTs for mempool-aware block propagation. This reconciles incoming blocks against local mempool contents, transmitting only missing transactions. Given ~60% public transactions in blocks, this method could achieve 1.35x communication overhead relative to private/missing transactions (40%), potentially reducing block propagation bandwidth considerably. A peer could likely reuse the local symbol set across all its peers, though we need extra research on parallelization across peers. That said, there is a privacy consideration: while devp2p announces txs that may be dropped later, this algorithm could reveal current mempool state, which adversaries could theoretically exploit. However, the bandwidth and latency improvements may justify this tradeoff.
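For the IBLT option, the bandwidth arithmetic works out roughly as follows (a sketch with a hypothetical 1 MB transaction payload and the 60% public / 1.35x overhead figures quoted above):

```go
package main

import "fmt"

// sentBytes estimates bytes on the wire for IBLT-based reconciliation:
// only the missing (non-public) transactions are transferred, scaled by
// the rateless-IBLT communication overhead factor.
func sentBytes(blockTxBytes, publicShare, overhead float64) float64 {
	return overhead * (1 - publicShare) * blockTxBytes
}

func main() {
	const block = 1_000_000.0 // hypothetical 1 MB of transactions in a block
	sent := sentBytes(block, 0.60, 1.35)
	fmt.Printf("sent=%.0fB instead of %.0fB (%.0f%% saved)\n",
		sent, block, 100*(1-sent/block))
}
```

So even with the 1.35x reconciliation overhead, the wire cost is roughly half of shipping the full transaction payload, which is where the "considerable" bandwidth reduction comes from.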
We’re working on prototyping (1) and (2) in the p2p networking team at the EF – we’ll have more to share in the coming weeks!
Indeed, end-to-end error correction is wasteful and should only be undertaken in the special case where only the sink nodes are capable of decoding.
As a side note, the fact that errors should be decoded locally to achieve capacity is at the core of network equivalence:
R. Koetter, M. Effros and M. Médard, “A Theory of Network Equivalence— Part I: Point-to-Point Channels,” in IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 972-995, Feb. 2011
This is yet another example of why errors and erasures are so altogether different - propagating erasures is not wasteful.
Returning to the matter at hand, that of managing Byzantine errors in Gossip, as I mentioned before it is generally judicious to maintain modularity - tying the protocol to a specific representation of the data is not necessary and can hinder future upgrades, like deciding to go for larger prime fields rather than remaining in binary extension fields.
What we propose to do leverages the power of a code to be a hash, to verify correctness, but is agnostic as to the method for doing so. It immediately detects, flags and manages an error. We have internally nicknamed this algorithm the Rugby Protocol. For my fellow past or present ruggers reading this, it will be very natural.
The idea is the following. Rugby is played by two teams, each of which must stay on its side. The side is defined by the line of scrimmage, which is set by the player who has the ball (or the position of the player who kicked it). In regular rugby, if you are offside, you need to raise your hand and get back onside, i.e. behind the line of scrimmage for the ball. During that time, you take yourself out of the game: you cannot play and cannot interfere. Failure to do so opens you up to getting beaten up by the opposing team, on whose side you have trespassed.
Let us now play a game called Byzantine Rugby. In Byzantine Rugby, you might have gone offside because somebody put an extra ball into the game, so you were offside through no fault of your own. You still need to get onside, but to be allowed back onside, you need to explain who had the ball that you took to be the new line of scrimmage. It is that person who misled you, and you need to prove that you were indeed misled; you cannot simply accuse. You therefore need to point to the source of the confusion. That person might also have been misled, and will in turn need to explain the confusion.
This is detailed in Section V of the following paper on the Byzantine fault tolerance of the OptimumP2P approach:
1/ The coefficient overhead when implemented correctly for RLNC is only a few percent. There are multiple references to this effect:
J. K. Sundararajan, D. Shah, M. Médard, S. Jakubczak, M. Mitzenmacher and J. Barros, “Network Coding Meets TCP: Theory and Implementation,” in Proceedings of the IEEE, vol. 99, no. 3, pp. 490-512, March 2011
J. Heide, M. V. Pedersen, F. H. P. Fitzek and M. Médard, “On Code Parameters and Coding Vector Representation for Practical RLNC,” 2011 IEEE International Conference on Communications (ICC), Kyoto, Japan, 2011
J. Krigslund, J. Hansen, D. E. Lucani, F. H. P. Fitzek and M. Médard, “Network Coded Software Defined Networking: Design and Implementation,” Proceedings of European Wireless 2015; 21st European Wireless Conference, Budapest, Hungary, 2015
2/ Regarding RS or Raptor: those do not scale. Their throughput goes to 0 exponentially with the depth of the network, while with RLNC it remains constant with the depth.
This is the topic of Myth #6 in
M. Médard, F. H. P. Fitzek, M.-J. Montpetit and C. Rosenberg, “Network coding mythbusting: why it is not about butterflies anymore,” in IEEE Communications Magazine, vol. 52, no. 7, pp. 177-183, July 2014
Basically, RS or Raptor will have throughput that goes to 0 with the size of the network, while RLNC will maintain a constant throughput.
Section 1.5 of our book Network Coding for Engineers gives a high level understanding, with more nuanced exposition in Chapter 5 of that book.