Nice writeup @leobago ! I think explicit sharding in the mempool and a better separation of roles between the EL (row dissemination) and the CL (column dissemination only) is something we should consider in the design space.
I would add a few notes, both on the shortcomings mentioned and on the elements of the presented design.
I would argue some of the shortcomings were already (partially) addressed in the FullDAS design. There is space for improvement, but I think it is worth detailing these, also to better understand the changes:
Technically speaking, sending out 32 MB is enough in FullDAS, as some of our simulations have also shown. This is because as soon as half of the cells of any given row or column have been sent out, in-network reconstruction kicks in. If the builder is slow to send compared to the network's p2p redistribution, the whole 128 MB will be reconstructed in the network shortly after 32 MB of cell data was sent out.
In other words, anything the builder sends beyond the first 32 MB only serves reconstruction. Whether that extra data is needed depends on the speed of the builder's uplink relative to the speed of p2p redistribution.
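To make the arithmetic behind this explicit, here is a small sketch. It assumes the usual 2D Reed-Solomon extension (2x in each dimension); the cell count per row is illustrative, not a spec constant.

```python
# Back-of-the-envelope arithmetic for the 32 MB vs. 128 MB claim.
ORIGINAL_MB = 32          # un-extended blob data the builder must push
EXTENSION_FACTOR = 2      # per dimension (rows and columns)

# Full extended matrix: 2x in rows and 2x in columns = 4x the original.
extended_mb = ORIGINAL_MB * EXTENSION_FACTOR * EXTENSION_FACTOR
print(extended_mb)        # 128

# Each row (or column) of the extended matrix is a Reed-Solomon codeword,
# so half of its cells suffice to reconstruct the whole row (or column).
CELLS_PER_ROW = 128       # hypothetical number of cells in an extended row
cells_needed = CELLS_PER_ROW // 2
print(cells_needed)       # 64 cells are enough to trigger reconstruction
```

This is why the builder's effective obligation is the original 32 MB: everything beyond the halfway point of any row or column is recoverable in-network.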
This was the case before we introduced `getBlobs`. With `getBlobs`, nodes that have the EL content pass it to the CL through the engine API and directly contribute to CL row diffusion as source nodes. In other words, the bandwidth “waste” is limited, since CL row diffusion only kicks in when the EL has not received the row.
The current `getBlobs` version is still somewhat inefficient at eliminating this bandwidth “waste”, and I think we need a better one. This is what I called the streaming/notification-based interface in FullDASv2. I think your design also needs a better interface between the EL and the CL to handle row/column interactions.
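As a sketch of what such a streaming/notification-based interface could look like: instead of the CL polling `getBlobs`, the EL pushes blob data to subscribed CL handlers as soon as it lands in the mempool, so the CL can skip row diffusion for data it already has. All names below are hypothetical, not actual engine API methods.

```python
# Hypothetical push-style EL -> CL blob notification (illustrative names).
from typing import Callable

class ELBlobNotifier:
    def __init__(self) -> None:
        self._subscribers: list[Callable[[str, bytes], None]] = []

    def subscribe(self, cb: Callable[[str, bytes], None]) -> None:
        """CL registers a callback for newly seen blob data."""
        self._subscribers.append(cb)

    def on_mempool_blob(self, versioned_hash: str, data: bytes) -> None:
        """EL calls this as soon as blob data arrives in the mempool."""
        for cb in self._subscribers:
            cb(versioned_hash, data)

received: list[str] = []
notifier = ELBlobNotifier()
notifier.subscribe(lambda h, d: received.append(h))
notifier.on_mempool_blob("0x01ab", b"\x00" * 32)
print(received)  # ['0x01ab'] -> CL can mark this row as already sourced
```

The point of the push model is latency: the CL learns about EL-held data before it decides what to pull over gossip, rather than after.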
I would separate this question from the question of explicit sharding and alignment of interest between the CL and EL, which is part of your proposal.
This is more a shortcoming of the current blob diffusion in the EL and of column-based PeerDAS. In FullDAS, with cell-based messaging, this does not happen: once a node receives half of the cells of a row or column, it reconstructs and becomes a source for the rest of the cells.
Regarding the new techniques
Partial column dissemination sounds like an interesting middle ground between full columns (as in PeerDAS) and cell-based dissemination, as in FullDAS. Combined with the explicit last-bits based sharding, I think it could be quite efficient, and it might be the right compromise we are looking for.
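For illustration, last-bits sharding can be as simple as masking the low-order bits of an identifier, so a node can subscribe to a predictable subset of partial columns. The bit width below is an assumption for the sketch, not part of the proposal.

```python
# Last-bits shard selection (sketch): the shard id is the value of the
# low-order SHARD_BITS bits of an identifier.
SHARD_BITS = 3                       # 2**3 = 8 shards, illustrative

def shard_of(identifier: int) -> int:
    return identifier & ((1 << SHARD_BITS) - 1)

print(shard_of(0b10110101))  # 5: last three bits are 101
print(shard_of(0b10110000))  # 0: last three bits are 000
```

One nice property of last-bits assignment is that doubling the shard count refines the existing partition instead of reshuffling it.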
For the sharding based on the tx hash, my main concern is that different transactions from the same sender might end up in different shards, leading to issues with nonces and nonce gaps.
The EL mempool follows a logic where transactions from the same sender address are handled as part of a stream. When pushing (i.e. for small transactions, not for type3), transactions from the same sender are sent over the same links to the same peers, ensuring nonce gaps are minimized. When pulling based on announcements (i.e. for type3 and larger transactions), nonce order is respected when scheduling the requests.
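The concern can be illustrated with a toy shard-assignment comparison (sha256 stands in for keccak; the shard count is illustrative): hashing per-transaction data scatters a sender's nonce stream across shards, while hashing the sender address keeps the whole stream on one shard.

```python
# Toy comparison: tx-hash sharding vs. sender-address sharding.
import hashlib

NUM_SHARDS = 8  # illustrative shard count

def shard_by_tx_hash(sender: str, nonce: int) -> int:
    # Per-transaction hash: consecutive nonces land on unrelated shards.
    tx_hash = hashlib.sha256(f"{sender}:{nonce}".encode()).digest()
    return tx_hash[-1] % NUM_SHARDS  # last-bits-style selection

def shard_by_sender(sender: str) -> int:
    # Sender-address hash: the whole nonce stream stays on one shard.
    addr_hash = hashlib.sha256(sender.encode()).digest()
    return addr_hash[-1] % NUM_SHARDS

sender = "0xabc"
tx_shards = {shard_by_tx_hash(sender, n) for n in range(16)}
sender_shard = shard_by_sender(sender)
print(len(tx_shards), sender_shard)  # tx-hash sharding typically spreads;
                                     # sender sharding is a single shard
```

Under tx-hash sharding, a node tracking only some shards sees a sender's nonces with gaps, which is exactly what the stream logic above tries to avoid.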
It is an interesting question whether we can break this logic for type3 transactions, which I think we should investigate further.