I have a concern about this one. It can lead to congestion in the subnets, and potentially to a DoS, if too many nodes do reconstruction at the same time.
Gossipsub is broadcast by nature, whereas the DHT is unicast. If you send a single message over gossipsub, everyone in the subnet receives it. So if you spend x bandwidth uploading a message, the subnet as a whole spends nx bandwidth downloading and forwarding it, where n is the number of nodes in the subnet (the amplification factor is n). With the DHT, if you want n nodes to have your message, you have to send it to each of them directly, one by one, so you spend nx bandwidth yourself (the amplification factor is 1).
Because the amplification factor is higher in gossipsub, the subnet will get quite congested if many nodes do reconstruction at the same time.
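To make the arithmetic concrete, here is a rough sketch of the bandwidth accounting. It is a simplification under my own assumptions: every node downloads each message exactly once (real gossipsub nodes can also receive duplicates from several mesh peers), control traffic is ignored, and the function names and the `k` parameter (number of concurrent publishers whose messages are treated as distinct) are made up for illustration.

```python
def gossipsub_cost(x: int, n: int, k: int = 1) -> dict:
    """k publishers each broadcast one distinct x-byte message to a subnet of n nodes."""
    upload = k * x                # each publisher uploads its message once
    download = k * n * x          # every node downloads every distinct message
    return {
        "publisher_upload": upload,
        "subnet_download": download,
        "amplification": download // upload,
    }


def dht_cost(x: int, n: int, k: int = 1) -> dict:
    """k senders each unicast one x-byte message directly to n nodes."""
    upload = k * n * x            # each sender pays for every copy itself
    download = k * n * x          # each recipient downloads its own copy
    return {
        "publisher_upload": upload,
        "subnet_download": download,
        "amplification": download // upload,
    }
```

With `x = 1000` and `n = 50`, one gossipsub publisher uploads 1000 bytes and causes 50000 bytes of download across the subnet. Ten concurrent reconstructors with distinct messages cause ten times that, which is the congestion I'm worried about.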
There are two ways that I can think of to resolve this problem:
- Allow only some validators to do reconstruction in any given epoch. This limits the congestion because the number of reconstructors is limited, but when reconstruction is needed, we have to assume that some validators on the allowed list will actually do it. (We should probably incentivize them, if there is a way to do so.)
- Set the message id to include the epoch number and not depend on the publisher.
  - When the message id doesn’t depend on the publisher, it doesn’t matter how many nodes do the reconstruction, because the reconstructed messages from all the reconstructors are treated as a single message and are downloaded and forwarded only once.
  - The epoch number is included so that we allow only one reconstruction per epoch. Since the message id no longer depends on the publisher, a later reconstruction would otherwise get the same id as an earlier one, and nodes may treat it as a duplicate and not forward it. Including the epoch number indicates that this is another round of reconstruction, not the same one as the previous one.
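A minimal sketch of the second option, to show the deduplication effect. Everything here is hypothetical: the two id functions, the SHA-256 truncation to 20 bytes, and the `count_forwards` helper (a stand-in for a node's seen-message cache) are my own illustration, not the id function of any existing spec or library.

```python
import hashlib


def publisher_dependent_id(publisher: bytes, seqno: int, data: bytes) -> bytes:
    """Hypothetical id that changes with the publisher's identity."""
    return hashlib.sha256(publisher + seqno.to_bytes(8, "big") + data).digest()[:20]


def epoch_scoped_id(epoch: int, data: bytes) -> bytes:
    """Hypothetical id built only from the epoch number and the payload."""
    return hashlib.sha256(epoch.to_bytes(8, "big") + data).digest()[:20]


def count_forwards(message_ids: list) -> int:
    """Count how many messages a node would forward, given a seen-id cache."""
    seen = set()
    forwards = 0
    for message_id in message_ids:
        if message_id not in seen:   # only unseen ids are downloaded/forwarded
            seen.add(message_id)
            forwards += 1
    return forwards
```

If three reconstructors publish the same reconstructed data in the same epoch, `publisher_dependent_id` gives three distinct ids (three forwards), while `epoch_scoped_id` gives one id (one forward). The same data in a later epoch gets a new id, so a new round of reconstruction is not dropped as a duplicate.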
I think the second way is better than the first in every respect, but I'm including both to throw ideas out there.