Nebula - A novel discv5 DHT crawler

Hi everyone,

I’m Dennis from ProbeLab, the network measurement and protocol benchmarking team that spun out of Protocol Labs. So far, the team has focused on developing metrics for IPFS (see probelab dot io), but we recently started looking into other libp2p-based networks. We extended our DHT crawler, which has powered our IPFS metrics for over a year, to also support Ethereum’s discv5 DHT. In this post, I want to share some findings and gather feedback. You can find the source code here:

This Discourse instance allows new users only one media item and at most two links per post, so please follow the link to this Notion page, whose contents I originally intended to post here:


Hi Dennis,

very interesting analysis, as usual.

I understand the CL client distribution that you see is before filtering with any particular fork digest, correct? This means the distribution you are showing is for all networks combined, mainnet and testnets (see the figure below for Ethereum testnets). The distribution shown in monitorEth is only for mainnet, so the two should not be compared directly (apples vs. oranges).

Regarding the fork digests that you see in the network, you can find most of them in our source code:

I am curious what the CL client distribution looks like after filtering out all the testnets and leaving only the latest fork of mainnet. You seem to see about 9.5K nodes on mainnet (0xbba4da96), which is very close to the number of nodes that we managed to connect to in the last week with Armiarma; see the first bar in the figure below (9680 nodes). The other bars are nodes that we managed to connect to some weeks earlier but have not reached since. They will be deprecated later if no connection succeeds in the coming weeks.

One of the trade-offs between a very general libp2p crawler and a specialized one is that with the general one it is much harder to be a “good citizen” in the network, as you admit in your post. The first version of Armiarma was very general, and we used it for other networks. For Armiarma v2, however, we switched to a specialized design so that nodes could connect to us and keep us as a good peer in their peer list. This, together with running 24/7, is particularly useful for discovering peers behind NATs, as well as clients that are stricter about following the Ethereum specification. For instance, we have noticed that this is the case for Prysm nodes. If you don’t fully follow the specs (e.g., the BeaconStatus exchange), it is normal to see several connections dropped because of it, which might explain why Nebula sees so few of them.

Overall, these first preliminary results look very promising, and I am looking forward to seeing more come out of this.

Cheers! :grinning:

Hi @leobago

thanks for your insights!

You are totally right! I revised my analysis in two ways:

  1. Filtered by the fork digest of 0xbba4da96
  2. Looked at multiple crawls to derive the agent version. If I can’t connect to a peer during a crawl, I won’t learn its agent version. However, if I can connect to it in a later crawl, I can extract the agent version then. The numbers in that Notion page refer to a single crawl. The numbers below take into account every crawl I’ve done so far.
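The two steps above can be sketched roughly as follows. This is only an illustration, not Nebula's actual code: the per-crawl data layout (a dict from peer ID to `agent` and `fork_digest`), the function names, and the example peers are all hypothetical. The only value taken from this thread is the mainnet fork digest.

```python
from collections import Counter

# Fork digest used for filtering in this post (mainnet).
MAINNET_FORK_DIGEST = "0xbba4da96"


def merge_crawls(crawls):
    """Merge crawls (oldest first), keeping the latest non-empty value per peer.

    Each crawl is assumed to be a dict: peer_id -> {"agent": ..., "fork_digest": ...}.
    A peer we could not connect to in one crawl has agent=None there,
    but may have a known agent from another crawl.
    """
    peers = {}
    for crawl in crawls:
        for peer_id, info in crawl.items():
            known = peers.setdefault(peer_id, {"agent": None, "fork_digest": None})
            for key in ("agent", "fork_digest"):
                if info.get(key) is not None:
                    known[key] = info[key]
    return peers


def client_name(agent):
    """Take the part before the first '/' as the client name; 'null' if unknown."""
    if agent is None:
        return "null"
    return agent.split("/", 1)[0]


def client_counts(peers, fork_digest=MAINNET_FORK_DIGEST):
    """Count clients among peers that advertise the given fork digest."""
    return Counter(
        client_name(p["agent"])
        for p in peers.values()
        if p["fork_digest"] == fork_digest
    )


# Hypothetical example data: peerA is only reachable in the second crawl,
# peerC advertises a made-up non-mainnet fork digest and is filtered out.
crawl_1 = {
    "peerA": {"agent": None, "fork_digest": "0xbba4da96"},
    "peerB": {"agent": "Lighthouse/v4.5.0", "fork_digest": "0xbba4da96"},
    "peerC": {"agent": "teku/v23.10.0", "fork_digest": "0xdeadbeef"},
}
crawl_2 = {
    "peerA": {"agent": "Prysm/v4.1.1", "fork_digest": "0xbba4da96"},
    "peerB": {"agent": None, "fork_digest": "0xbba4da96"},
}

counts = client_counts(merge_crawls([crawl_1, crawl_2]))
print(dict(counts))
```

Peers that were never reachable in any crawl stay in the `null` bucket, which is why that row shrinks as more crawls are merged but does not disappear.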

These numbers come much closer to the ones you report:

| Client | Peers | Share |
|---|---|---|
| Lighthouse | 3600 | 38.66 % |
| Prysm | 2645 | 28.40 % |
| teku | 1349 | 14.49 % |
| nimbus | 643 | 6.90 % |
| null | 629 | 6.75 % |
| rust-libp2p | 216 | 2.32 % |
| lodestar | 192 | 2.06 % |
| erigon | 37 | 0.40 % |
| Grandine | 2 | 0.02 % |
| Total | 9313 | 100.00 % |
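For anyone who wants to re-derive the share column, here is a small sketch that recomputes it from the peer counts listed above. The variable names are mine; only the counts come from the table.

```python
# Peer counts per client, copied from the table above.
peer_counts = {
    "Lighthouse": 3600,
    "Prysm": 2645,
    "teku": 1349,
    "nimbus": 643,
    "null": 629,
    "rust-libp2p": 216,
    "lodestar": 192,
    "erigon": 37,
    "Grandine": 2,
}

total = sum(peer_counts.values())

# Share of each client in percent, rounded to two decimals.
shares = {client: round(100 * n / total, 2) for client, n in peer_counts.items()}

for client, n in peer_counts.items():
    print(f"{client:<12} {n:>5} {shares[client]:>6.2f} %")
print(f"{'Total':<12} {total:>5}")
```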

For comparison from

Not perfect but we’re getting there!

That’s my brief update! I’ll circle back here regarding your other remarks and when I have updates!

Cheers :slight_smile: