EL peer composition drifts on a fixed software baseline: two-week comparison on a 36-node fleet

TL;DR

In a recent fleet analysis (docs.stereumlabs.com/blog/ec-p2p-peering-behavior-analysis) we observed that DevP2P peer composition on a Besu node was ~75% Reth, despite Reth’s ~12% aggregate network share. Re-running the same measurement two weeks later, with identical EC and CC versions on the same hardware, gives ~96% Reth concentration on the same Besu instances. Over the same period, with Reth unchanged at v2.1.0, Reth’s connected peer count on our fleet dropped from ~123 to ~69 and its tracked peer table shrank from ~7,400 to ~2,506. We don’t have a clean explanation. The drift matters for any cascade analysis built on a static peer-graph snapshot.

What stayed the same

| Variable | Snapshot 1 (Apr 29) | Snapshot 2 (May 12) |
|---|---|---|
| EC versions | identical | identical |
| CC versions | identical | identical |
| Hardware | bare-metal, Vienna | bare-metal, Vienna |
| Geth peer count | 50 | 49.5 |
| Nethermind peer count | 50 | 50.1 |
| Erigon peer count | 42 | 41.9 |
| Besu peer count | 24 | 25.0 |

Peer counts are flat for five of the six ECs; the table omits Ethrex and Reth, and Reth is the exception. The drift is concentrated in Reth’s own peer footprint and in peer composition.

What changed

Reth’s own peer footprint shrank.

| Metric | Apr 29 | May 12 | Delta |
|---|---|---|---|
| Reth connected peers (7d avg) | 123 | 69 | -44% |
| Reth tracked peers (DHT) | ~7,400 | ~2,506 | -66% |
| Reth backed-off peers | ~540 | ~989 | +83% |
| Reth “Too many peers” rejection rate | high (active limiter) | ~0.006/s | dropped to negligible |

Same Reth version (v2.1.0), same Reth process configuration on our side. A peer table that shrank by two thirds and a connected count that nearly halved suggest that something changed in how the broader network connects to Reth, not in Reth-side configuration. The near-disappearance of “Too many peers” rejections fits that picture: it is consistent with the node no longer running at its peer cap.
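As a quick check on the Delta column, each entry is the plain relative change from the April to the May 7-day average. The helper name below is ours, and the inputs are the approximate table values, so the results carry the same ~ precision.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Approximate 7-day averages from the Reth table: (Apr 29, May 12).
reth_metrics = {
    "connected peers": (123, 69),
    "tracked peers (DHT)": (7_400, 2_506),
    "backed-off peers": (540, 989),
}

for name, (apr, may) in reth_metrics.items():
    # Prints e.g. "connected peers: -44%"
    print(f"{name}: {pct_change(apr, may):+.0f}%")
```

The -44% is the “nearly halving” and the -66% the “two thirds” referred to in the text.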

Besu’s Reth concentration intensified.

besu_peers_peer_count_by_client, 7-day rolling average, across all six CC-paired Besu instances on the same fleet:

| Peer client | Apr 29 share | May 12 share | Aggregate network share |
|---|---|---|---|
| Reth | ~75% | 95.9% | ~12% |
| Erigon | minor | 1.5% | ~5% |
| Nethermind | minor | 0.7% | ~21% |
| Geth | minor | 0.4% | ~50% |

With 25 peer slots, Besu now sees ~24 Reth peers and roughly one peer from every other client combined. The amplification factor relative to Reth’s aggregate share is now ~8x (95.9% / ~12%), up from ~6x (~75% / ~12%).

The two shifts point in opposite directions. Reth’s own connectivity dropped (fewer peers connected to our Reth nodes, a smaller discovery table), while Besu’s relative concentration on Reth increased: whatever Reth peers remain make up an even larger fraction of Besu’s view than before.
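The ~24-peer split and the ~6x/~8x amplification factors follow from two ratios: the local share times the peer ceiling, and the local share divided by the aggregate network share. A minimal sketch using the table values; the 25-slot ceiling and ~12% Reth share are the approximate figures quoted above, and the function names are illustrative.

```python
PEER_SLOTS = 25            # Besu peer ceiling on the fleet (approximate)
RETH_NETWORK_SHARE = 0.12  # approximate aggregate Reth share of the network


def amplification(local_share: float, network_share: float) -> float:
    """How many times over-represented a client is in one node's peer set."""
    return local_share / network_share


def expected_peers(local_share: float, slots: int = PEER_SLOTS) -> float:
    """Expected number of peer slots held by a client with `local_share`."""
    return local_share * slots


for label, share in [("Apr 29", 0.75), ("May 12", 0.959)]:
    print(
        f"{label}: ~{amplification(share, RETH_NETWORK_SHARE):.1f}x amplification, "
        f"~{expected_peers(share):.1f} of {PEER_SLOTS} slots are Reth"
    )
```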

Side finding: CC pairing perturbs composition

Same EC version, same hardware, same datacenter, distinguished only by which CC the EC was booted alongside:

Besu Reth share by paired CC (May 12, 7d avg):

| Besu paired with | Reth share |
|---|---|
| Teku | 99.6% |
| Lodestar | 99.1% |
| Lighthouse | 96.0% |
| Prysm | 95.8% |
| Nimbus | 95.2% |
| Grandine | 89.6% |

That is a 10-percentage-point swing in single-client peer concentration on identical EC software. The DevP2P stack doesn’t interact with the CC, so any mechanism has to be indirect. Plausible candidates are differences in boot timing, Engine API call patterns, and shared-host CPU/memory pressure during the early discovery window, while the discovery routing table is being seeded. Once routing-table entries stabilize, they create path dependence: which peers get discovered next depends on which entries were seeded first.

Two operators running identical EC versions on identical hardware can end up with materially different peer compositions purely because of CC choice, and aggregate EC-side metrics such as total peer count give no obvious signal that this has happened.
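The 10pp figure is the max-minus-min spread across the six pairings. The unweighted mean of the six shares lands at ~95.9%, matching the fleet-wide May 12 figure; that agreement is expected only while every Besu instance carries roughly the same number of peers, since a pooled fleet-wide share would otherwise weight instances by peer count (an assumption about how the aggregate was formed). A sketch using the table values:

```python
from statistics import mean, pstdev

# Besu's Reth share (%) by paired CC, May 12, 7-day averages.
reth_share_by_cc = {
    "Teku": 99.6,
    "Lodestar": 99.1,
    "Lighthouse": 96.0,
    "Prysm": 95.8,
    "Nimbus": 95.2,
    "Grandine": 89.6,
}

shares = list(reth_share_by_cc.values())
spread_pp = max(shares) - min(shares)  # percentage points
avg_share = mean(shares)               # unweighted across pairings
sd_pp = pstdev(shares)                 # population std dev, percentage points

print(f"spread {spread_pp:.1f} pp, mean {avg_share:.1f}%, sd {sd_pp:.1f} pp")
```

With six pairings in one datacenter this is descriptive only; it cannot by itself separate CC effects from run-to-run variance, which is the question raised under CC-pairing variance.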

Implications for cascade modeling

The standard framing of EL client diversity treats the impact of a client failure as proportional to that client’s network share: if Reth fails, the network loses ~12% of its nodes. On a Besu node with 96% Reth peers and 25 peer slots, a Reth failure drops the local peer count to ~1 (4% of 25), well below any reasonable threshold for reliable gossip propagation.

A cascade model calibrated on the April snapshot (75% concentration, 25-peer ceiling) predicts that the Besu node retains ~6 peers after a Reth outage. Recalibrated on the May snapshot, the same model predicts ~1. Same software, same fleet, two weeks apart. Cascade-tolerance modeling based on a single point-in-time measurement is not robust to natural drift in the peer graph, even on a stable software baseline.
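The post-outage figures come from one naive formula: peers kept = peer slots × (1 − share of peers running the failed client). A sketch, assuming every peer on the failed client disconnects at once and nothing refills the freed slots during the window of interest; `peers_after_outage` is an illustrative name, not part of any cascade model from the original analysis. The proportional framing corresponds to plugging in the ~12% network share.

```python
def peers_after_outage(slots: int, failed_client_share: float) -> float:
    """Peers a node keeps if every peer on the failed client drops at once.

    Deliberately naive: ignores reconnection and slot refill via discovery.
    """
    return slots * (1.0 - failed_client_share)


SLOTS = 25
proportional_view = peers_after_outage(SLOTS, 0.12)  # peers mirror the ~12% network share
april = peers_after_outage(SLOTS, 0.75)              # Apr 29 snapshot
may = peers_after_outage(SLOTS, 0.959)               # May 12 snapshot

print(f"proportional {proportional_view:.0f}, April {april:.2f}, May {may:.2f}")
```

The gap between ~22 retained peers under the proportional framing and ~1 under the May snapshot is the core of the cascade argument.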

What’s the right temporal aggregation for peer-composition data used in network security analysis? A monthly average smooths the drift but masks the structural concentration at any given point. A snapshot captures the structure but mis-predicts cascade outcomes weeks later.

What we want to know

Three things would help confirm or rule out this being a fleet-specific artefact:

  1. Other fleets. Has anyone with multi-region or non-bare-metal fleets observed a similar Reth peer-count drop and a parallel concentration increase on Besu-Reth or analogous pairings (e.g. Ethrex-Geth, which on our fleet shows ~69% Geth concentration)?

  2. Mechanism. Is the drift driven by Reth-side discovery behavior (churning stale ENRs faster than fresh ones are bonded), or by network-wide peer churn that disproportionately affects Reth nodes (e.g. inbound port reachability degrading on cloud-hosted Reth deployments)? Reth client-team perspective particularly welcome.

  3. CC-pairing variance. Is the 10pp Reth-share swing across Besu’s CC pairings reproducible elsewhere, or is it within natural fleet-to-fleet variance that a single-DC measurement can’t separate out?

Side proposal

Independent of how the drift is explained, the measurement above is currently possible on only two of the six major ECs. Besu (besu_peers_peer_count_by_client) and Ethrex (ethrex_p2p_peer_clients) expose a per-peer-client breakdown natively; Geth, Reth, Erigon, and Nethermind do not. Nethermind logs the breakdown to stdout every five minutes; the other three don’t expose it in either metrics or logs.

Mandating a <client>_peer_count_by_client Prometheus metric across all major ECs is a low-cost client-side change that would meaningfully unblock external measurement of peer-graph properties. Currently any cascade analysis of the EL network is blind to peer composition on roughly 88% of the aggregate network (Geth + Reth + Erigon + Nethermind shares combined).
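To make the proposal concrete, here is a sketch of what such a metric could look like in the Prometheus text exposition format, bucketing connected peers by the first `/`-separated field of their DevP2P client ID. The metric name `geth_peer_count_by_client`, the `client` label, and the sample client ID strings are hypothetical; the sketch assumes client IDs follow the common `Name/version/...` convention and is not any client’s actual implementation.

```python
from collections import Counter


def client_family(client_id: str) -> str:
    """Map a DevP2P client ID such as 'Geth/v1.17.2-stable/linux-amd64' to 'geth'.

    Assumes the common 'Name/version/...' convention; empty IDs map to 'unknown'.
    """
    name = client_id.split("/", 1)[0].strip().lower()
    return name or "unknown"


def escape_label_value(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_peer_count_by_client(metric: str, client_ids: list[str]) -> str:
    """Render one gauge sample per remote client family."""
    counts = Counter(client_family(cid) for cid in client_ids)
    lines = [
        f"# HELP {metric} Connected peers grouped by remote client name.",
        f"# TYPE {metric} gauge",
    ]
    for client, n in sorted(counts.items()):
        lines.append(f'{metric}{{client="{escape_label_value(client)}"}} {n}')
    return "\n".join(lines) + "\n"


# Hypothetical client IDs, as they might appear in DevP2P Hello messages.
print(render_peer_count_by_client(
    "geth_peer_count_by_client",
    ["Geth/v1.17.2-stable/linux-amd64", "reth/v2.1.0/x86_64-unknown-linux-gnu",
     "Reth/v2.1.0/aarch64", "Nethermind/v1.36.2/linux-x64"],
))
```

Lowercasing the first field folds spelling variants such as `Reth` and `reth` into one series, which keeps label cardinality bounded by the number of client families.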


Methodology note

36 bare-metal nodes in Vienna, Austria, covering every EC × CC combination of six ECs (Besu 26.4.0, Erigon v3.3.10, Ethrex 10.0.0, Geth v1.17.2, Nethermind 1.36.2, Reth v2.1.0) and six CCs (Grandine 2.0.4, Lighthouse v8.1.3, Lodestar v1.42.0, Nimbus multiarch-v26.3.1, Prysm v7.1.3, Teku 26.4.0).

All metrics are 7-day rolling averages, computed with avg_over_time(metric[7d:1h]) instant queries (a 7-day window sampled at 1-hour resolution). Snapshot 1 ends April 29, 2026; Snapshot 2 ends May 12, 2026. EC and CC versions were held constant across both windows. Per-EC peer-count metric names: ethereum_peer_count (Besu, Nethermind), p2p_peers (Geth, Erigon), reth_network_connected_peers (Reth), ethrex_p2p_peer_count (Ethrex).

Fleet inventory (per-VM EC and CC versions, hardware, location) at docs.stereumlabs.com/docs/scope/list-of-vms. Raw Prometheus at grafana.stereumlabs.com (datasource prometheus-cold) for verification.
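For anyone re-running the numbers against their own Prometheus, here is a sketch of the methodology’s instant query through the standard Prometheus HTTP API (`/api/v1/query`). The base URL is a placeholder (the Grafana datasource above is not queried this way directly), the evaluation time assumes midnight UTC on the snapshot end date, and the `client` label name for the per-client metric is an assumption to check against your own series. `pooled_shares` sums each client’s value across instances before dividing, i.e. a ratio of sums over the fleet.

```python
import json
import urllib.request
from urllib.parse import urlencode


def snapshot_expr(metric: str) -> str:
    """7-day rolling average, sampled hourly via a subquery, as in the methodology."""
    return f"avg_over_time({metric}[7d:1h])"


def instant_query_url(base_url: str, expr: str, at: str) -> str:
    """URL for a Prometheus instant query evaluated at `at` (RFC 3339 or Unix time)."""
    params = urlencode({"query": expr, "time": at})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def run_instant_query(url: str) -> list:
    """Fetch an instant-vector result: [{'metric': {...}, 'value': [ts, 'v']}, ...]."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


def pooled_shares(result: list, label: str = "client") -> dict:
    """Per-label-value share of the summed sample values across all series."""
    totals: dict = {}
    for series in result:
        key = series["metric"].get(label, "unknown")
        totals[key] = totals.get(key, 0.0) + float(series["value"][1])
    grand = sum(totals.values())
    return {k: v / grand for k, v in totals.items()} if grand else {}
```

Usage, against a placeholder endpoint: `pooled_shares(run_instant_query(instant_query_url("http://localhost:9090", snapshot_expr("besu_peers_peer_count_by_client"), "2026-05-12T00:00:00Z")))`.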