Execution & Consensus Client Bootnodes

I think that’s a great idea and somewhat of a natural choice given existing devops expertise.

Yet another option, which seems more common in the MEV-Boost relay space, is independent infrastructure operators not affiliated with any client team (Agnostic and USM being two examples on the MEV-Boost relay side). Experienced folks from the ethstaker community, for example, might be a natural fit for something like this.

I think that’s a good idea, and similar to what I have been thinking since the beginning of this thread. The reason why I haven’t reached out so far is twofold:

  • I first wanted to gather various ideas in this thread and decide on an action plan, and
  • I wanted to understand how such a plan can be efficiently coordinated, since I don’t want to end up in a situation where Geth runs a couple of bootnodes while Erigon & co. don’t and simply re-use the Geth bootnodes (as is done currently). Any ideas on how best to approach this?

100%.

Great point - does anyone have a direct line to @superphiz?

I think you need to explain what “extreme censoring event” means concretely. My Erigon client has a 35 MB database of previously seen peers. I find it hard to conceive of a situation where it cannot reconnect to the network after a reboot using at least one peer in that file, unless the network is already broken anyway.

For example, a supranationally coordinated censorship attack via ISPs: if your 35 MB database consists of European and US peers and the attack is launched in Europe and the US, then you have a problem. Or there is a massive DDoS attack that prevents your peers from helping you resync. I think it’s important to emphasise that we need to build censorship-resistant infrastructure that is future-proof, which means we also need to be prepared for the unimaginable. The argument “if we are in such an extreme situation, the world has bigger problems anyway” is not satisfactory. It is not only about the current situation but about all possible future scenarios (even if they are unlikely). I’d rather spend some time thinking about an appropriate solution now than regret in a decade that we did nothing.

I would argue this isn’t really a problem. A large part of the discovery process relies on EIP-1459, which is Node Discovery via DNS. A crawler walks the discoverable nodes on the network and, at a regular cadence, updates the domain ethdisco.net. That data can then be used to find a “dynamic” list of peers via DNS records. The raw data dump can be found in the GitHub repo ethereum/discv4-dns-lists, as mentioned in some of your older posts.

So in a scenario in which all the nodes listed above on centralized providers are taken out, the network will still be up and functioning via the DNS-based discovery. Assuming that Ethereum is actively being censored everywhere and DNS discovery isn’t enough, every user is more than welcome to add peers through the --bootnodes (or similar) flag present in every client; such peers can be shared publicly with end users via various forums or other channels. Assuming DNS is being censored only for specific domains such as ethdisco, users can switch DNS providers or run their own recursive resolvers. More entities are also welcome to set up their DNS records in a similar manner to ethdisco; docs on the topic can be found here: DNS Discovery Setup Guide | go-ethereum
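
As a quick sanity check that the DNS path works from your own vantage point, here is a minimal sketch using the third-party dnspython package. As far as I know, all.mainnet.ethdisco.net is the tree behind go-ethereum’s default mainnet discovery URL; substitute whichever tree your client is configured with.

```python
# Minimal sketch: fetch the EIP-1459 tree root TXT record for a DNS discovery list.
# Assumes the dnspython package (pip install dnspython); the domain below is the
# tree used by go-ethereum's default mainnet discovery URL (adjust as needed).
import dns.resolver


def fetch_enrtree_root(domain: str = "all.mainnet.ethdisco.net") -> str:
    answers = dns.resolver.resolve(domain, "TXT")
    for rdata in answers:
        txt = b"".join(rdata.strings).decode()
        if txt.startswith("enrtree-root:"):
            return txt
    raise RuntimeError(f"no enrtree-root record found at {domain}")


if __name__ == "__main__":
    # If this prints an "enrtree-root:v1 ..." record, DNS discovery is reachable
    # through your current resolver.
    print(fetch_enrtree_root())
```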

Additionally, any sort of active attack on the discovery/bootnode layer will not break the network immediately. It will only affect new nodes wanting to join the network or restarted nodes; the network will continue to function as expected for already-peered nodes. This also implies we would have some time to react in such a scenario.

1 Like

So in a scenario in which all the nodes listed above on centralized providers are taken out, the network will still be up and functioning via the DNS-based discovery.

I think our threat models are different here. I feel like any attacker capable of compromising all of the bootnodes would also be capable of compromising a single DNS address (and likely as many as they want). I suppose this becomes less true as we reduce down to a single service provider (e.g., AWS), but even compromising both Hetzner and AWS seems harder than compromising DNS?

3 Likes

EthStaker would be happy to become a bootnode.

We are already running a public checkpoint sync endpoint and we could easily add the configuration to use that node as another bootnode.

I can be a contact point for this. Simply contact me on Discord (Remy Roy#1837) or Twitter (remy_roy) for private DMs.

5 Likes

Very happy to hear this! May I ask exactly how this node is currently hosted (e.g. bare metal, location, etc.)?

I think we should have a page for the bootnodes used by the clients, similar to the Ethereum Beacon Chain checkpoint sync endpoints page, with a separate section for further community bootnodes that folks can use via --bootnodes or a similar flag. This would require some people to maintain the GitHub repo for this page and approve/dismiss PRs; I would certainly volunteer for that, but at least 2-3 approvals should be needed for each PR that adds a community-based bootnode link.

I was wondering whether there is someone here who could help me add at least one person from the Geth, Nethermind, Erigon, Besu, Lighthouse, Lodestar, Nimbus, Prysm and Teku teams to this thread, so that hopefully each of the EL/CL client teams could serve at least one bootnode itself. Maybe @Souptacular?

1 Like

It is a bare metal dedicated machine running in northeast NA. It is running on an unmetered network, hosted by a reliable business. It has more than enough resources to run a full Ethereum node. I’m happy to share more details in private.

2 Likes

The consensus client teams are all running consensus bootnodes for discv5; you can find the list of ENRs here: lighthouse/boot_enr.yaml at a53830fd60a119bf3f659b253360af8027128e83 · sigp/lighthouse · GitHub

You can extract the IPs using enr-cli read <enr-string>.
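
If you don’t have enr-cli handy, a rough Python equivalent for the IPv4 field (a sketch based on the EIP-778 encoding and the third-party rlp package, not a full ENR parser) could look like this:

```python
# Rough equivalent of `enr-cli read` for the IPv4 field only. An ENR is the
# unpadded URL-safe base64 encoding of an RLP list [signature, seq, k1, v1, ...]
# (EIP-778); the value under the b"ip" key is the 4-byte address.
# Assumes the third-party rlp package (pip install rlp); IPv6-only records
# (key b"ip6") are not handled here.
import base64
import socket

import rlp


def enr_to_ipv4(enr: str) -> str:
    raw = enr.removeprefix("enr:")
    payload = base64.urlsafe_b64decode(raw + "=" * (-len(raw) % 4))  # re-add padding
    items = rlp.decode(payload)                  # [signature, seq, key, value, ...]
    pairs = dict(zip(items[2::2], items[3::2]))  # key/value pairs after sig + seq
    return socket.inet_ntoa(pairs[b"ip"])


# Example: print(enr_to_ipv4("<ENR string from boot_enr.yaml>"))
```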

The Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia, so we’re contributing a little bit of hosting + geographic diversity. I imagine hosting on bare metal servers operated entirely by our team would be infeasible, but we could think about it. Running an execution bootnode may also be an option for us, if that would be deemed useful.

5 Likes

@michaelsproul thanks a lot for the insights. Would it be possible to add the hosting location of each consensus bootnode to the boot_enr.yaml file? That would already increase transparency a lot.

At first glance, Linode Australia seems like a good solution, but after a little research I found that Linode was taken over by Akamai Technologies Inc., which in turn operates under US law. So the advantage of geographical diversity is kind of gone.

I think this makes perfect sense. Tbh, I would prefer the following solution: each EL/CL client team runs at least 2 EL and 2 CL bootnodes. One of each pair can (but does not have to) be cloud-based, while the other EL and CL bootnodes must each be on bare metal (preferably outside the US; e.g. Switzerland or Sweden might be good countries). So we have the following distribution:

  • 4 EL clients run 2 EL and 2 CL bootnodes each, 50% of which should run on bare metal outside the US. So a total of 4 \times 2 \times 2 = 16 bootnodes (at least, of which 8 are on bare metal).
  • The same logic applies to the CL clients: 5 \times 2 \times 2 = 20 bootnodes (at least, of which 10 are on bare metal).
  • Any additional bootnodes (whether EL or CL) help, of course.
  • In addition, there will be community-based bootnodes (EL & CL) (like EthStaker or Lido?) that will be carefully vetted and will enrich this list. If anyone has a contact at Lido, I would appreciate it if this thread could be forwarded.

Happy to hear any feedback or better ideas.

@holiman what are the thoughts from the Geth team on the above suggestion?

I think I already voiced my opinion (not speaking on behalf of the geth-team, just myself):

In general though: if EF-controlled bootnodes are seen as ‘critical infrastructure’ then we should remove them, because the network needs to get by without central points of failure.

1 Like

UPDATE (21 April 2023)

Since I cannot edit my older posts, I will add a new comment here that now contains the full list of execution and consensus bootnodes, including links.

Overview Execution Clients

Go-Ethereum

Nethermind

  • Mainnet bootnodes: nethermind/foundation.json.
  • 32 bootnodes running. 4 of the 32 bootnodes are the Geth bootnodes running on AWS (2 out of 4) and Hetzner (2 out of 4).
  • For the remaining 28 bootnodes, I still couldn’t find the hosting locations. They are the same bootnodes as in the original Parity client: trinity/constants.py, likewise without any information on where they are hosted.
  • In the commit Remove deprecated EF bootnodes (#5408), the 4 Azure bootnodes were removed.

Erigon

Besu

  • Mainnet bootnodes: besu/mainnet.json.
  • 10 bootnodes running. 4 of the 10 bootnodes are the Geth bootnodes running on AWS (2 out of 4) and Hetzner (2 out of 4). Additionally, 5 legacy Geth and 1 C++ bootnode are listed, all without information on where they are hosted.
  • In the commit Remove deprecated EF bootnodes (#5194), the 4 Azure bootnodes were removed.

Overview Consensus Clients

Lighthouse

  • Mainnet bootnodes: lighthouse/boot_enr.yaml.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed, all without information on where they are hosted.

Lodestar

  • Mainnet bootnodes: lodestar/mainnet.ts.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed, all without information on where they are hosted.

Nimbus

  • Mainnet bootnodes (pulled via submodule): eth2-networks/bootstrap_nodes.txt.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed, all without information on where they are hosted.

Prysm

  • Mainnet bootnodes: prysm/mainnet_config.go.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed, all without information on where they are hosted.

Teku

  • Mainnet bootnodes: teku/Eth2NetworkConfiguration.java.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed, all without information on where they are hosted.

1 Like

For the sake of transparency, I made a post in the Lido forum here in order to get them involved in this discussion as well. I would like to thank @remyroy for making the connection to Lido.

2 Likes

We probably don’t need anyone to volunteer this information. The cloud provider for an IP can be fairly easily determined by a reverse DNS lookup like: dig -x A.B.C.D. The IPs can be fetched from the ENRs using enr-cli read as I previously mentioned. Maybe you could whip up a script to do it @pcaversaccio?
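
A minimal sketch of such a script, using only the standard library (the two addresses below are placeholders; feed it whatever you extract from the ENRs):

```python
# Minimal sketch: reverse-DNS a list of bootnode IPs to get a hint about the
# hosting provider, i.e. the scripted equivalent of `dig -x <ip>` per address.
# The addresses below are placeholders; substitute the ones extracted from the ENRs.
import socket

BOOTNODE_IPS = [
    "3.17.30.69",
    "172.105.173.25",
]

for ip in BOOTNODE_IPS:
    try:
        # PTR lookup; the returned name usually embeds provider and region,
        # e.g. *.compute.amazonaws.com for AWS.
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        hostname = "<no PTR record>"
    print(f"{ip:<18} {hostname}")
```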

1 Like

@michaelsproul done :slight_smile: - the summary below (on an ENR basis) can also be found here. It seems AWS (mostly US) is currently ensuring that no liveness failure happens on Ethereum ;). I’m pretty sure we can all do a better job here when it comes to greater geographic and provider diversity.

IPs and Locations

Teku team’s bootnodes

  • 3.19.194.157 | aws-us-east-2-ohio
  • 3.19.194.157 | aws-us-east-2-ohio

Prylab team’s bootnodes

  • 18.223.219.100 | aws-us-east-2-ohio
  • 18.223.219.100 | aws-us-east-2-ohio
  • 18.223.219.100 | aws-us-east-2-ohio

Lighthouse team’s bootnodes

  • 172.105.173.25 | linode-au-sydney
  • 139.162.196.49 | linode-uk-london

EF bootnodes

  • 3.17.30.69 | aws-us-east-2-ohio
  • 18.216.248.220 | aws-us-east-2-ohio
  • 54.178.44.198 | aws-ap-northeast-1-tokyo
  • 54.65.172.253 | aws-ap-northeast-1-tokyo

Nimbus team’s bootnodes

  • 3.120.104.18 | aws-eu-central-1-frankfurt
  • 3.64.117.223 | aws-eu-central-1-frankfurt

Edit: I have opened a PR that adds this information to the eth2-networks repo.

FWIW, Prysm, for example, does not; it would discover a new set of peers again on a restart. See the comment here.

I don’t have an account over there so I’ll reply here:

On the point brought up in this discussion: in the event of a coordinated censoring event across regions (where all bootnodes are taken down), a straightforward solution would be to simply share a new list of ENRs for nodes to boot from.

I think the attack vector of interest here isn’t that the bootnodes are offline/unavailable, but that they are controlled and give the attacker the ability to partition the network. It certainly seems like the right thing for Prysm to do is to retain its peer list from the previous session and use it as a sort of “updated list of bootnodes”, then discover a new set of peers from there (so each restart would effectively yield a fresh set of bootnodes).

IMO, the hard-coded bootnodes should only be used on first run to establish an initial connection to the network; from that point on, the bootnode list should be dynamic, based on prior runs. This makes it much harder to meaningfully capture the network.
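
As a rough sketch of the idea (hypothetical file name and helper functions, not any client’s actual implementation):

```python
# Hypothetical sketch of "dynamic bootnodes": prefer peers remembered from the
# previous session and fall back to the hard-coded bootnodes only when no prior
# peers are known (i.e. on first run or with an empty cache).
import json
from pathlib import Path

PEER_CACHE = Path("known_peers.json")  # hypothetical on-disk peer cache
HARDCODED_BOOTNODES = [
    "enr:<hard-coded bootnode 1>",     # shipped with the client release
    "enr:<hard-coded bootnode 2>",
]


def load_boot_targets() -> list[str]:
    if PEER_CACHE.exists():
        cached = json.loads(PEER_CACHE.read_text())
        if cached:
            return cached              # prior peers act as the "updated bootnode list"
    return HARDCODED_BOOTNODES         # only used when nothing is cached


def persist_peers(connected_enrs: list[str]) -> None:
    # Called periodically (or at shutdown) with the ENRs of currently connected peers.
    PEER_CACHE.write_text(json.dumps(connected_enrs))
```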


For a concrete example attack, imagine someone has two 0-days in their pocket (neither of which is particularly far-fetched for a state actor):

  1. They have the ability to take over/control bootnodes.
  2. They have the ability to crash Prysm clients on the network (causing them all to restart).

This attacker now has the ability to eclipse all Prysm nodes. If our client diversity numbers are high enough this isn’t too big of a problem, but if they aren’t, then this could lead to a fork should the attacker desire one. Keep in mind, once you have successfully eclipsed a node, you need not reveal this immediately. You can sit on your eclipse and not leverage it until an opportunity presents itself (for example, you gain the ability to eclipse Teku as well, and then attack once you have 66% of the stake eclipsed).

2 Likes

Great initiative. The Lodestar team at ChainSafe can look into where we can be most useful and try to run both a consensus and an execution bootnode. Currently, though, a lot of our infrastructure is unfortunately cloud-based with the larger cloud providers. For something like this I’d explore what options we have for bare metal in smaller datacenters that are still reliable, but not exposed to the same geographical and/or large-entity risks that we currently have. Any leads, ideas and connections are appreciated!

4 Likes

@philknows very happy to hear this. I think @gsalberto can probably assist here and might also refer further similar local providers (it seems latitude.sh has only one European location (London), as far as I can see). Posting his answer from above here:

Hey Guys,

@randomishwalk sent this thread to me over Twitter and I am jumping in here as I can potentially help with distributed global infrastructure.

latitude.sh, the bare metal company I operate, runs in 15 locations (9 of them being outside the US) - Global regions to deploy dedicated servers and custom projects - Latitude.sh

Happy to chat more

2 Likes