Execution & Consensus Client Bootnodes

If everything works as intended, I would agree. But they can become pivotal in an extreme censoring event. And my threat model tells me that in the current state of the world we should think about this possibility.

That’s exactly the point: the overall preferred solution would be a situation where we can completely remove the current kind of centralised initial trusted setup. However, since DNS discovery is also centralised (for anyone interested, see here for the list) and can be affected by an extreme censoring event, I think having globally distributed, EF-independent bootnodes serving as a last-resort rescue is the best solution.

UPDATE (14 April 2023)

TL;DR: 4 EF Azure bootnodes got removed since my original post. Now mostly dependent on AWS and Hetzner (silently screaming inside!).

Overview Execution Clients

Go-Ethereum

Nethermind

Erigon

Besu

2 Likes

I love that this discussion is coming back around. I am not convinced that it is as much of an issue as it used to be, because devp2p improvements like better DNS discovery mean bootnodes aren’t even strictly necessary anymore. At the same time, I’d love to see a decentralised way to run bootnodes that doesn’t also increase the risk of bad actors compromising the alternative bootnodes.

I co-led the DevOps team at the EF from 2016-2021 (with a 3-year stint in the middle of that as orgsec lead after Martin Swende). I won’t explain the entire security setup for the bootnodes for security reasons, but I would find it highly unlikely that the bootnodes could be compromised via a hack (e.g. a hacker swapping a Geth bootnode for a malicious node that causes a chain split) or that a sustained DoS attack could take the nodes down (the EF would be able to respond to and mitigate any attack, or at worst rebuild the whole thing in under half an hour, not counting sync time).

I’m definitely open to hearing solutions, but I’m not convinced adding more entities besides the EF is the right way considering the low risk and the potential to open more possibilities for exploit of the bootnodes.

2 Likes

My issue here is that I don’t have transparency about why, for example, the EF is able to act as you claim. Security through opacity doesn’t work well, and I understand that you can’t disclose all information for security reasons either. The current situation is like: please trust the EF that we’re doing our job properly. I’m not saying this is not the case, but the required information is (at least publicly) not available. Also, peers can always be censored via ISPs so it’s important to have globally distributed bootnodes available to preserve the censorship-resistance core value and resilience of Ethereum as a whole.

3 Likes

I agree with everything you are saying, but I don’t think the risk is high enough for the ones responsible for the bootnodes to act on it compared to other pressing issues that affect Ethereum more at the protocol/safety level. Note: I’m not really that deeply involved anymore, so it’s up to them; I’m just relaying how I suspect they will react.

At the same time that shouldn’t mean we ditch your ideas because they do help.

One idea: start asking individual client teams to set up bootnodes following a common set of standards that includes what you propose (geographic diversity, bare metal, etc.). It shouldn’t be entirely the EF’s responsibility to do this, and adding bootnodes once the other client teams create them is as simple as a PR on each client. I think other EL client teams would be very open to this and may already have testing infra that can be converted to support their own bootnodes as well.

1 Like

I think that’s a great idea and somewhat of a natural choice given existing devops expertise.

Yet another option, which seems to be more common in the MEV-Boost relay space, is independent infrastructure operators not affiliated with a client team (Agnostic and USM being two examples on the MEV-Boost relay side). Experienced folks from the EthStaker community, for example, might be a natural fit for something like this.

I think that’s a good idea, and similar to what I have been thinking since the beginning of this thread. The reason why I haven’t reached out so far is twofold:

  • I first wanted to gather various ideas in this thread and decide on an action plan,
  • and to understand how such an action plan can be efficiently coordinated, since I don’t want to end up with a situation where Geth implements a couple of bootnodes while Erigon & co. don’t and just re-use the Geth bootnodes (as is done currently). Any ideas on how to approach this best?

100%.

Great point - anyone has a direct line to @superphiz?

I think you need to explain what “extreme censoring event” means concretely. My Erigon client has a 35 MB database of previously seen peers. I find it hard to conceive of a situation where it cannot reconnect to the network using at least one peer in that file after a reboot, unless the network is already broken anyway.

For example, a supranationally coordinated censorship attack via ISPs. Let’s say your 35 MB database consists of European and US peers and the attack is launched in Europe and the US; then you have a problem. Or there is a massive DDoS attack that prevents your peers from helping you resync. I think it’s important to emphasise that we need to build censorship-resistant infrastructure that is future-proof, which means we also need to be prepared for the unimaginable. The argument “if we are in such an extreme situation, the world has bigger problems anyway” is not satisfactory. It is not only about the current situation but about all possible future scenarios (even if they are unlikely). I’d rather spend some time thinking about an appropriate solution now than regret in a decade that we did nothing.

I would argue this isn’t really a problem. A large portion of the discovery process relies on EIP-1459, Node Discovery via DNS. A crawler walks through discoverable nodes on the network and, at a regular cadence, updates the domain ethdisco.net. The data can then be used to find a “dynamic” list of peers via DNS records. The raw data dump can be found in the GitHub repo ethereum/discv4-dns-lists, as mentioned in some of your older posts.

So in a scenario in which all nodes listed above on centralized providers are taken out, the network will still be up and functioning via DNS-based discovery. Assuming Ethereum is actively being censored everywhere and DNS discovery isn’t enough, every user is welcome to add publicly shared peers through the --bootnodes (or similar) flag present in every client; these can be distributed to end users via various forums or other channels. Assuming DNS is being censored only for specific domains such as ethdisco.net, users can switch DNS providers or run their own recursive resolvers. More entities are also welcome to set up their DNS records in a similar manner to ethdisco; docs on the topic can be found here: DNS Discovery Setup Guide | go-ethereum
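As a side note on the mechanics: per EIP-1459, the root of such a DNS tree is a single TXT record with a fixed field layout that clients resync against. A minimal sketch of parsing that record (field names per the EIP; the hash and signature values below are made up for illustration):

```python
import re

def parse_enrtree_root(txt: str) -> dict:
    """Split an EIP-1459 enrtree-root TXT record into its fields."""
    m = re.match(r"enrtree-root:v1 e=(\S+) l=(\S+) seq=(\d+) sig=(\S+)$", txt)
    if not m:
        raise ValueError("not a v1 enrtree-root record")
    return {
        "enr_root": m.group(1),   # root hash of the subtree containing ENRs
        "link_root": m.group(2),  # root hash of the subtree linking other trees
        "seq": int(m.group(3)),   # update sequence number (bumped by the crawler)
        "sig": m.group(4),        # signature over the record content
    }

# In practice the record would be fetched with something like:
#   dig +short TXT all.mainnet.ethdisco.net
record = ("enrtree-root:v1 e=QFT4PBCRX4XQCV3VUYJ6BTCEPU "
          "l=JGUFMSAGI7KZYB3P7IZW4S5Y3A seq=42 sig=bWFkZS11cC1zaWduYXR1cmU")
root = parse_enrtree_root(record)
```

A client that cached an older `seq` can compare it against the freshly fetched record to decide whether to re-walk the tree.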

Additionally, any active attack on the discovery/bootnode layer will not break the network immediately. It’ll only affect new nodes wanting to join the network or restarted nodes; the network will continue to function as expected for already-peered nodes. This also implies we would have some time to react in such a scenario.

1 Like

So in a scenario in which all nodes listed above on centralized providers are taken out, the network will still be up and functioning via DNS-based discovery.

I think our threat models are different here. Any attacker capable of compromising all of the bootnodes would, I feel, be capable of compromising a single DNS address (likely as many as they want). I suppose this becomes less true as we reduce down to a single service provider (e.g., AWS), but even compromising both Hetzner and AWS seems harder than compromising DNS?

3 Likes

EthStaker would be happy to become a bootnode.

We are already running a public checkpoint sync endpoint and we could easily add the configuration to use that node as another bootnode.

I can be a contact point for this. Simply contact me on Discord (Remy Roy#1837) or Twitter (remy_roy) for private DMs.

5 Likes

Very happy to hear this! May I ask exactly how this node is currently hosted (i.e. bare metal, location, etc.)?

I think we should have a page similar to Ethereum Beacon Chain checkpoint sync endpoints for the bootnodes used by the clients, and it’s possible to add further community bootnodes (in a separate section) that folks can use via the --bootnodes or similar flag. This would require some people to maintain the GitHub repo for this page and approve/dismiss PRs; I would certainly volunteer for that, but at least 2-3 approvals would be needed for each PR that adds an additional community-based bootnode link.

I was wondering whether someone here could help me add at least someone from the Geth, Nethermind, Erigon, Besu, Lighthouse, Lodestar, Nimbus, Prysm and Teku teams to this thread, so that hopefully each of the EL/CL client teams could serve at least one bootnode itself. Maybe @Souptacular?

1 Like

It is a bare-metal dedicated machine running in northeast NA. It runs on an unmetered network, hosted by a reliable business. It has more than enough resources to run a full Ethereum node. I’m happy to share more details in private.

2 Likes

The consensus client teams are all running consensus bootnodes for discv5, you can find the list of ENRs here: lighthouse/boot_enr.yaml at a53830fd60a119bf3f659b253360af8027128e83 · sigp/lighthouse · GitHub

You can extract the IPs using enr-cli read <enr-string>.
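For anyone without enr-cli at hand, the IP can also be pulled out with a few lines of Python. This is a minimal sketch that decodes the ENR’s RLP-encoded key/value pairs per EIP-778; note that it deliberately skips signature verification, so use it for inspection only:

```python
import base64

def rlp_decode(data: bytes):
    """Decode one RLP item; returns (item, remaining bytes)."""
    prefix = data[0]
    if prefix < 0x80:                        # single-byte literal
        return data[:1], data[1:]
    if prefix < 0xb8:                        # short string
        n = prefix - 0x80
        return data[1:1 + n], data[1 + n:]
    if prefix < 0xc0:                        # long string
        l = prefix - 0xb7
        n = int.from_bytes(data[1:1 + l], "big")
        return data[1 + l:1 + l + n], data[1 + l + n:]
    if prefix < 0xf8:                        # short list
        n, off = prefix - 0xc0, 1
    else:                                    # long list
        l = prefix - 0xf7
        n, off = int.from_bytes(data[1:1 + l], "big"), 1 + l
    payload, items = data[off:off + n], []
    while payload:
        item, payload = rlp_decode(payload)
        items.append(item)
    return items, data[off + n:]

def enr_ip(enr: str):
    """Extract the IPv4 address from an 'enr:...' string (no signature check!)."""
    body = enr[len("enr:"):]
    # ENR textual form is URL-safe base64 without padding; re-add the padding.
    raw = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
    items, _ = rlp_decode(raw)               # [signature, seq, k1, v1, k2, v2, ...]
    pairs = dict(zip(items[2::2], items[3::2]))
    ip = pairs.get(b"ip")                    # the "ip" key holds 4 raw bytes
    return ".".join(str(b) for b in ip) if ip else None
```

Feeding it the ENR strings from boot_enr.yaml should yield the bootnode IPs directly.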

The Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia, so we’re contributing a little bit of hosting + geographic diversity. I imagine hosting on bare metal servers operated entirely by our team would be infeasible, but we could think about it. Running an execution bootnode may also be an option for us, if that would be deemed useful.

5 Likes

@michaelsproul thanks a lot for the insights. Would it be possible to add the specified location in the boot_enr.yaml file to each consensus bootnode? That would already increase transparency a lot.

At first glance, Linode Australia seems like a good solution, but after a little research I found that Linode was acquired by Akamai Technologies Inc., which operates under US law. So the advantage of geographic diversity is somewhat gone.

I think this makes perfect sense. Tbh, I would prefer the following solution: each EL/CL client team runs at least 2 EL and 2 CL bootnodes. One of each pair can (but does not have to) be cloud-based, while the other must run on bare metal (preferably outside the US; e.g. Switzerland or Sweden might be good countries). So we have the following distribution:

  • 4 EL client teams run 2 EL and 2 CL bootnodes each, 50% of which should run on bare metal outside the US. That is a total of 4 × 2 × 2 = 16 bootnodes at least, of which 8 are on bare metal.
  • The same logic applies to the 5 CL client teams: 5 × 2 × 2 = 20 bootnodes at least, of which 10 are on bare metal.
  • All additional bootnodes (whether EL or CL) help, of course.
  • In addition, there will be community-based bootnodes (EL & CL), e.g. from EthStaker or Lido, that will be carefully vetted and will enrich this list. If anyone has a contact at Lido, I would appreciate it if this thread could be forwarded.
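As a quick sanity check of the arithmetic above (team counts as assumed in the proposal, not a fixed set):

```python
EL_TEAMS = 4       # Geth, Nethermind, Erigon, Besu
CL_TEAMS = 5       # Lighthouse, Lodestar, Nimbus, Prysm, Teku
PER_TEAM = 2 + 2   # 2 EL + 2 CL bootnodes per team

el_total = EL_TEAMS * PER_TEAM              # bootnodes run by EL teams
cl_total = CL_TEAMS * PER_TEAM              # bootnodes run by CL teams
bare_metal = (el_total + cl_total) // 2     # half of each pair must be bare metal
```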

Happy to hear any feedback or better ideas.

@holiman what are the thoughts from the Geth team on the above suggestion?

I think I already voiced my opinion (not speaking on behalf of the geth-team, just myself):

In general though: if EF-controlled bootnodes are seen as ‘critical infrastructure’, then we should remove them, because the network needs to get by without central points of failure.

1 Like

UPDATE (21 April 2023)

Since I cannot edit my older posts, I will add a new comment here that contains the full list of execution and consensus bootnodes, including links.

Overview Execution Clients

Go-Ethereum

Nethermind

  • Mainnet bootnodes: nethermind/foundation.json.
  • 34 bootnodes running. 4 of the 32 bootnodes are the Geth bootnodes running on AWS (2 out of 4) and Hetzner (2 out of 4).
  • For the remaining 28 bootnodes, I still couldn’t find the hosting locations. However, they use the same bootnodes as in the original Parity client: trinity/constants.py. Nonetheless, all without information on where they are hosted.
  • In the commit Remove deprecated EF bootnodes (#5408), the 4 Azure bootnodes were removed.

Erigon

Besu

  • Mainnet bootnodes: besu/mainnet.json.
  • 14 bootnodes running. 4 of the 10 bootnodes are the Geth bootnodes running on AWS (2 out of 4) and Hetzner (2 out of 4). Additionally, 5 legacy Geth and 1 C++ bootnode are listed. Nonetheless, all without information on where they are hosted.
  • In the commit Remove deprecated EF bootnodes (#5194), the 4 Azure bootnodes were removed.

Overview Consensus Clients

Lighthouse

  • Mainnet bootnodes: lighthouse/boot_enr.yaml.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed. Nonetheless, all without information on where they are hosted.

Lodestar

  • Mainnet bootnodes: lodestar/mainnet.ts.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed. Nonetheless, all without information on where they are hosted.

Nimbus

  • Mainnet bootnodes (pulled via submodule): eth2-networks/bootstrap_nodes.txt.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed. Nonetheless, all without information on where they are hosted.

Prysm

  • Mainnet bootnodes: prysm/mainnet_config.go.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed. Nonetheless, all without information on where they are hosted.

Teku

  • Mainnet bootnodes: teku/Eth2NetworkConfiguration.java.
  • 13 bootnodes running. The 2 Lighthouse (Sigma Prime) bootnodes are currently hosted on Linode in Australia (information via this comment). Additionally, 4 EF, 2 Teku, 3 Prysm and 2 Nimbus bootnodes are listed. Nonetheless, all without information on where they are hosted.

1 Like

For the sake of transparency, I made a post in the Lido forum here in order to get them involved in this discussion as well. I would like to thank @remyroy for making the connection to Lido.

2 Likes

We probably don’t need anyone to volunteer this information. The cloud provider for an IP can fairly easily be determined by a reverse DNS lookup, e.g. dig -x A.B.C.D. The IPs can be fetched from the ENRs using enr-cli read, as I previously mentioned. Maybe you could whip up a script to do it, @pcaversaccio?
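A minimal sketch of such a script (the Python counterpart of dig -x; PTR hostnames like ec2-*.compute.amazonaws.com typically give the provider away):

```python
import socket

def ptr(ip):
    """Reverse-DNS (PTR) lookup for an IP; returns None if no record exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:         # no PTR record, invalid input, or lookup failure
        return None

# Feed it the IPs extracted from the bootnode ENRs, e.g.:
for ip in ["127.0.0.1"]:    # replace with the real bootnode IPs
    print(ip, "->", ptr(ip))
```

Mapping the resulting hostnames to providers (amazonaws.com, your-server.de, etc.) would give the hosting overview directly, without anyone having to volunteer it.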

1 Like