A local-node-favoring delta to the scaling roadmap

Special thanks to Micah Zoltu, Toni Wahrstätter, Justin Traglia and pcaversaccio for discussion

The most common criticism of increasing the L1 gas limit, beyond concerns about network safety, is that it makes it harder to run a full node.

Especially in the context of a roadmap focused on unbundling the full node, addressing this requires an understanding of what full nodes are for.

Historically, the thinking has been that full nodes are for validating the chain; see here for my own exposition of what could happen if regular users cannot verify. If this is the only issue, then L1 scaling is unlocked by ZK-EVMs: the only limit is keeping the block building and proving costs low enough that both can remain 1-of-n censorship-resistant and a competitive market.

However, in reality this is not the sole concern. The other major concern is that it is valuable to have a full node so that you can run a local RPC server and use it to read the chain in a trustless, censorship-resistant and privacy-friendly way. This document will discuss adjustments to the current L1 scaling roadmap that make this happen.

Why not stop with trustlessness and privacy via ZK-EVM + PIR?

The privacy roadmap I published last month focuses on TEEs + ORAM as a short-term patch plus PIR as a long-term solution. This, together with Helios and ZK-EVM verification, would allow any user to connect to external RPCs and be fully confident that (i) the chain they are getting is correct, and (ii) their data privacy is protected. So it is worth asking the question: why not stop here? Don’t these kinds of advanced cryptographic solutions make self-hosted nodes an outdated relic?

Here I can give a few replies:

  • Fully trustless cryptographic solutions (i.e. 1-server PIR) will be expensive. Currently the overhead is impractically high, and even after many efficiency improvements it is likely to stay expensive.
  • Metadata privacy. The data of which IP address makes requests at what times, and the pattern of requests, is itself enough to reveal a lot of information about users.
  • Censorship vulnerability: a market structure dominated by a few RPC providers is one that will face strong pressure to deplatform or censor users. Many RPC providers already exclude entire countries.

For these reasons, there is value in continuing to ensure greater ease of running a personal node.

Short-term priorities

  • Up-prioritize a full rollout of EIP-4444, all the way up to the final end state where each node stores data for only ~36 days. This greatly reduces disk space requirements, which are the primary issue preventing more people from running nodes. After this, the disk space requirements for a node will be (i) state size, (ii) state Merkle branches, (iii) 36 days of history.
  • Build a distributed history storage solution, by which each node can store a small percentage of historical data older than the cutoff. Use erasure coding to maximize robustness. This ensures the property that “a blockchain is forever” without depending on centralized providers or putting heavy burdens on node operators.
  • Adjust gas pricing to make storage more expensive and execution less expensive. A particularly high priority is increasing the gas cost of creating new state: (i) SSTORE for new storage slots, (ii) contract code creation, (iii) sending ETH to accounts that do not yet have a balance or nonce.
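
As a rough illustration of the distributed history storage idea above, here is a minimal sketch of how a node might decide which slices of old history to keep. The hashing rule, the function names and the notion of a fixed-size "chunk" are assumptions made for this example, not part of EIP-4444 or any client, and erasure coding is not shown.

```python
import hashlib


def stores_chunk(node_id: bytes, chunk_index: int, fraction: float) -> bool:
    """Return True if this node should keep the given history chunk.

    Hypothetical rule: hash (node_id, chunk_index) to a 64-bit integer and
    keep the chunk when that integer falls below fraction * 2**64.
    """
    digest = hashlib.sha256(node_id + chunk_index.to_bytes(8, "big")).digest()
    position = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids float rounding at the top of the range.
    return position < int(fraction * 2**64)


def chunks_to_store(node_id: bytes, total_chunks: int, fraction: float) -> list[int]:
    """List the chunk indices a node would keep out of `total_chunks`."""
    return [i for i in range(total_chunks) if stores_chunk(node_id, i, fraction)]
```

Because the choice depends only on the node's identifier and the chunk index, any peer can recompute which nodes are expected to hold a given chunk; with a fraction of 0.05 each node keeps roughly 5% of the old history.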

Medium-term priority: stateless verification

Once we enable stateless verification, it becomes possible to run an RPC-capable node (i.e. one that stores the state) without storing state Merkle branches. This further decreases storage requirements by ~2x.

A new type of node: partially stateless nodes

This is the new idea, and will be key for allowing personal node operation even in a context where the L1 gas limit grows by 10-100x.

We add a node type which verifies the whole chain without holding the full state (either through stateless validation or ZK-EVM) and keeps a portion of the state up to date. The node is capable of responding to RPC requests as long as the required data is within that subset of the state; other requests will fail (or have to fall back to an externally-hosted cryptographic solution; whether or not to do this should be the user’s choice).
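
To make the request handling concrete, here is a minimal sketch of the read path of such a node. The class name, the method name and the address-level granularity are all assumptions for illustration; a real client would expose this through its existing JSON-RPC handlers, and block verification is not shown.

```python
from typing import Callable, Dict, Optional, Set, Tuple

StateKey = Tuple[str, int]  # (contract address, storage slot)


class StateNotHeld(Exception):
    """Raised when a request touches state outside the locally held subset."""


class PartialStateNode:
    def __init__(self, held_addresses: Set[str],
                 fallback: Optional[Callable[[str, int], int]] = None):
        # Addresses whose storage this node keeps up to date (hypothetical).
        self.held_addresses = {a.lower() for a in held_addresses}
        # Raw values only; no Merkle branches are stored.
        self.local_state: Dict[StateKey, int] = {}
        # Optional external source, used only if the user opted in.
        self.fallback = fallback

    def get_storage_at(self, address: str, slot: int) -> int:
        address = address.lower()
        if address in self.held_addresses:
            # Slots that were never written read as zero.
            return self.local_state.get((address, slot), 0)
        if self.fallback is not None:
            return self.fallback(address, slot)
        raise StateNotHeld(f"{address} is outside the held subset")
```

Leaving `fallback` unset models the strict mode in which requests outside the subset simply fail; passing a callable models the user opting in to an external source.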

The exact portion of the state to be held will depend on a config chosen by the user. Some examples might be:

  • All state except for contracts that are known to be spam
  • State associated with all EOAs and SCWs and all commonly used ERC20 and ERC721 tokens and applications
  • State associated with all EOAs and SCWs that have been accessed in the last two years, some commonly used ERC20 tokens, plus a limited curated set of swap, defi and privacy applications

The config could be managed by an onchain contract: a user would run their node with --save_state_by_config 0x12345...67890, and the address would specify in some language a list of addresses, storage slots or otherwise filtered regions of the state that the node would save and keep up to date. Note that there is no need for the user to save Merkle branches; they only need to save the raw values.
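
A sketch of what applying such a config might look like, assuming the config has already been read from the contract and decoded into a list of whole addresses plus individual storage slots. The `StateConfig` structure and the shape of the state diff are hypothetical; the post leaves the config language unspecified.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

StateKey = Tuple[str, int]  # (address, storage slot)


@dataclass
class StateConfig:
    # Addresses whose entire storage is kept (hypothetical field).
    full_addresses: Set[str] = field(default_factory=set)
    # Individual slots kept for otherwise untracked addresses (hypothetical).
    slots: Dict[str, Set[int]] = field(default_factory=dict)

    def tracks(self, address: str, slot: int) -> bool:
        return address in self.full_addresses or slot in self.slots.get(address, set())


def apply_state_diff(store: Dict[StateKey, int],
                     diff: Dict[StateKey, int],
                     config: StateConfig) -> None:
    """Copy the tracked entries of a verified block's state diff into `store`.

    Only raw values are written; no Merkle branches are kept.
    """
    for (address, slot), value in diff.items():
        if config.tracks(address, slot):
            store[(address, slot)] = value
```

The diff is assumed to come from a block the node has already verified, so the filter only decides what to persist, not what to trust.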

This type of node would give the benefits of direct local access to the state that a user needs to care about, as well as maximal privacy of access to that state.


Interesting!

Excuse my silliness, but why do you offer to save the config onchain? Why not have a free and private config in a file?


Makes sense.
The main goal should be to make L1 fast enough to enable on-chain price discovery for assets. L1 should serve as the global settlement layer, even for all L2 prices. Assets seeking hard settlement guarantees should originate on L1.

I believe this approach can achieve that state, but we need to recognize that many people, including myself, prefer to hold major assets on L1 rather than L2s.


It’s a convenience feature to allow a third-party actor (could be a DAO) to keep the config up to date.

Pretty interesting read — thanks for sharing!

I’m curious about the longer-term history though. What if someone wants to verify transactions from, say, five years ago on Etherscan?

Would that still be possible after implementing a mechanism that retains history for only 36 months?

Sorry, but a 4 TB NVMe drive costs 300-500 USD.
The main blocker to running a node is the size of the 32 ETH deposit.
Running a node with a deposit at least 10x smaller would onboard a whole new wave of home node operators.

It is 36 days.

The goal here is to make running a local node easier. I think access to the global data would still depend on RPCs.


Cool ideas!

Rather than having a DAO or on-chain config manage which state the node keeps, I would give this power to the node operators. We could have a UI that lets the user select and add whatever addresses to maintain state for. You could have a DAO managing address metadata, e.g. uniswap pools = [0x23, 0x12], tornadoCash = [0x23…]. The user could then tick the checkboxes for the state they want to manage.

A good example of this is IPFS desktop, which lets users select which data to keep. The user should be able to select and deselect these addresses at any time, and the node would start to adapt to the user's preferences (slowly, but eventually, as retrieving new data from a decentralized network takes time). IPFS has a wanted_list system that manages this. You could also have some private RPC methods that allow you to manage which data to keep and which not to keep.


You need 0 ETH to run a node; there is no 32 ETH deposit requirement. The 32 ETH is required to run a validator. This thread is about running nodes, not validators.


If we’re on the topic of more at-home setups, it should be noted that we probably need to draw down ingress/egress. Depending on where you are, the concept of unlimited bandwidth might not exist.

I’m aware there are efforts to potentially erasure code and do more structured gossiping at the p2p level, but I would probably bake this in as a concern for this subgroup of node operators.


You are talking about stakers. This post is talking about end-users of Ethereum who need the ability to execute against state in a censorship resistant and privacy preserving way. These people do not have 32 ETH, and most of them may not have $500 to spend.

This relates to some recent thoughts we had around Validity-Only Partial Statelessness (VOPS): nodes keeping just enough data to ensure they can maintain a healthy mempool to ensure we can scale while preserving CR. This would result in just storing the account data, and gives us 25x storage reduction.

I really like the general approach of providing more flexibility in what partial-state nodes choose or need to store, depending on the use case:

  • Nodes actively participating in protocol duties (e.g., attesters, FOCIL includers) could be required to store at least the state necessary to maintain the mempool (VOPS).
  • However, there shouldn’t be strict requirements for nodes that simply want to read the chain in a trustless, CR-preserving, and privacy-friendly manner. They should just be able to keep as much state as they want depending on how often they would have to query state from elsewhere. But then the “elsewhere” part is still very important, and we’re trying to see if Portal could provide a good enough solution for random access lookups there.

This could be a good use for something like Austin-Williams/cid-accumulator-monorepo on GitHub (a trustless, decentralized CID accumulator for smart contracts that lets you append, verify, and retrieve all your on-chain data via IPFS), so your on-chain updates would be on the order of 10,000s of gas with ~0 state growth, but it would perpetually link to a file that was maintained on the IPFS network by anonymous 3rd parties.

It is limited to append only, but you could split the files into 12 month epochs, and each file would contain a journal of additions and removals from which you can rebuild the final set for each epoch. One could imagine some way to materialize the epochs into a different format via social layer so you have a base list + accumulator over the course of a year.
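
A minimal sketch of the journal replay described above, assuming each epoch file is a sequence of ("add" | "remove", address) entries. The entry format and function name are assumptions for illustration and are not taken from the cid-accumulator project.

```python
from typing import Iterable, Optional, Set, Tuple


def rebuild_set(journal: Iterable[Tuple[str, str]],
                base: Optional[Set[str]] = None) -> Set[str]:
    """Replay an append-only journal on top of an optional base list."""
    current = set(base) if base is not None else set()
    for op, address in journal:
        if op == "add":
            current.add(address)
        elif op == "remove":
            # Removing an address that is not present is treated as a no-op.
            current.discard(address)
        else:
            raise ValueError(f"unknown journal operation: {op!r}")
    return current
```

Passing the materialized set of one epoch as `base` for the next gives the "base list + accumulator" arrangement mentioned above.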

This, I assume, opens the door for wallets or dapps to follow some sort of “Dapp storage support” where they store their part of the state and serve it to their users while, at the same time, keeping nodes that can assert the correct root.

Then, they can serve RPC requests to their users (who can follow the chain with something like VOPS).

So, in short, are wallets and dapps the parties to whom we intend to hand over the state storing and the RPC serving for their protocol-related data?
If so (and I think this makes sense), within stateless-consensus we’re carrying out a survey with wallets and dapps on stateless-related questions.

Would you be up for giving some feedback and proposing questions we should ask them, or things/opinions/takes to gather from their side?
One of our fears has always been that dapps (especially good and useful ones) are a scarce resource, and putting more burden on them to build these partial RPC servers might be complex.

What’s your take?
