ReGenesis - resetting Ethereum to reduce the burden of a large blockchain and state

Lessons from Cosmos Hub

If you observed how Cosmos Hub performed its upgrade from version 1 to version 2, and then from version 2 to version 3, you would know that it was essentially done by restarting the blockchain from a fresh genesis. Upon the upgrade, node operators had to shut down their nodes, generate a snapshot of the state of Cosmos Hub, and then effectively use that snapshot as the genesis for launching a new blockchain, from block 1.
Now anyone who wants to join Cosmos Hub needs to acquire the genesis for CosmosHub-3, download all the blocks of CosmosHub-3 (but not CosmosHub-1 or CosmosHub-2), and replay them.

Can we “relaunch” Ethereum 1?

Let’s look at a hypothetical application of such a method in Ethereum, where we have a really large blockchain (150-160 GB) as well as a fairly large state (40-100 GB, depending on how you store it). The obvious gain of such a “relaunch” would be that newly joining nodes would start with the 40 GB genesis state rather than with 150 GB worth of blocks. But downloading a 40 GB genesis is still not a great experience.

State in Ethereum is implicit; only its Merkle root hash is explicit

Let us now assume that these 40 GB are implicitly stored “off-chain” and only the root hash is used as the genesis. And let us also start with an empty state. How do we then let transactions access parts of the implicit state?

Bear in mind that even now the 40 GB are implicit, and the exact way of acquiring them is an implementation detail. You can run through all 10 million blocks to compute the state, or you can download a snapshot of it via fast sync or warp sync, or you can even copy it from someone’s external disk and then re-verify it. Although the state is implicit, we assume that the block composers (usually mining pools) have access to that implicit state and will always be able to process all transactions. What we are removing is the assumption that all other validating nodes have access to that implicit state to check that the transactions in the block are valid and that the state root hash presented in the block header matches the result of executing that block.

Isn’t it Stateless Ethereum?

If you have followed Stateless Ethereum at all, you may recognise that this is exactly what we are trying to do there - keep the assumption that block composers have access to the implicit state, but remove the assumption that all validating nodes have the same access. We propose to do this by obligating the block composers to add extra proofs to the block, and we call these proofs the “block witness”.

Proofs in the blocks vs proofs in the transactions?

When people first learn about this, they assume that these extra proofs are provided by the transaction senders and become part of the transaction payload, and we have to explain to them that no, it is the block composers’ job. But later we discovered that transactions will have to include some extra proof after all. Namely, they will need to prove that the sending address has enough ETH to purchase the gas for this transaction, and also for all other transactions from this account with lower nonces. They might also need to prove the nonce of the sending account, so that the node can figure out whether there are any nonce gaps and therefore a potential DoS attack via a storm of non-viable transactions. More stringent checks can be done, but knowing the ETH balance and the nonce of the sending account is the necessary (though perhaps not sufficient) information for most of the DoS-protection heuristics.
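
As a sketch only (these names and checks are not from the post), here is roughly how a node could apply that heuristic, assuming the attached proof has already been verified against the state root and yields the sender’s balance and nonce:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PendingTx:
    nonce: int
    gas_limit: int
    gas_price: int
    value: int

def admit_to_mempool(tx: PendingTx,
                     proven_balance: int,
                     proven_nonce: int,
                     pending: List[PendingTx]) -> bool:
    """Accept a transaction into the pool only if the sender's proven balance
    and nonce make it viable together with the sender's other pending txs.
    (Verifying the attached proof against the state root is assumed to have
    happened before this point.)"""
    pending = sorted(pending, key=lambda t: t.nonce)

    # Reject nonce gaps: far-future nonces can never be mined in order,
    # so accepting them makes the pool cheap to flood.
    if tx.nonce != proven_nonce + len(pending):
        return False

    # The sender must be able to pay for this transaction *and* for all
    # already-pending transactions from the same account with lower nonces.
    total_cost = tx.gas_limit * tx.gas_price + tx.value
    total_cost += sum(t.gas_limit * t.gas_price + t.value for t in pending)
    return proven_balance >= total_cost
```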

Critique of proofs in the transactions

Let us now imagine that we do, in fact, want transaction senders to include proofs for all the pieces of state that the transaction touches. This would be nice because it would simplify our work on trying to charge extra gas for witnesses. The main critique of this usually has to do with Dynamic State Access (DSA, as opposed to Static State Access - SSA). If a transaction goes to a particularly complex smart contract, specifically one that involves lots of nested calls to other contracts, the items of state that the transaction is going to touch may be hard to pre-calculate. This can even be used to “trap” a user by front-running their transactions and making them fail due to insufficient proofs.

ReGenesis provides mitigation

The concern about DSA cannot be easily solved completely, but it can be mitigated to the point that users will very rarely see the inconvenience, and will never get permanently “trapped” into being unable to achieve their desired state transition. The mitigation relies on the extra rule that any proof provided with a transaction which checks out against the state root (but is not necessarily sufficient for the transaction to be successful) becomes part of the implicit state. Therefore, repeated attempts by the user to execute the transaction will keep growing the implicit state, and eventually the transaction will succeed. Any attacker that tries to “trap” the user will have to come up with ever more sophisticated ways to redirect the transaction’s state access outside of the implicit state, and eventually the attacker will fail.
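
To make the monotonicity argument concrete, here is a toy model of that rule (sets of labelled state items, nothing Ethereum-specific): verified witness data is absorbed even when the transaction fails, so the attacker has fewer and fewer places to hide with every attempt.

```python
# Toy model of the ReGenesis mitigation: proofs that check out against the
# state root are absorbed into the implicit state, whether or not the
# transaction succeeds.

def attempt(implicit_state: set, witness: set, touched: set) -> bool:
    """One execution attempt.  `witness` is the (already verified) state
    attached to the transaction; `touched` is whatever the execution actually
    reads, possibly redirected by a front-running attacker."""
    implicit_state |= witness           # rule: valid proofs are kept
    return touched <= implicit_state    # success only if nothing is missing

implicit = set()
rounds = [({"A"}, {"A", "B"}),        # fails, but "A" becomes implicit
          ({"B"}, {"A", "B", "C"}),   # attacker redirects to "C": fails, "B" absorbed
          ({"C"}, {"A", "B", "C"})]   # nothing left to hide: succeeds
for witness, touched in rounds:
    print(attempt(implicit, witness, touched))   # False, False, True
```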

As the implicit state grows from nothing (just after the “relaunching”) to include more and more of the actively accessed state, the proofs that transactions need to provide will shrink. After a while, most transactions won’t even need to attach any proofs, only the ones that touch some very old and “dusty” parts of the state.

We can keep doing it

I call this “relaunch” ReGenesis, and it can be done regularly to ease the burden on the non-mining nodes. It also represents a less dramatic version of Stateless Ethereum.
Doing ReGenesis repetitively would simplify the architecture of Ethereum client implementations, because it can mostly obviate the need for more advanced snapshot sync algorithms. If we perform ReGenesis every 1M blocks (which is roughly 6 months), then the state snapshots as well as the blockchain files can be made available on BitTorrent, Swarm, or IPFS. We cannot do this now because the state changes every 15 seconds, not every 6 months. If client implementations can cope with replaying 6 months’ worth of blocks, then we do not need very sophisticated snapshot algorithms. Consequently, the complexity of Ethereum implementations will go down.
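
For reference, the arithmetic behind “1M blocks ≈ 6 months” at a ~15-second block time, plus the trivial way a client could locate the most recent ReGenesis point (names here are purely illustrative):

```python
BLOCK_TIME_S = 15          # approximate Ethereum 1.x block time
EPOCH_BLOCKS = 1_000_000   # proposed ReGenesis interval

print(EPOCH_BLOCKS * BLOCK_TIME_S / 86_400)   # ~173.6 days, roughly 6 months

def regenesis_block(head: int) -> int:
    """Block number of the most recent ReGenesis for a given chain head."""
    return (head // EPOCH_BLOCKS) * EPOCH_BLOCKS
```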

Downsides

There are probably many more downsides that I have not yet explored, but here are three that I already see:

  1. Users may need to have access to the full implicit state to create transactions. I actually see this as a fair compromise
  2. Users may need to repeat transactions (due to Dynamic State Access) until they eventually achieve the desired state transition
  3. Some roll-up techniques (that utilise the blockchain data for data availability) may break if the network effectively “archives” all the blocks prior to the ReGenesis

Please consider the discussion open!

14 Likes

Some historical context. In March-April 2019, Alexey first solved the block witness size problem by proposing that block witnesses be deduplicated against a cache of the n most recent blocks, where n is a fixed positive integer. Now Alexey is proposing to relax the requirement that n is fixed, instead basing the cache on a regenesis block number and updating this regenesis block number every so often. Both of these proposed caches are variations of a least-recently-used cache and are effective at removing dusty cobweb state from hard drives.

3 Likes

Check out Tendermint State Sync:

1 Like

It’s a pragmatic solution.

Yes, rollups will need to be looked at, but I don’t think it’s a problem without a solution: they can have snapshots as well (and if you really do 1K TPS you don’t want to replay 2 years of transactions to sync). So if we know that at time T+100 there will be a regenesis with the state at time T, then rollup-like applications need to take a snapshot between T and T+100 and they are fine.

Something to think about is logs/events. IIUC they are not pruned today.

2 Likes

@mandrigin has done an explainer for this post here: https://medium.com/@mandrigin/regenesis-explained-97540f457807

1 Like

Am I correct in thinking that the networking requirements will go up and down in a sawtooth fashion? That is, at the start of a new million blocks, nodes start with zero state. Transaction senders will need to send witnesses for all state. The network bandwidth to send blocks will be high. Over time, as more state becomes implicit, less and less witness data is needed. The bandwidth to send blocks will decrease. Then at the end of a million blocks, the nodes discard all of their state and start again.

1 Like

That seems correct indeed. It will be interesting to see if the uncle rate is affected by this, though it probably won’t be, based on some previous experiments with CALLDATA size.

1 Like

The networking requirements would be more even if a “moving window” state caching strategy were used. That is, nodes are expected to keep state for the past X blocks. Any state that hasn’t been touched in the past X blocks must have a witness provided for it.

1 Like

Yes, it is correct to expect that the traffic usage will go up after the reset and quickly level off after that. The good news is that we can backtest it.
And it is true that there is a similarity between ReGenesis and the “moving window” state caching strategy that I called “Semi-Stateless” in the past. The reason ReGenesis might be preferable is that it is simpler and less error-prone to implement. Moving-window state caching requires tracking, for each state item, when it was last touched. Evicting items very frequently (as frequently as every block) also adds non-negligible overhead. And all of that would be consensus rules - I never had a strong belief that this could be deployed.
With ReGenesis, the caching policy is very simple and blunt, which makes it more attractive at the cost of a slight loss of elegance at the beginning of the reset cycle.

I like it! It has some quirks to work out.

I think this slightly undersells the cost from the transaction sender’s point of view. 🙂 Transactions that get mined as failing due to an insufficient witness would be a major annoyance.

State availability becomes consensus-critical

Miners need to be able to include a transaction that has an insufficient witness as failed, to collect the gas. If miners couldn’t include under-witnessed transactions, it would be easy to DoS nodes by forcing them to verify a flurry of 8 Mgas transactions that are missing one witnessed value, but only after successfully executing 7.9 Mgas.

Getting your transaction included as failed with a missing witness sounds pretty awkward/painful for transaction builders near the epoch boundary. You build the transaction based on the pre-ReGenesis state, then while the transaction is pending, all state disappears at the boundary.

As prevention, transaction senders could pay to include the witness when the current tip is a few blocks behind the epoch boundary, just in case it takes a while to get included. How many blocks behind the epoch should senders consider doing this? It depends on gas pricing, which might fluctuate after you send the transaction. Depending on the transaction, it might add significant cost to witness the transaction “just in case”. (Side note: this seems mostly fixed when transactions get an “expire by block number” field.)

In addition to gas cost, I can imagine this being quite a challenge for in-browser web3 clients to support. If your node doesn’t have the cold state, the web3 client has to notify you and/or help you build the state proof. It would be an absolute requirement for every web3 client to support this, because sending transactions at the epoch boundary would be impossible without it.

Rolling ReGenesis

One mitigation would be a “Rolling ReGenesis,” where we only drop state that’s been cold for one entire epoch. State would be split into an aged database and a current database.

All EVM reads would cascade, so they attempt to read from the current and then aged database. If it is necessary to read from the aged database, then the value gets copied into the current db.

At every block_num % EPOCH_LENGTH == 0:

  1. The node deletes the aged database, which was formed one epoch ago.
  2. The node moves all current state into an aged database, which becomes read-only.
  3. The node creates a new current database.
  4. All transactions must now include a witness for any state that is not in the current or aged database.
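
A minimal sketch of this two-generation scheme, assuming a flat key/value view of state purely for illustration (a real client would operate on trie nodes, and EPOCH_LENGTH is just a placeholder):

```python
EPOCH_LENGTH = 1_000_000   # illustrative; to be chosen from mainnet backtests

class RollingState:
    """Two-generation state cache for Rolling ReGenesis: reads cascade
    current -> aged, and a hit in the aged generation is promoted back
    into the current one so it survives the next boundary."""

    def __init__(self):
        self.current = {}   # state touched in the current epoch
        self.aged = {}      # state last touched in the previous epoch (read-only)

    def read(self, key):
        if key in self.current:
            return self.current[key]
        if key in self.aged:
            # Promotion: touching aged state keeps it warm across the boundary.
            self.current[key] = self.aged[key]
            return self.current[key]
        # Cold state: the transaction must supply a witness, which (once
        # verified against the state root) would be written in via write().
        return None

    def write(self, key, value):
        self.current[key] = value

    def on_block(self, block_num):
        if block_num % EPOCH_LENGTH == 0:
            # 1. drop the aged generation (cold for one entire epoch)
            # 2. the current generation becomes the read-only aged one
            # 3. start a fresh current generation
            self.aged = self.current
            self.current = {}
```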

This should significantly reduce the impact on active contracts and accounts. An account that sends a transaction once every EPOCH_LENGTH blocks would not need to attach a witness for a simple payment transaction to an active account. When you interact with a contract, the relevant storage has often been read or written recently, too.

We can pick EPOCH_LENGTH such that it’s typically unnecessary to provide a witness at any block. I like 1M, but we should let mainnet backtests guide us. A nice target might be something like 99.9% of transactions not needing any witness.

Committing to dropped state

Another question is whether to include some commitment to which parts of the state are available, and how often. At one end, we could include a commitment at every header that describes the ranges of keys that are cold. At the other end, we could have it be entirely implicit.

I think this decision is independent from whether we use the original “clean slate” ReGenesis or the Rolling ReGenesis.

Explicit Approach

This helps quickly identify if nodes form different beliefs about which state is available. This seems like the safest route, but requires more coding and longer ACD discussions.

Some options:

  • Add a new field to the header. For example, add a hash of an SSZ-encoded list of key prefixes.
  • Force a custom storage value into a hard-coded account
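
Purely as an illustration of the first option (the encoding and hash here are placeholders, not a proposal), the header field could carry something like:

```python
import hashlib
from typing import List

def availability_commitment(cold_prefixes: List[bytes]) -> bytes:
    """Hypothetical commitment to which key ranges are cold: hash a canonical
    (sorted, length-prefixed) encoding of the prefixes.  A real proposal
    would pin down the encoding (e.g. SSZ) and the exact header field."""
    blob = b"".join(len(p).to_bytes(2, "big") + p for p in sorted(cold_prefixes))
    return hashlib.sha256(blob).digest()
```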

Implicit Approach

Each node tracks its own state availability from block to block. Consensus errors might not cause a chain split immediately. The chain would only split when transactions touch the part of the state that nodes disagree on.

It weakens the verification that fast sync (or beam sync) can do. It’s not possible for these sync approaches to identify whether state is available by asking peers to prove a commitment. These sync methods would have to assume that witnesses are always sufficient (or insufficient, if the transaction fails due to a bad witness).

Further, ReGenesis makes it awkward for fast-syncing/beam-syncing nodes to know when they have finished backfilling all cold state. In the status quo, if a hash in the state trie is missing, that just means that you don’t have it yet. With ReGenesis, you might not have it yet, or it might be a pre-RG value. That means we probably want a new type of response to GetNodeData where a peer indicates that the requested state is unavailable. There is no way to prove that such a response is true in the implicit approach.

So it seems to be worth paying the technical costs of explicit commitments to state availability.

Ethereum Name Service

Some storage is read often, but rarely read on-chain. ENS comes to mind. So its registry would typically get forced into cold state, despite being actively used.

This would significantly decrease the benefit of using an ENS name. ENS is supposed to be easier and friendlier to folks new to Ethereum. With a ReGenesis-activated chain, an address publisher has to worry about how to help the noob find the cold state to look up an ENS name. That tips the balance in favor of just using the hex address, I think.

Maybe this is just an acceptable cost of ReGenesis, but it seems worth getting input from the ENS org.

1 Like

That is the current idea. Transactions with insufficient/invalid witnesses are included, but fail to make state transitions. However, the correct parts of their witnesses are added to the active state.

1 Like