Proper disk i/o gas pricing via LRU cache

Good concerns.

Could be made much simpler than LRU for the consensus part, like an append only approach with "cache generations. E.g. You have 3 generations of 100mb each. When new slot is touched, you append to current generation. When generation is full (100mb is reached) you start a new generation and delete the oldest of the 3 generation. Anything present in these 3 arrays is cheaper gas wise. This way you have an append only approach, which is O(1) to update and can easily store using hash chain in state tree and for light clients. Pretty straightforward for clients.

As many mentioned, clients can chose their own caching implementation, we just need to include as little as possible as part of the consensus.

As many mentioned as well, I think we can start with a much smaller “consensus cache”, like 100 mb to 200 mb. Clients already have a caching layer they need to rebuild when node restarts, so I don’t think this is much of a concern.

1 Like

The caching clients currently have is not consensus critical so it can be dropped and lazily repopulated as reads happen without any problems. This changes the caching to a consensus datastructure, which needs to be in sync across all nodes in the network. Shrinking the size may make this more feasible, but still not trivial even at hundreds of MB.

2 Likes

A potentially simpler implementation: Just keep things warm longer

  • Any slot accessed in the past 20 blocks is considered “warm” and costs less gas to read
  • Client implementations would need to track the last-block-access for only thousands of slots (potentially just kilobytes)
  • This would not lock clients into a particular caching architecture
  • Based on the above data I think it would cover the majority of reads, but need to validate this explicitly

Could also apply this logic to keeping accounts warm instead of just keeping slots warm

How much gas is spent on cold reads?

Still very relevant is the question of how much eth gas is actually spent on reads. A rough estimate using the above data:

(1564.6 mean cold read slots per block - 450.2 mean written slots per block) = 1114.4 mean cold reads per block
(1114.4 reads) (2100 gas per cold read) = 2.34M gas per block
2.34M / 15M = 15.6% of ethereum gas is spent on cold reads

(someone please double check this math)

1 Like

I do like the simplicity of time based solutions, but clients would still need to build an in memory list of storage slot every restart. The bigger issue with any time based approach rather than data size approach is that if we need to consider how big can the cache become if an attacker was to fill these N blocks with a lot new storage slots access. The cache size is dynamic effectively, which could cause some problems for clients making assumptions about cache size (e.g. RAM saturation).

20 blocks may be small enough, but how much would it allow us to increase the gas limit without running the risk of a “cache explosion attack”?

1 Like

I’m curious. Doesn’t this create a small number of large, very active contracts that will come to dominate the cache thereby creating two tiers? Smart contracts that are always in the cache (and therefore always cheaper) vs. those that are rarely in the cache and therefore more expensive. Doesn’t this establish winners-take-all in the long run? Won’t it become increasingly more difficult for new entrants to “break in”? That seems like a bad idea to me.

1 Like

My hot take: make it a single block. First read/write of a slot for a block is more expensive than other reads/writes.

Ironically, this kind of setup does the inverse to larger caches: (generally) pushes costs onto MEV bots and away from normal users (assuming tx is some form of sandwiching, JIT, frontrunning, liquidations, etc).

Larger caches not only do have a centralizing force at the app layer as @quickBlocks mentions, it also makes it cheaper for higher frequency MEV bots pushing more blockspace to them.

2 Likes

I share some of the reservations about this idea that others have expressed above (gas estimation complexity, creation of two tiers of contracts) but, withholding that, I have some thoughts on potential specification and implementation details:

Currently, in PBS mode, go-ethereum maintains 128 diff-layers in memory. This is a cache of all the state and storage trie nodes that have been mutated in the last 128 blocks.

Only once dirty nodes have settled to the 129th layer are they written to disk. On node shutdown the 128 in-memory layers are packed into a single database “journal” entry to be easily recovered on restart.

Perhaps this existing caching scheme could be used? Any storage operation performed on the state in these diff-layers could be discounted. A 128 block range covers at least 60-85% of the read and write intervals from the data @storm provided above. While it does not have a fixed byte size, this memory usage is already expected by the majority client and is small enough to be of little concern elsewhere. The amount of additional complexity introduced to geth would be absolutely minimized.

The downsides are that

  1. This cache does not include code, it only includes data embedded directly in trie nodes (balances, nonces, code hashes, storage roots, and storage slot value).
  2. More importantly, this cache only considers state which has been mutated. It does not cache state that was recently accessed but not mutated. So it would be a “Least Recently Mutated” cache if you will.

A concern which I haven’t seen expressed yet is how this would affect block builders in particular (whether or not the building process is separated from the proposer).

Builders execute different sets of transactions to find and build the most economically optimal valid block possible. Without requiring comprehensive transaction access lists they can’t use static analysis to factor in ahead of time the access pattern of the transactions and how that relates to the in memory cache. They only discover the access pattern while speculatively executing transactions. Their access pattern will be all over the place and the eventual access pattern (set of txs) that is proposed does not reflect all the I/O work the builder performed to get there.

Furthermore if the goal is to eventually transition to stateless validators using block proposers that submit block witnesses then, in that context, the only ones ever accessing state on disk will be block producers. Validators will be gossiped block witnesses (all state required to execute the block’s set of transactions) along with the block they need to validate. The state they need- no matter when it was last accessed- will be injected directly into their memory and never need to touch the disk.

1 Like