History, state, and asynchronous accumulators in the stateless model

JustinDrake · February 20, 2018, 10:18pm

The research has progressed somewhat since this post. The best accumulator design for logs we know of is the double-batched Merkle log accumulator. Below are reasons why using this particular log accumulator (for history) is cheaper than using a Patricia trie (for state):

Log witnesses do not have to be updated (only extended, once). Compare this to state witnesses which need to be updated for every insertion and deletion to the trie. A significant simplification is that, unlike state witnesses, there is no need for validators to auto-update log witnesses. Logs are also the natural basis for receipt-based asynchronous cross-shard communication, so having a protocol-level log accumulator helps communicating receipts across shards.
Notice that the size of the log witnesses is log(#objects in a single collation), whereas the size of state trie is log(#total state objects). For concreteness, let’s assume the bottom buffer of the double-batched accumulator has 1024 hashes. In a single collation we can expect on the order of 16,000 logs, so the size of a single log witness will be about (14 + 10) * 32 = 768 bytes. We can expect the state trie to quickly grow to a billion objects, so a single state witness will quickly reach 30 * 32 = 960 bytes.
With log shards and custom execution models (e.g. this one) you don’t have to execute transactions onchain. My guess is that the cost of onchain execution in the EVM is about 10x-100x greater than the cost of a state-minimised log-based equivalent.

Point 1) is an important practical simplification in the context of stateless clients. Point 3) is a significant opportunity for scalable apps (either stateless or stateful) that benefit from onchain data availability without bearing the costs of onchain execution.