Thank you so much for this write-up, and for giving this a good name. I will try to describe my (currently FLAWED) idea of how the state can be enumerated and split up into parts to satisfy the three requirements you listed above. I will repeat them for easier reading, and will refine the second (“zero coordination”) and third requirements:
- Minimize duplication: During a single revolution of the MGR, there should be very little or near-zero duplication of data.
- Zero coordination: There should be no coordination required for clients to know what part of the state is being distributed at any given time. Refinement: only seeders (nodes that already have the entire state) are subject to the zero-coordination requirement. Their knowledge of the state is used as a substitute for the ability to coordinate.
- Flexible with respect to state size: For a chain with a very small state, each revolution of the MGR should be very quick. For a chain with a very large state, each revolution of the MGR would take longer. Refinement: I have not figured out how to make this happen yet, but it should be possible to specify this as well.
Each seeder will maintain (in their database) two instances of a data structure, which I will call “tree enumeration”. The specific items stored in this structure are implementation-dependent. They have the form “key prefix => number of leaves with such a prefix”. Implementations do not need to have an item for every possible prefix; they could, for example, only consider prefixes with an even number of nibbles, or use similar approaches (this needs to be tested). Such a data structure should be easy to maintain as the state data gets modified: each insertion or deletion of a state key-value pair results in updates to a logarithmic number of prefixes (most of which would be cached in memory).
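To make this concrete, here is a minimal Go sketch of such a tree enumeration. All names here (`TreeEnumeration`, `touch`, etc.) are made up for illustration; it assumes hex-nibble keys and tracks every even-length prefix in an in-memory map, whereas a real implementation would persist the counts in the database:

```go
package main

import "fmt"

// TreeEnumeration maps a key prefix (in hex nibbles) to the number of
// state leaves whose keys start with that prefix. This sketch tracks
// every even-length prefix in an in-memory map; a real implementation
// would persist the counts and only cache the hot (short) prefixes.
type TreeEnumeration struct {
	counts map[string]int
}

func NewTreeEnumeration() *TreeEnumeration {
	return &TreeEnumeration{counts: make(map[string]int)}
}

// touch adjusts the count of every even-length prefix of the key,
// including the empty prefix (which therefore holds the total leaf
// count). A key of n nibbles touches n/2+1 prefixes.
func (t *TreeEnumeration) touch(keyNibbles string, delta int) {
	for i := 0; i <= len(keyNibbles); i += 2 {
		prefix := keyNibbles[:i]
		t.counts[prefix] += delta
		if t.counts[prefix] == 0 {
			delete(t.counts, prefix)
		}
	}
}

func (t *TreeEnumeration) Insert(keyNibbles string) { t.touch(keyNibbles, +1) }
func (t *TreeEnumeration) Delete(keyNibbles string) { t.touch(keyNibbles, -1) }

func main() {
	e := NewTreeEnumeration()
	e.Insert("0a5f")
	e.Insert("0a73")
	e.Insert("ff00")
	fmt.Println(e.counts["0a"]) // 2 leaves under prefix "0a"
	fmt.Println(e.counts[""])   // 3 leaves in total
}
```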
One of the instances of this tree enumeration structure always tracks the current state as the chain progresses. The other instance is “frozen” at the beginning of each Merry-Go-Round cycle. For example, if the cycle length is chosen to be 4000 blocks, then the second instance of the tree enumeration will be frozen as of every block number that is a multiple of 4000.
The “frozen” tree enumeration is used by seeders to calculate which part of the state needs to be synced during which block interval. Since it is “frozen” for the entire duration of the cycle, the seeders can calculate (and arrive at the same result without extra coordination) how many state leaves need to be synced in each block interval (see the sketch below). Once the cycle is over, the first, “current” enumeration is copied (THIS IS THE FLAW - I do not know how to copy the enumeration efficiently) into the second, “frozen” enumeration, and the cycle starts over.
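Continuing the `TreeEnumeration` sketch above (same package; `rankBoundary` and `intervalRange` are again hypothetical names), this is roughly how every seeder could derive identical key-range boundaries from the frozen enumeration alone. Because the frozen counts are the same on every seeder, the boundaries come out the same with no communication:

```go
// rankBoundary descends the enumeration to find an even-length key
// prefix that splits off approximately `rank` leaves on the left.
// An empty result for rank >= total leaves means "end of key space".
func (t *TreeEnumeration) rankBoundary(rank, maxDepth int) string {
	prefix := ""
	for depth := 0; depth < maxDepth; depth++ {
		descended := false
		for b := 0; b < 256; b++ {
			child := prefix + fmt.Sprintf("%02x", b)
			if c := t.counts[child]; rank < c {
				prefix = child // the rank falls inside this child
				descended = true
				break
			} else {
				rank -= c // skip all leaves under this child
			}
		}
		if !descended {
			break
		}
	}
	return prefix
}

// intervalRange returns the key-prefix range [from, to) that a seeder
// should distribute during block interval i of a cycle split into k
// intervals with (approximately) equal leaf counts.
func (t *TreeEnumeration) intervalRange(i, k int) (from, to string) {
	total := t.counts[""] // the empty prefix holds the total leaf count
	from = t.rankBoundary(i*total/k, 32)   // 32 two-nibble steps cover
	to = t.rankBoundary((i+1)*total/k, 32) // 64-nibble (hashed) keys
	return from, to
}
```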
In order to prevent the state data from getting stale during the Merry-Go-Round cycle, we always combine the state piece for the current block interval with the block witness for the corresponding block. That way, all the state data gets automatically “patched” as the leechers persist the packets they receive.
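As a rough illustration of that patching step (modelling the witness simply as a key => new-value map, which glosses over the actual witness format, and ignoring keys newly created inside the range):

```go
// Leaf is a state key-value pair; keys are hex nibbles as above.
type Leaf struct {
	Key   string
	Value string
}

// patchPiece overlays the block witness's modifications onto a received
// state piece before it is persisted, so the leecher stores leaves as of
// the witness's block rather than as of the cycle start. The witness is
// modelled here as a plain key => new-value map ("" meaning deleted).
func patchPiece(piece []Leaf, witnessUpdates map[string]string) []Leaf {
	patched := make([]Leaf, 0, len(piece))
	for _, leaf := range piece {
		if v, ok := witnessUpdates[leaf.Key]; ok {
			if v == "" {
				continue // the block deleted this key
			}
			leaf.Value = v // the block modified this key
		}
		patched = append(patched, leaf)
	}
	return patched
}
```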
Some rough thoughts on possible issues:
- What if block intervals are too short for the state piece? The simplest solution is for the seeders to start sending the next piece only once the previous piece has been fully sent.
- How to adjust the algorithm to the increasing state size? Currently I am thinking of making the algorithm’s parameter not the cycle duration (in number of blocks), but the number of state leaves that are supposed to be synced during one block interval, and calculating the duration from it. This, however, has an “arithmetic” problem of overlapping sync cycles. For example, when we transition from cycle duration 4000 to 4001, a block number which is 0 (mod 4000) and the next one which is 0 (mod 4001) could be quite far apart (see the sketch after this list). Do we need to make the state un-syncable until we get to a block which is 0 (mod 4001)?
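To make the alignment problem concrete, here is a tiny sketch (the numbers are purely illustrative): if cycle starts must fall on multiples of the duration, the gap between the end of a 4000-block cycle and the next multiple of 4001 grows with the block number:

```go
package main

import "fmt"

// nextCycleStart returns the first block after afterBlock whose number
// is 0 (mod duration), i.e. where a new cycle could begin if cycle
// starts must be aligned to multiples of the cycle duration.
func nextCycleStart(afterBlock, duration uint64) uint64 {
	return (afterBlock/duration + 1) * duration
}

func main() {
	// Suppose a 4000-block cycle ends at block 8,000,000 and the state
	// has grown, so the next cycle should last 4001 blocks. The next
	// multiple of 4001 is 8,002,000, leaving a 2000-block stretch in
	// which no cycle is active.
	cycleEnd := uint64(8_000_000)
	gap := nextCycleStart(cycleEnd, 4001) - cycleEnd
	fmt.Println(gap) // prints 2000
}
```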