- Existent imbalanced data/gas spent on shards
- Migration of data from one shard to another for other technical/business reasons
- New data from off-chain sources / Eth1
For the purpose of solving this problem we will use dType (dType (Decentralized Type System) on Ethereum 2.0) and the fact that it now has a
count function for stored items. The
count function is not ideal, because it does not measure the actual storage cost, but it gives enough approximation for the current purpose.
Given a number of shards
shard_count, each loaded with a certain
shard_load and a number of dType storage contracts
dtype_count, each with a certain
dtype_load, we need to find a way to balance the loads across shards.
The result is a list of shards, each with a list of dtype IDs that should be added to that shard and the final data load of that shard.
The Python code for this solution is: (it can also be read at https://github.com/pipeos-one/dType/blob/f28bc63377f0565b1809c4dd4842242ce25dbd73/docs/research/Data_Load_Balancing_of_Shards.ipynb)
import random shard_count = 20 dtype_count = 50 # # Increase average_coef if there are not enough shards for all dtypes average_coef = 1.3 max_shard_load = 400 max_dtype_load = 2000 # Initialize random load values for shards and dtypes shard_loads_initial = list(enumerate([random.randrange(i, max_shard_load) for i in range(shard_count)])) dtype_loads_initial = list(enumerate([random.randrange(i, max_dtype_load) for i in range(dtype_count)])) shards = [ for i in range(shard_count)] next_index_s = 0 next_index_dt = 0 last_index_dt = len(dtype_loads_initial) - 1 last_index_s = len(shard_loads_initial) - 1 # Sort loads: ascending for shards, descending for dtypes shard_loads = sorted(shard_loads_initial, key=lambda tup: tup) dtype_loads = sorted(dtype_loads_initial, key=lambda tup: tup, reverse=True) # Calculate average count per shard average_load_shard = (sum(i for i in dtype_loads) + sum(i for i in shard_loads)) / shard_count average_load_shard *= average_coef print('average_load_shard', average_load_shard) # Move heavier than average dtypes on the least heaviest shards for i, dload in dtype_loads: if dload >= average_load_shard: shards[next_index_s].append(i) next_index_s += 1 next_index_dt += 1 # Pair heaviest dtypes with lightest shards # and add as many light dtypes on top, as possible for i, dload in dtype_loads[next_index_dt:]: if last_index_s < next_index_s: print('Needs more shards. Increase average_coef'); break # Add the next heaviest dtype to the next lightest shard shards[next_index_s].append(i) # Add as many light dtypes as the average_load_shard permits load = shard_loads[next_index_s] + dload + dtype_loads[last_index_dt] while last_index_dt > next_index_dt and load <= average_load_shard: shards[next_index_s].append(dtype_loads[last_index_dt]) last_index_dt -= 1 load += dtype_loads[last_index_dt] next_index_s += 1 next_index_dt += 1 if next_index_dt > last_index_dt: break print('(shard_index, shard_load, dtype_indexes)') final_shards = [(shard_loads[x], sum([dtype_loads_initial[dtype_index] for dtype_index in shards[x]]), shards[x]) for x, _ in enumerate(shards)] print('final_shards', final_shards)
We may need more data about gas/storage costs for this algorithm to be effective. Extended usage of dType will also improve the outcome.