## Scenarios

• Existent imbalanced data/gas spent on shards
• Migration of data from one shard to another for other technical/business reasons
• New data from off-chain sources / Eth1

For the purpose of solving this problem we will use dType (dType (Decentralized Type System) on Ethereum 2.0) and the fact that it now has a `count` function for stored items. The `count` function is not ideal, because it does not measure the actual storage cost, but it gives enough approximation for the current purpose.

## Problem Statement

Given a number of shards `shard_count`, each loaded with a certain `shard_load` and a number of dType storage contracts `dtype_count`, each with a certain `dtype_load`, we need to find a way to balance the loads across shards.

The result is a list of shards, each with a list of dtype IDs that should be added to that shard and the final data load of that shard.

## Solution

The Python code for this solution is: (it can also be read at https://github.com/pipeos-one/dType/blob/f28bc63377f0565b1809c4dd4842242ce25dbd73/docs/research/Data_Load_Balancing_of_Shards.ipynb)

``````import random

shard_count = 20
dtype_count = 50

# # Increase average_coef if there are not enough shards for all dtypes
average_coef = 1.3

# Initialize random load values for shards and dtypes
shards = [[] for i in range(shard_count)]

next_index_s = 0
next_index_dt = 0

# Sort loads: ascending for shards, descending for dtypes

# Calculate average count per shard

# Move heavier than average dtypes on the least heaviest shards
shards[next_index_s].append(i)
next_index_s += 1
next_index_dt += 1

# Pair heaviest dtypes with lightest shards
# and add as many light dtypes on top, as possible
if last_index_s < next_index_s:
print('Needs more shards. Increase average_coef');
break

# Add the next heaviest dtype to the next lightest shard
shards[next_index_s].append(i)

last_index_dt -= 1

next_index_s += 1
next_index_dt += 1
if next_index_dt > last_index_dt:
break

final_shards = [(shard_loads[x][0], sum([dtype_loads_initial[dtype_index][1] for dtype_index in shards[x]]), shards[x]) for x, _ in enumerate(shards)]

print('final_shards', final_shards)
``````

## Conclusions

We may need more data about gas/storage costs for this algorithm to be effective. Extended usage of dType will also improve the outcome.

1 Like

If a shard is “overloaded” with just 2 heavy smart contracts that are made to interact a lot each others. I guess that the 2 smart contracts may end in 2 different shards? Then the interaction will become more complex, no? will it be transparent or the smart contracts will have to be somehow redesigned?

There are many graph partitioning algorithms available, so the main problem here seems to be coming up with a cost function for our particular sharded blockchain situation. One simple cost function would be to combine storage & gas usage into their actual dollar cost per day. Also, maintaning a record of the cross-shard communication pattern will let us place frequently communicating accounts in the same shard.