Authors: pop
This post is based on the current spec of PeerDAS
Edited after @leobago’s comment
This post will tell you the expected number of peers each node must connect to in order to cover all the columns to do peer sampling successfully.
Simulation
In PeerDAS, at the time of writing, the number of peers you are expected to connect to is solely calculated by CUSTODY_REQUIREMENT
and DATA_COLUMN_SIDECAR_SUBNET_COUNT
. We write the following simulation to count how many peers you need to connect.
import math
import random
def peer_count(N, C):
num_trials = 1000
counts = []
for trial in range(num_trials):
covered = set()
count = 0
while len(covered) != N:
selected = set(random.sample(range(N), C))
if len(selected.difference(covered)) > 0:
covered = covered.union(selected)
count += 1
counts.append(count)
return sum(counts)/len(counts)
where N and C are DATA_COLUMN_SIDECAR_SUBNET_COUNT
and CUSTODY_REQUIREMENT
, respectively.
What this function does is to simulate the node discovery mechanism. What it does in the real world is that the node will discover new peers and find out what subnets the peer is taking custody of. It will continue discovering new peers until all the subnets are covered by those peers. What’s important is that the node will not keep a peer that doesn’t add any more coverage to the subnets. So when N=32 and C=1, peer_count
will be exactly 32.
The set of subnets each peer is supposed to custody is determined by its Peer ID. However, in our simulation, such set is just randomized, which shouldn’t be different from the real world.
In order to get the expected number, we run 1,000 simulations and get only the mean.
Result
Trivially, if we fix C and make only N variable, peer_count
will increase as N increases since, when there are more subnets to cover, the more peers you should have. And, if we fix N and make only C variable, peer_count
will decrease as C increases since, when each peer custodies more, the number of peers you should have should be lower. Even if it’s trivial, we did the simulation for completeness.
You can see that with N=128 and C=16, you need 25.9 peers on average (so it’s far different from N/C=8).
Now, let’s consider an interesting non-trivial scenario, if the ratio (N/C) remains unchanged, do you think peer_count
will remain unchanged? That is, do you think peer_count(64, 8)
and peer_count(128, 16)
will be the same? The answer is no. As shown below, the higher N is, the higher peer_count
is.
So the expected number of peers you will have cannot be easily calculated with N and C. You need to do the simulation to tune the parameters.