Sharding in Peerbit is based on the content being committed. Each commit will be added to a log that represents some meaningful state, like an image or a document.
Every change in Peerbit is committed with explicit links to the content it depends on. This means that by following the dependencies to the root, we can reconstruct the full state.
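As a rough illustration of following dependencies back to the root, the sketch below walks a commit graph from a head and collects every commit it depends on. The `Commit` shape, the `dependsOn` field, and the `collectState` function are hypothetical names chosen for this example and are not Peerbit's actual types.

```typescript
// Hypothetical commit shape: each commit links to the commits it depends on.
interface Commit {
  id: string;
  dependsOn: string[];
}

// Walk from a head commit towards the root(s) and return the ids of every
// commit that is needed to rebuild the state at that head.
function collectState(headId: string, store: Map<string, Commit>): string[] {
  const visited = new Set<string>();
  const stack: string[] = [headId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (visited.has(id)) {
      continue; // already reached through another dependency path
    }
    const commit = store.get(id);
    if (!commit) {
      throw new Error(`Missing commit ${id}`);
    }
    visited.add(id);
    stack.push(...commit.dependsOn);
  }
  return Array.from(visited);
}
```

Commits that the head does not depend on, directly or indirectly, are never visited, so they are not part of that state.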
Every graph gets a graph ID (GID).
When graphs merge (two independent states become dependent), the new graph takes the GID of the graph with the longest chain.
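The naming rule for merged graphs can be sketched as follows. The `GraphHead` shape and the chain lengths are made up for illustration. The tie-breaking behaviour (the first graph wins) is an assumption, since ties are not covered here.

```typescript
// Hypothetical summary of a graph: its GID and the length of its longest chain.
interface GraphHead {
  gid: string;
  chainLength: number;
}

// Return the GID that a merged graph would take: the GID of the parent graph
// with the longest chain. On a tie, the first parent wins (an assumption).
function mergedGid(parents: GraphHead[]): string {
  if (parents.length === 0) {
    throw new Error("At least one graph is required");
  }
  let best = parents[0];
  for (const candidate of parents) {
    if (candidate.chainLength > best.chainLength) {
      best = candidate;
    }
  }
  return best.gid;
}

// Example with made-up chain lengths: DOG has the longer chain, so the
// merged graph is labelled DOG.
const label = mergedGid([
  { gid: "CAT", chainLength: 2 },
  { gid: "DOG", chainLength: 3 },
]);
```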
This background is important for understanding how replicators (content leaders) are chosen as new changes arrive.
Imagine the commit above is made, so that the merged graph gets the label "DOG". How can we choose replicators in a fully connected network in a simple, random way? (A replicator has the task of storing the log and potentially also making it searchable for peers.)
The first step is to hash the labels of the peers (IPFS IDs) and the DOG label with a hash function (more details on this function later).
Second, put all the hashes into a list and sort it.
Next, look up the labels that correspond to the sorted hashes.
If we want 2 replicas of our content, we can choose the 2 elements that follow DOG in the sorted list as replicators.
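The steps above can be sketched in a few lines. This is a minimal illustration, not Peerbit's `findLeaders` implementation: the `fnv1a` hash is a stand-in for the real hash function, `selectReplicators` is a hypothetical name, and wrapping around to the start of the list when the end is reached is an assumption.

```typescript
// Stand-in hash (32-bit FNV-1a). The real hash function is not shown here.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

function selectReplicators(peerIds: string[], gid: string, replicas: number): string[] {
  // Step 1: hash the peer labels and the graph label.
  const entries = [
    ...peerIds.map((id) => ({ label: id, hash: fnv1a(id), isPeer: true })),
    { label: gid, hash: fnv1a(gid), isPeer: false },
  ];
  // Step 2: sort by hash (labels break ties so the order is deterministic).
  entries.sort((a, b) => a.hash - b.hash || a.label.localeCompare(b.label));
  // Step 3: find the graph label, then take the next `replicas` peers,
  // wrapping around to the start of the list if needed (an assumption).
  const start = entries.findIndex((e) => !e.isPeer);
  const leaders: string[] = [];
  for (let i = 1; i < entries.length && leaders.length < replicas; i++) {
    leaders.push(entries[(start + i) % entries.length].label);
  }
  return leaders;
}
```

Because every peer sorts the same hashes, every peer that knows the same set of participants arrives at the same replicators without any coordination.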
The hash function is seeded with the checksum of the content itself, so it changes for every new commit. This means that the results would differ if the content changes. E.g.
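One way to picture the seeding is to mix the content checksum into the input of the hash, so that the same label hashes differently for each commit. The sketch below does this with a 32-bit FNV-1a hash as a stand-in. The function names and the way the seed is combined with the label are assumptions, not Peerbit's actual scheme.

```typescript
// Stand-in seeded hash: FNV-1a over "<checksum>\0<label>".
function seededHash(label: string, contentChecksum: string): number {
  const input = contentChecksum + "\u0000" + label;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Sort labels (peer ids and the graph label) by their seeded hash.
// A different checksum can produce a different order for the same labels.
function orderBySeededHash(labels: string[], contentChecksum: string): string[] {
  return labels
    .slice()
    .sort(
      (a, b) =>
        seededHash(a, contentChecksum) - seededHash(b, contentChecksum) ||
        a.localeCompare(b)
    );
}
```

Since the order depends on the checksum, the set of replicators is reshuffled as the content changes instead of always falling on the same peers.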
When peers leave or join, we need to redo leader selection for the heads of our content. Some replicators might no longer be online, or new peers might now be selected as replicators instead of existing ones, since the outcome of the algorithm depends on which peers participate in the replication process.
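Redoing leader selection after a membership change can be sketched as running the selection twice and comparing the results. The selection function is passed in as a parameter, so this sketch does not assume any particular hashing scheme. `LeaderSelector`, `LeaderChange`, and `diffLeaders` are hypothetical names used only for illustration.

```typescript
// Any function that picks `replicas` leaders for a graph from a set of peers.
type LeaderSelector = (peerIds: string[], gid: string, replicas: number) => string[];

interface LeaderChange {
  gid: string;
  added: string[]; // peers that should start replicating this graph
  removed: string[]; // peers that are no longer leaders for this graph
}

// Recompute leaders for every head after the peer set changed and report
// only the graphs whose leaders differ.
function diffLeaders(
  gids: string[],
  peersBefore: string[],
  peersAfter: string[],
  replicas: number,
  select: LeaderSelector
): LeaderChange[] {
  const changes: LeaderChange[] = [];
  for (const gid of gids) {
    const oldLeaders = select(peersBefore, gid, replicas);
    const newLeaders = select(peersAfter, gid, replicas);
    const added = newLeaders.filter((p) => oldLeaders.indexOf(p) === -1);
    const removed = oldLeaders.filter((p) => newLeaders.indexOf(p) === -1);
    if (added.length > 0 || removed.length > 0) {
      changes.push({ gid, added, removed });
    }
  }
  return changes;
}
```

Graphs whose leaders are unaffected by the join or leave do not appear in the result, so only the changed graphs need to be handed over.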
Similarly, when graphs merge (like when CAT and DOG became DOG), this is functionally equivalent to the replicators of CAT ceasing to replicate and the DOG replicators starting to replicate a larger log.
The implementation can be found here (findLeaders method)