Describe the bug
With Segment Replication enabled, when a new replica shard is recovered, created, or added to an existing cluster, it does not receive a checkpoint (the latest segments) from the primary until an operation is performed on the index. The replica therefore falls behind until a new operation happens on the index.
Explanation:
-> In the ideal Segment Replication scenario, when a refresh happens on the index and a new reference is opened (which happens only after some operation on the index), the primary shard publishes a checkpoint to the replicas and sends the segment files they need to catch up.
-> But when new replica shards are added to an existing cluster, they don't receive any checkpoint from the primary until an operation (index/update/delete) happens on the index. Even if we manually refresh the index, a new reference is not opened until such an operation happens, so a checkpoint is never published from the primary to the replica and the replica falls behind.
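To make the explanation above concrete, here is a minimal toy model of the behavior. It is only a sketch: `Primary`, `Replica`, `pendingOps` and `visibleVersion` are hypothetical names invented for illustration and do not mirror the actual OpenSearch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and do not mirror OpenSearch internals.
final class CheckpointPublishModel {

    static final class Replica {
        long visibleVersion = 0; // latest checkpoint version the replica has received

        void onNewCheckpoint(long version) {
            visibleVersion = Math.max(visibleVersion, version);
        }
    }

    static final class Primary {
        long version = 0;           // bumped whenever a refresh opens a new reference
        boolean pendingOps = false; // true if an index/update/delete arrived since the last refresh
        final List<Replica> replicas = new ArrayList<>();

        void index() {
            pendingOps = true;
        }

        // Returns true if a checkpoint was published to the replicas.
        boolean refresh() {
            if (!pendingOps) {
                return false; // nothing changed: no new reference, so no checkpoint is published
            }
            pendingOps = false;
            version++;
            for (Replica r : replicas) {
                r.onNewCheckpoint(version);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary();
        primary.index();
        primary.refresh();                       // checkpoint 1, but no replicas exist yet

        Replica late = new Replica();
        primary.replicas.add(late);              // replica joins after the documents were indexed

        System.out.println(primary.refresh());   // false: a manual refresh publishes nothing
        System.out.println(late.visibleVersion); // 0: the replica is behind
        primary.index();
        System.out.println(primary.refresh());   // true: a new operation triggers a checkpoint
        System.out.println(late.visibleVersion); // 2: the replica has caught up
    }
}
```

In this model the late replica only catches up once a new operation reaches the primary, which is the symptom described in this issue.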
To Reproduce
Steps to reproduce the behavior:
1. Start a cluster and create a new index with a primary shard.
2. Insert some documents into the index.
3. Add a new replica shard to the existing cluster.
4. Search on the new replica for the docs inserted in step 2.
5. The search on the new replica returns empty results even though the documents were inserted successfully and are present on the primary.
Expected behavior
-> A search for documents on the replica should not return empty results if the documents were successfully inserted earlier.
Expected Solution
-> With segment replication, when a new replica shard is added to an existing cluster, it goes through peer recovery and is finally marked as STARTED.
-> After peer recovery completes and before the shard is marked as STARTED, we have to force the new replica shard to start a round of segment replication to fetch the latest segment files from the primary shard. Only after this replication event completes should we mark the shard as STARTED.
-> This way the replica shard will have all the latest segment files before it is STARTED and ready to be searched.
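The proposed ordering could be sketched roughly as follows. This is a simplified, synchronous illustration under my own assumptions: `ReplicaShard`, `peerRecover`, `forceReplicationRound` and `addReplica` are hypothetical names, not the OpenSearch recovery API, and segment files are represented as plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names throughout; this is not the OpenSearch recovery API.
final class RecoveryThenReplicate {

    enum State { INITIALIZING, STARTED }

    static final class ReplicaShard {
        State state = State.INITIALIZING;
        final Set<String> segments = new TreeSet<>();
    }

    // Stand-in for peer recovery: copies whatever segment files recovery transfers.
    static void peerRecover(ReplicaShard replica, Set<String> recoveredSegments) {
        replica.segments.addAll(recoveredSegments);
    }

    // Stand-in for one forced round of segment replication: fetch the primary's latest files.
    static void forceReplicationRound(ReplicaShard replica, Set<String> latestSegments) {
        replica.segments.addAll(latestSegments);
    }

    // Proposed ordering: recover, force one replication round, and only then mark STARTED.
    static ReplicaShard addReplica(Set<String> recoveredSegments, Set<String> latestSegments) {
        ReplicaShard replica = new ReplicaShard();
        peerRecover(replica, recoveredSegments);
        forceReplicationRound(replica, latestSegments);
        replica.state = State.STARTED;
        return replica;
    }

    public static void main(String[] args) {
        ReplicaShard r = addReplica(Set.of("_0"), Set.of("_0", "_1"));
        System.out.println(r.state + " " + r.segments); // STARTED [_0, _1]
    }
}
```

The point of the sketch is only the ordering: the shard never becomes searchable while it is still missing files the primary already has.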
@Rishikesh1159 This is because phase 1 of recovery copies from the primary's latest safe commit. Any segments created after that safe commit are not copied during recovery and are only copied on the first replication event after the replica is started.
I think it's reasonable to force a round of replication here so we are not dependent on the primary receiving consistent index load and refreshing. I think we could do this by triggering a round of segrep when RecoveryListener resolves, before IndicesClusterStateService marks the shard as active.
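A rough sketch of that callback ordering, assuming a simplified listener shape: the `Listener` interface, `replicateBeforeStart` and the event names below are hypothetical stand-ins and not the real RecoveryListener or IndicesClusterStateService code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: Listener, replicateBeforeStart and the event names are illustrative only.
final class RecoveryListenerSketch {

    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    // Wraps the listener that marks the shard active so that a forced replication
    // round runs first, once recovery has resolved.
    static Listener replicateBeforeStart(Consumer<Listener> replicationRound, Listener markActive) {
        return new Listener() {
            @Override
            public void onResponse() {
                replicationRound.accept(markActive);
            }

            @Override
            public void onFailure(Exception e) {
                markActive.onFailure(e);
            }
        };
    }

    // Simulates a successful recovery followed by a replication round that may fail.
    static List<String> run(boolean replicationSucceeds) {
        List<String> events = new ArrayList<>();
        Listener markActive = new Listener() {
            @Override
            public void onResponse() {
                events.add("shard_started");
            }

            @Override
            public void onFailure(Exception e) {
                events.add("shard_failed");
            }
        };
        Consumer<Listener> round = done -> {
            events.add("replication_round");
            if (replicationSucceeds) {
                done.onResponse();
            } else {
                done.onFailure(new IllegalStateException("replication failed"));
            }
        };
        Listener recoveryListener = replicateBeforeStart(round, markActive);
        events.add("recovery_done");
        recoveryListener.onResponse(); // peer recovery resolved successfully
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [recovery_done, replication_round, shard_started]
        System.out.println(run(false)); // [recovery_done, replication_round, shard_failed]
    }
}
```

How a failed replication round should be handled (fail the shard, retry, or start anyway) is an open design choice; the sketch simply propagates the failure.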
Thanks @mch2, yes, what you said is correct. Forcing a round of replication while recovering/creating a new replica shard makes sense and would solve this bug. I see two possible solutions to force segment replication during recovery:
Another possible way is to trigger a publish checkpoint from the primary when the finalize recovery step of the replica shard is completed. We can add the code block below after this line in RecoverySourceHandler: