Do not cancel ongoing recovery for noop copy on broken node #48265
Conversation
Pinging @elastic/es-distributed (:Distributed/Allocation)
server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorIT.java
My concern with using a simple boolean is that it will suppress no-op recoveries to other nodes that might succeed. I think we should track which node failed the no-op allocation.
What would happen if instead we kept track of all the nodes on which this shard failed during initialization (whether no-op or otherwise) and ignored them all in the ReplicaShardAllocator? I am thinking particularly of #18417 - with a list of past failures we could also prefer to avoid those nodes in the BalancedShardsAllocator.
@DaveCTurner Thanks for looking. We would make better decisions with a list of failed nodes. We need to make sure that the list is bounded. I used a boolean to avoid adding more load to the cluster state.
I think a |
@DaveCTurner Good point. I will apply your feedback. Thank you.
server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java
@DaveCTurner It's ready again. Can you please take another look? Thank you.
Sorry for the delayed review @dnhatn, I thought I submitted this a few days ago but apparently not.
server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java
server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java
server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java
Thanks for working on this @dnhatn, I left a few comments to consider.
server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java
server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java
server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorIT.java
server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java
server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java
I left a couple of responses to threads in an earlier review and duplicated them here.
server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java
server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java
@DaveCTurner I've addressed your feedback. Would you mind taking another look? Thank you.
Thanks @gwbrown. @elasticmachine run elasticsearch-ci/2
Thanks @dnhatn this LGTM
@DaveCTurner @henningandersen @original-brownbear Thanks for reviewing :).
This change fixes a poisonous situation where an ongoing recovery was canceled because a better copy was found on a node to which the cluster had previously tried, and failed, to allocate the shard. The solution is to keep track of the set of nodes on which an allocation of the shard has failed so that we can avoid canceling the current recovery in favour of a copy on one of those nodes. Closes #47974
Today the replica allocator can repeatedly cancel an ongoing recovery for a copy on a broken node if that copy can perform a noop recovery. This loop can happen endlessly (see testDoNotInfinitelyWaitForMapping). We should detect and avoid canceling in this situation.
Closes #47974