Snapshot in ABORTED state after rolling restart of nodes #22000
Comments
When retrieving the snapshot status, that action looks in the cluster state and retrieves the current snapshots. So it's very strange that the snapshot status is showing an
I retrieved both cluster state and snapshot status after the rolling restart fully completed and the cluster health was back to green.
If you didn't use Did you explicitly try deleting the snapshot before (or after) the rolling restart?
@desagar BTW, you inadvertently pasted us the entire cluster state which included your repository credentials. I removed the link from the ticket, but you should also update your security settings immediately so as to not have your repository account compromised.
Thank you for removing the link.
I attempted deleting the snapshot prior to the restart, and at that point it was just hanging. That could possibly have been due to the bug in the plugin - I did not take a thread dump at that point so I am unsure. I just attempted deleting it again after the restart, and the delete fails with since the snapshot is not fully written to the repository. However, the delete apparently removed the aborted snapshot, and it is no longer present in the cluster state output. Snapshot status reports that the snapshot is missing.
You should be able to take snapshots now, correct? I believe I know what is happening. When you issued a delete snapshot request, the master node marked the snapshot as In this case, the full cluster restart is your main option. We opened #21759 to look at better options for aborting, and once that is completed, situations like the one you encountered would be properly handled. I'm closing this for now. If you encounter other issues, please feel free to reopen. Thank you for reporting this!
Elasticsearch 6.0 removes support for lenient booleans (see elastic#22000). With this commit we deprecate all usages of non-strict booleans in Elasticsearch 5.x so users can already spot improper usages. Relates elastic#22000 Relates elastic#22696
Elasticsearch version: 2.3.1
Plugins installed: [a custom repository plugin]
JVM version: 1.8.0_101
OS version: Oracle Enterprise Linux 6 with Redhat kernel
Description of the problem including expected versus actual behavior:
We have a two-node Elasticsearch cluster, and we have installed a custom repository plugin that is used for storing Elasticsearch snapshots. The custom plugin has a bug that occasionally causes it to hang indefinitely waiting for a connection to the back-end store for our snapshots. When this happened, we performed a rolling restart of the Elasticsearch cluster to clear the hanging thread. After the restart, we ended up with a state where the snapshot is in ABORTED status according to the ES cluster state. However, when querying the snapshot using the snapshot API, it reports that the snapshot is still in progress. As a result, we are unable to take any further snapshots.
According to this link, snapshots in ABORTED status should be cleaned up when the master node is restarted.
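For anyone hitting a similar mismatch, here is a rough sketch of how the two views could be compared once the cluster state and snapshot status responses have been saved as JSON. The response shapes are assumptions on my part: in-progress snapshots under `snapshots.snapshots` in the cluster state, and a top-level `snapshots` array in the status response, each entry carrying `repository`, `snapshot`, and `state`. The function names and the repository and snapshot names are made up for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def state_from_cluster_state(cluster_state: Dict[str, Any],
                             repo: str, snapshot: str) -> Optional[str]:
    """Return the snapshot's state as recorded in the cluster state, or None.

    Assumed layout: {"snapshots": {"snapshots": [{"repository", "snapshot", "state"}]}}
    """
    entries = cluster_state.get("snapshots", {}).get("snapshots", [])
    for entry in entries:
        if entry.get("repository") == repo and entry.get("snapshot") == snapshot:
            return entry.get("state")
    return None


def state_from_status(status: Dict[str, Any],
                      repo: str, snapshot: str) -> Optional[str]:
    """Return the snapshot's state as reported by the status response, or None.

    Assumed layout: {"snapshots": [{"repository", "snapshot", "state"}]}
    """
    for entry in status.get("snapshots", []):
        if entry.get("repository") == repo and entry.get("snapshot") == snapshot:
            return entry.get("state")
    return None


def compare_views(cluster_state: Dict[str, Any], status: Dict[str, Any],
                  repo: str, snapshot: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (cluster_state_view, status_view, views_disagree)."""
    in_state = state_from_cluster_state(cluster_state, repo, snapshot)
    in_status = state_from_status(status, repo, snapshot)
    return in_state, in_status, in_state != in_status


if __name__ == "__main__":
    # Hand-written sample data, not output from a real cluster.
    sample_cluster_state = {
        "snapshots": {"snapshots": [
            {"repository": "my_repo", "snapshot": "snap_1", "state": "ABORTED"}
        ]}
    }
    sample_status = {
        "snapshots": [
            {"repository": "my_repo", "snapshot": "snap_1", "state": "STARTED"}
        ]
    }
    print(compare_views(sample_cluster_state, sample_status, "my_repo", "snap_1"))
```

With real data, the two dictionaries would come from `json.load` on the saved responses; a disagreement between the two views is the situation described above.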
Steps to reproduce:
Working on a reproducer - will provide it once I have one.
Provide logs (if relevant):
Please see attached files of cluster state and snapshot status.
snapshot_status.txt : output of /_snapshot/ppmgmt1645/snapshot_20161130_042001?pretty=true