Improve handling of readonly filesystems #45286
Comments
Pinging @elastic/es-distributed
@DaveCTurner thanks for taking this up. Is this fix already prioritized? Do you think I could pick up any of the tasks, since I have some context on the issue from our discussions?
This work is not yet on our roadmap, but if you have ideas for how to proceed then it'd be good to see them. Note that I'll be unavailable for the next few weeks so don't expect prompt feedback from me.
#25591 feels relevant here - it covers documenting what a cluster's behaviour is when a filesystem crashes but the node(s) remain operational.
Indeed, solving this issue will also resolve #25591 since here we are aiming to exclude the cases where the filesystem is unusable but the node remains in the cluster.
For completeness, "readonly" also includes cases such as |
@DaveCTurner, just a heads up: I'll be raising a PR for this issue, hopefully this week.
Sounds reasonable, yes. Or else we block sending pre-votes (mentioned in the OP) and require a pre-vote from the local node before |
Actually I think I prefer the latter idea: require a pre-vote from the local node. A side-effect of receiving a pre-voting request is to call
Edit: I'm unsure again; I see advantages on both sides. I think it's a minor point and hopefully easily adjusted later, so let's not dwell on it.
…ite to all paths and emits an is_writable stat as part of the node stats API. FsReadOnlyMonitor pulls the stats and tries to remove the node if not all paths are found to be writable. Addresses elastic#45286.
Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster. With this commit we periodically verify that nodes' filesystems are writable. If a node fails these writability checks then it is removed from the cluster and prevented from re-joining until the checks start passing again. Closes #45286 Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
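The periodic writability check described in the commit message above can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: it is written in Python for brevity, and the names `is_writable`, `WritabilityMonitor`, `check_once`, `may_join` and `run` are hypothetical. It assumes that "writable" means a small temporary file can be created, fsynced and deleted under each data path, and that a node with any failing path should be treated as unhealthy until a later check passes.

```python
import os
import tempfile
import threading


def is_writable(path: str) -> bool:
    """Probe one data path: create, write, fsync and delete a temp file.

    Hypothetical helper; any OSError (read-only filesystem, missing
    directory, I/O error) is treated as "not writable".
    """
    try:
        fd, tmp = tempfile.mkstemp(prefix=".writability_probe_", dir=path)
    except OSError:
        return False
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the write to reach the device
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup; failures here do not change the result.
        for cleanup in (lambda: os.close(fd), lambda: os.unlink(tmp)):
            try:
                cleanup()
            except OSError:
                pass


class WritabilityMonitor:
    """Tracks whether every data path passed the most recent probe."""

    def __init__(self, paths, probe=is_writable):
        self.paths = list(paths)
        self.probe = probe
        self.healthy = True

    def check_once(self) -> bool:
        # The node counts as healthy only if all paths are writable.
        self.healthy = all(self.probe(p) for p in self.paths)
        return self.healthy

    def may_join(self) -> bool:
        # A failing node stays out until a later check passes again.
        return self.healthy

    def run(self, interval_seconds: float, stop: threading.Event) -> None:
        # Repeat the check until asked to stop; wait() doubles as the sleep.
        while not stop.is_set():
            self.check_once()
            stop.wait(interval_seconds)
```

In this sketch, cluster membership logic would consult `may_join()` before accepting or keeping the node; how the real change wires the check result into node removal and re-joining is not shown here.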
Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster:
This issue is to improve Elasticsearch's behaviour when a node becomes readonly: