Improve handling of readonly filesystems #45286

DaveCTurner · 2019-08-07T14:15:27Z

Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster:

- a readonly master-eligible node repeatedly leaves and rejoins the cluster, and may trigger elections by offering a vote that it cannot give (since granting a vote requires writing to disk)
- a readonly data node may be assigned shards (e.g. due to rebalancing) which will then immediately fail (cf. Prevent allocating shards to broken nodes #18417). Any shards that are already assigned to the node may eventually fail too (e.g. when syncing retention leases triggers some IO).

This issue is to improve Elasticsearch's behaviour when a node becomes readonly:

- verify that the data paths are writeable before joining a cluster
- verify that the data paths are writeable before offering a pre-vote
- periodically verify that data paths are writeable, and leave the cluster if they are not
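
All three proposals rely on a writability check. A minimal sketch, assuming the check creates, writes, fsyncs and deletes a small probe file in each data path; the class `DataPathWritabilityChecker` and the probe file name are hypothetical and do not come from Elasticsearch. Any `IOException` (for example a read-only filesystem or an exceeded disk quota) is treated as "not writable".

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper for illustration; not Elasticsearch's implementation.
final class DataPathWritabilityChecker {
    // Hypothetical probe file name.
    static final String PROBE_FILE_NAME = ".writability_probe";

    private DataPathWritabilityChecker() {}

    /** True only if every data path passes the probe. */
    static boolean allPathsWritable(List<Path> dataPaths) {
        for (Path dataPath : dataPaths) {
            if (!isWritable(dataPath)) {
                return false;
            }
        }
        return true;
    }

    /** Creates, writes, fsyncs and deletes a small probe file in the given directory. */
    static boolean isWritable(Path dataPath) {
        Path probe = dataPath.resolve(PROBE_FILE_NAME);
        try {
            try (FileChannel channel = FileChannel.open(probe, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
                channel.write(ByteBuffer.wrap(new byte[] {0x45, 0x53}));
                channel.force(true); // fsync so that deferred write errors surface here
            }
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            // e.g. "Read-only file system" or "Disk quota exceeded"
            tryDelete(probe);
            return false;
        }
    }

    private static void tryDelete(Path probe) {
        try {
            Files.deleteIfExists(probe);
        } catch (IOException ignored) {
            // best effort: the filesystem may well be readonly by now
        }
    }
}
```

The `force(true)` call is deliberate: on some filesystems a write error (such as an exhausted quota) may only be reported when data is flushed, so a check that merely opens a file could pass on an unhealthy disk.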

elasticmachine · 2019-08-07T14:15:29Z

Pinging @elastic/es-distributed

Bukhtawar · 2019-08-08T10:59:53Z

@DaveCTurner thanks for taking this up. Is this fix already prioritized? Could I pick up any of the tasks, since I have some context on the issue from our discussions?

DaveCTurner · 2019-08-08T11:54:19Z

This work is not yet on our roadmap, but if you have ideas for how to proceed then it'd be good to see them. Note that I'll be unavailable for the next few weeks so don't expect prompt feedback from me.

pugnascotia · 2019-09-12T15:53:27Z

#25591 feels relevant here: it covers documenting a cluster's behaviour when a filesystem crashes but the node(s) remain operational.

DaveCTurner · 2019-09-12T16:28:32Z

Indeed, solving this issue will also resolve #25591 since here we are aiming to exclude the cases where the filesystem is unusable but the node remains in the cluster.

DaveCTurner · 2020-02-10T14:26:11Z

For completeness, "readonly" also includes cases such as "Disk quota exceeded" errors.

Bukhtawar · 2020-02-10T18:32:51Z

@DaveCTurner, just a heads-up: I will be raising a PR for this issue, hopefully this week.
One clarification though: shouldn't master-eligible readonly nodes also be blocked from starting a pre-voting round?

DaveCTurner · 2020-02-10T19:39:40Z

> shouldn't master eligible readonly nodes also be blocked from starting a pre-voting round

Sounds reasonable, yes. Or else we block sending pre-votes (mentioned in the OP) and require a pre-vote from the local node before PreVotingRound#handlePreVoteResponse starts the election.
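
The second option, requiring a pre-vote from the local node before an election starts, could look like the following simplified tally. `PreVoteTally` is hypothetical and is not `PreVotingRound`; it assumes a single voting configuration and a simple majority, whereas the real coordination layer checks more than this.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified tally of pre-vote responses; not Elasticsearch's PreVotingRound.
final class PreVoteTally {
    private final String localNodeId;
    private final Set<String> votingConfiguration;
    private final Set<String> granted = new HashSet<>();

    PreVoteTally(String localNodeId, Set<String> votingConfiguration) {
        this.localNodeId = localNodeId;
        this.votingConfiguration = Set.copyOf(votingConfiguration);
    }

    /** Records a pre-vote granted by the given node, including the local node itself. */
    void onPreVoteGranted(String nodeId) {
        granted.add(nodeId);
    }

    /** True only if the local node pre-voted and a majority of the configuration agreed. */
    boolean shouldStartElection() {
        if (!granted.contains(localNodeId)) {
            // A readonly local node refuses to pre-vote for itself, so it never becomes a candidate.
            return false;
        }
        long votes = granted.stream().filter(votingConfiguration::contains).count();
        return votes * 2 > votingConfiguration.size();
    }
}
```

With a configuration of `{a, b, c}` on node `a`, pre-votes from `b` and `c` alone form a quorum but still do not start an election until `a` has granted its own.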

DaveCTurner · 2020-02-10T19:43:00Z

Actually I think I prefer the latter idea: require a pre-vote from the local node. A side-effect of receiving a pre-voting request is to call Coordinator#updateMaxTermSeen which is best to call as early as possible. If we delayed that until a node stopped being read-only then it could trigger another election in an otherwise healthy cluster, which would be a bit weird.

Edit: I'm unsure again; I see advantages on both sides. I think it's a minor point and hopefully easily adjusted later, so let's not dwell on it.
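
Under the first option (refusing to send pre-votes while readonly), the side-effect of a pre-vote request need not be delayed: the incoming term can be recorded before the health check decides whether to grant the pre-vote. A minimal sketch, assuming a hypothetical `PreVoteRequestHandler` with an injected health check; its `maxTermSeen` bookkeeping only stands in for `Coordinator#updateMaxTermSeen`, and the other conditions a real node checks before pre-voting are omitted.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified handler; not Elasticsearch's Coordinator.
final class PreVoteRequestHandler {
    private final BooleanSupplier dataPathsWritable; // e.g. the latest result of a periodic check
    private long maxTermSeen;

    PreVoteRequestHandler(long currentTerm, BooleanSupplier dataPathsWritable) {
        this.maxTermSeen = currentTerm;
        this.dataPathsWritable = dataPathsWritable;
    }

    /** Returns true if this node grants a pre-vote to a candidate in the given term. */
    boolean handlePreVoteRequest(long candidateTerm) {
        // Record the newer term first, regardless of health, so that learning
        // about it is not postponed until the disk recovers.
        maxTermSeen = Math.max(maxTermSeen, candidateTerm);
        // Then decline to pre-vote while this node cannot persist anything.
        return dataPathsWritable.getAsBoolean();
    }

    long maxTermSeen() {
        return maxTermSeen;
    }
}
```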

…ite to all paths and emits an is_writable stat as part of the node stats API. FsReadOnlyMonitor pulls these stats and tries to remove the node if any path is found not to be writable. Addresses elastic#45286.
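
A periodic monitor in the spirit of the FsReadOnlyMonitor described in this commit message could be sketched as follows, assuming an injected writability check and a callback that asks for the node to be removed. `PeriodicWritabilityMonitor` and its methods are hypothetical and do not reflect the PR's actual code or the node stats API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical monitor for illustration; not the FsReadOnlyMonitor from the PR.
final class PeriodicWritabilityMonitor implements AutoCloseable {
    private final BooleanSupplier allPathsWritable;
    private final Runnable onBecameUnwritable; // e.g. ask the master to remove this node
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean healthy = true;

    PeriodicWritabilityMonitor(BooleanSupplier allPathsWritable, Runnable onBecameUnwritable) {
        this.allPathsWritable = allPathsWritable;
        this.onBecameUnwritable = onBecameUnwritable;
    }

    /** Starts running the check immediately and then with the given delay between runs. */
    void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::runCheck, 0, delay, unit);
    }

    /** Runs one check; also callable directly, e.g. from tests. */
    void runCheck() {
        boolean writable;
        try {
            writable = allPathsWritable.getAsBoolean();
        } catch (RuntimeException e) {
            writable = false; // a check that throws counts as a failed check
        }
        boolean wasHealthy = healthy;
        healthy = writable;
        if (wasHealthy && !writable) {
            try {
                onBecameUnwritable.run(); // fires once per healthy -> unhealthy transition
            } catch (RuntimeException e) {
                // Swallowed so the scheduled task keeps running; real code would log it.
            }
        }
    }

    boolean isHealthy() {
        return healthy;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

`scheduleWithFixedDelay` is used so that a slow check on a struggling disk is followed by a full delay rather than back-to-back catch-up runs, and exceptions are caught because an uncaught one would suppress all subsequent scheduled executions.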


Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster. With this commit we periodically verify that nodes' filesystems are writable. If a node fails these writability checks then it is removed from the cluster and prevented from re-joining until the checks start passing again. Closes #45286 Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
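
The behaviour described in this commit (remove a node whose checks fail and refuse its joins until they pass again) can be modelled from the master's side as below. `ClusterMembership` and `HealthStatus` are hypothetical names, and the sketch assumes each node reports its health both in join requests and in responses to periodic follower checks; the merged change may wire this differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical health status reported by a node; not an Elasticsearch type.
enum HealthStatus { HEALTHY, UNHEALTHY }

// Hypothetical, simplified view of cluster membership held by the master.
final class ClusterMembership {
    private final Set<String> nodes = new LinkedHashSet<>();

    /** Handles a join request; rejected while the joining node reports itself unhealthy. */
    boolean handleJoin(String nodeId, HealthStatus status) {
        if (status != HealthStatus.HEALTHY) {
            return false;
        }
        nodes.add(nodeId);
        return true;
    }

    /** Handles the result of a periodic follower check; unhealthy nodes are removed. */
    void handleFollowerCheck(String nodeId, HealthStatus status) {
        if (status != HealthStatus.HEALTHY) {
            nodes.remove(nodeId);
        }
    }

    boolean contains(String nodeId) {
        return nodes.contains(nodeId);
    }
}
```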

pugnascotia mentioned this issue Sep 12, 2019

[DOCS] Document cluster behavior when a file system crashes but node remains operational #25591

Closed

DaveCTurner mentioned this issue Oct 15, 2019

Do not age out no-op peer recovery retention leases #47905

Open

This was referenced Feb 23, 2020

Adds resiliency to read-only filesystems #45286 #52680

Merged

Support for timeout in stats API #52616

Closed

rjernst added the Team:Distributed (Obsolete) label May 4, 2020 (Meta label for distributed team (obsolete); replaced by Distributed Indexing/Coordination.)

DaveCTurner closed this as completed in ef4cdb0 Jul 7, 2020

DaveCTurner mentioned this issue Jul 7, 2020

Remove nodes with read-only filesystems (#52680) #59138

Merged
