When some masters need to be deleted, the following algorithm is applied:

1. `calculateChanges` calculates how many master nodes should be removed.
2. `CalculatePerformableChanges` returns how many master nodes can be removed.
3. Then, for each master node to be removed:
   1. Compute and apply the new quorum size without the master node to be removed.
   2. Schedule the master pod for deletion.
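As a minimal sketch (not the actual ECK code), the Zen1 quorum update in step 3 boils down to recomputing `minimum_master_nodes` as a strict majority of the remaining master-eligible nodes:

```go
package main

import "fmt"

// minimumMasterNodes returns the Zen1 quorum size for a given number of
// master-eligible nodes: a strict majority, i.e. n/2 + 1.
func minimumMasterNodes(masters int) int {
	return masters/2 + 1
}

func main() {
	// Removing one master from a 4-master cluster: the new quorum is
	// computed for the 3 remaining masters before the pod is deleted.
	fmt.Println(minimumMasterNodes(4)) // 3
	fmt.Println(minimumMasterNodes(3)) // 2
}
```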
In some rare cases this can lead to a split-brain situation, for instance:

1. Initial situation: 4 masters in a cluster, 2 masters need to be removed.
2. `minimum_master_nodes` is decreased from 3 to 2.
3. The 2 master pods are scheduled for deletion at the K8s level.

Since `minimum_master_nodes` is set to 2 while all 4 masters are still running, there is a small chance of a split-brain situation between the quorum update and the actual pod deletions.
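To make the window concrete, here is a small illustration (a sketch, not ECK code) of why `minimum_master_nodes=2` is unsafe while 4 masters are still running: a 2/2 network partition leaves two disjoint groups that can each satisfy the quorum and elect their own master.

```go
package main

import "fmt"

// canFormQuorum reports whether a partition of the given size satisfies
// the configured minimum_master_nodes.
func canFormQuorum(partitionSize, minimumMasterNodes int) bool {
	return partitionSize >= minimumMasterNodes
}

func main() {
	// minimum_master_nodes has been lowered to 2, but all 4 masters are
	// still running. A partition splitting them 2/2 leaves two sides
	// that can BOTH elect a master: split brain.
	fmt.Println(canFormQuorum(2, 2)) // true (side A)
	fmt.Println(canFormQuorum(2, 2)) // true (side B)

	// With the safe value of 3, at most one side of any partition of
	// 4 nodes can reach quorum.
	fmt.Println(canFormQuorum(2, 3)) // false
}
```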
This situation mostly applies to Zen1; with Zen2, masters are excluded from the voting configuration before being deleted.
The algorithm depicted above is the only way to move from two masters to one; that special case is inherently unsafe.
Some improvements can be made here:

- We should never delete more than half of the masters (at least with Zen1, and with the exception of the special two-to-one master case).
- If there are dedicated masters, we should perhaps treat them more carefully than the other nodes and not blindly apply the `maxUnavailable` setting.
- We should never go down to 1 master (single point of failure).
The last two points also apply to Zen2 and might have a higher priority since we are moving to ES 7.
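The first and third proposed constraints could be combined into a single bound on how many masters may be removed in one reconciliation. The helper below is hypothetical (not the ECK implementation), sketching "never more than half" and "never down to 1 master" together:

```go
package main

import "fmt"

// maxRemovableMasters is a hypothetical helper illustrating the proposed
// safety constraints: never remove more than half of the masters, and
// never go below 2 masters (avoiding a single point of failure).
func maxRemovableMasters(current int) int {
	byHalf := current / 2 // never delete more than half
	bySpof := current - 2 // never go down to 1 master
	if bySpof < 0 {
		bySpof = 0
	}
	if byHalf < bySpof {
		return byHalf
	}
	return bySpof
}

func main() {
	fmt.Println(maxRemovableMasters(4)) // 2
	fmt.Println(maxRemovableMasters(3)) // 1
	// The two-to-one transition stays blocked by this rule, consistent
	// with treating it as an inherently unsafe special case.
	fmt.Println(maxRemovableMasters(2)) // 0
}
```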
We now add and remove one master node at a time (work in progress for rolling upgrades; other issues have been opened) and wait for our cache of resources to match expectations, to properly handle Zen1 and Zen2 settings.
Closing this issue in favor of keeping open #1710, #1628, #1693.