Document Elasticsearch update strategy change budget & groups #1210

Merged 1 commit on Jul 12, 2019
138 changes: 138 additions & 0 deletions docs/elasticsearch-spec.asciidoc
@@ -100,3 +100,141 @@ Example to create a Kubernetes TLS secret with a self-signed certificate:
$ openssl req -x509 -newkey rsa:4096 -keyout tls.key -out tls.crt -days 365 -nodes
$ kubectl create secret tls my-cert --cert tls.crt --key tls.key
----

[id="{p}-update-strategy"]
=== Update strategy

The Elasticsearch cluster configuration can be updated at any time to:

* add new nodes (see the sketch after this list)
* remove some nodes
* change the Elasticsearch configuration
* change pod resources (for example: memory limits, CPU limits, environment variables)
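
For instance, scaling a cluster out is just a matter of changing `nodeCount` in the resource and re-applying it. The sketch below reuses the `quickstart` cluster defined later in this section; the value of 5 is purely illustrative:

[source,yaml]
----
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 5 # for example, increased from 3; ECK reconciles the cluster towards this count
----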

On any change, ECK reconciles Kubernetes resources towards the desired cluster definition. Changes are applied in a rolling fashion: the state of the cluster is continuously monitored so that new nodes can be added and deprecated nodes removed safely.

[id="{p}-change-budget"]
==== Change budget

No downtime should be expected when the cluster topology changes. Shards on deprecated nodes are migrated away so the node can be safely removed.

For example, in order to mutate a 3-node cluster with a 16GB memory limit on each node into a 3-node cluster with a 32GB memory limit on each node, ECK will (the corresponding spec change is sketched after this list):

1. add a new 32GB node: the cluster temporarily has 4 nodes
2. migrate data away from the first 16GB node
3. once data is migrated, remove the first 16GB node
4. follow the same steps for the 2 other 16GB nodes
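
The memory change itself would be expressed through the pod template. The following is a minimal sketch, assuming the standard Kubernetes container `resources` fields and an `elasticsearch` container name (both are assumptions, not taken from this section):

[source,yaml]
----
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 3
    podTemplate:
      spec:
        containers:
        - name: elasticsearch # assumed container name
          resources:
            limits:
              memory: 32Gi # previously 16Gi; ECK replaces nodes one at a time
----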

The cluster health stays green during the entire process.
By default, only one extra node can be added on top of the expected ones. In the example above, a 3-node cluster may temporarily be composed of 4 nodes while data migration is in progress.

This behaviour can be controlled through the `changeBudget` section of the cluster specification's `updateStrategy`. If not specified, it defaults to the following:

[source,yaml]
----
spec:
updateStrategy:
changeBudget:
maxSurge: 1
maxUnavailable: 0
----

* `maxSurge` specifies the number of pods that can be added to the cluster, on top of the desired number of nodes in the spec, during cluster updates
* `maxUnavailable` specifies the number of pods that can be made unavailable during cluster updates

The default of `maxSurge: 1; maxUnavailable: 0` spins up an additional Elasticsearch node during cluster updates.
It is possible to speed up cluster topology changes by increasing `maxSurge`. For example, setting `maxSurge: 3` would allow 3 new nodes to be created while the original 3 migrate data in parallel.
The cluster would then temporarily have 6 nodes.
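
Expressed in the specification, that parallel variant would look like the following sketch (keeping `maxUnavailable` at its default of 0):

[source,yaml]
----
spec:
  updateStrategy:
    changeBudget:
      maxSurge: 3        # up to 3 extra nodes on top of the desired ones
      maxUnavailable: 0  # never go below the desired number of nodes
----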

Setting `maxSurge` to 0 and `maxUnavailable` to a positive value prevents any extra pods from being created: the total number of pods never exceeds the desired number of nodes, so nodes are removed before their replacements are added.
For example, `maxSurge: 0; maxUnavailable: 1` would perform the 3-node upgrade this way (the corresponding spec is sketched after this list):

1. migrate data away from the first 16GB node
2. once data is migrated, remove the 16GB node: the cluster temporarily has 2 nodes
3. add a new 32GB node: the cluster grows to 3 nodes
4. follow the same steps for the 2 other 16GB nodes
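
The corresponding specification is sketched below:

[source,yaml]
----
spec:
  updateStrategy:
    changeBudget:
      maxSurge: 0        # never create more pods than the desired number of nodes
      maxUnavailable: 1  # at most one node may be unavailable at any time
----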

Even though any `changeBudget` can be specified, ECK will make sure some invariants are respected while a mutation is in progress:

* there must be at least one master node alive in the cluster
* there must be at least one data node alive in the cluster

Under certain circumstances, ECK will therefore ignore the change budget. For example, a safe migration from a 1-node cluster to another 1-node cluster can only be done by temporarily setting up a 2-node cluster.

It is possible to configure the `changeBudget` to optimize for reusing Persistent Volumes instead of migrating data across nodes. This feature is not supported yet: more details to come in the next release.

[id="{p}-group-definitions"]
==== Group definitions

To optimize upgrades for highly available setups, ECK can take arbitrary node groupings into account. It prioritizes the recovery of entire availability zones in catastrophic scenarios.

For example, let's create a zone-aware Elasticsearch cluster. Some nodes will be created in `europe-west3-a`, and some others in `europe-west3-b`:

[source,yaml]
----
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 3
    config:
      node.attr.zone: europe-west3-a
      cluster.routing.allocation.awareness.attributes: zone
    podTemplate:
      metadata:
        labels:
          nodesGroup: group-a
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: failure-domain.beta.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west3-a
  - nodeCount: 3
    config:
      node.attr.zone: europe-west3-b
      cluster.routing.allocation.awareness.attributes: zone
    podTemplate:
      metadata:
        labels:
          nodesGroup: group-b
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: failure-domain.beta.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west3-b
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 0
    groups:
    - selector:
        matchLabels:
          nodesGroup: group-a
    - selector:
        matchLabels:
          nodesGroup: group-b
----

If a modification is applied to the Elasticsearch configuration of these 6 nodes, ECK will slowly upgrade the cluster nodes, taking the provided `changeBudget` into account.
In this example, it will spawn one additional node at a time, and migrate data away from one node at a time.

Imagine a catastrophic situation occurs while the mutation is in progress: all nodes in `europe-west3-b` suddenly disappear.
ECK will detect it and recreate the 3 missing nodes as expected. However, since a cluster upgrade is already in progress, the current `changeBudget` may already be maxed out, preventing new nodes from being created in `europe-west3-b`.

In this situation, it would be preferable to first recreate the missing nodes in `europe-west3-b`, then continue the cluster upgrade.

In order to do so, ECK must know about the logical grouping of nodes. Since this is an arbitrary setting (it can represent availability zones, but also node roles, hot-warm topologies, etc.), it must be specified in the `updateStrategy.groups` section of the Elasticsearch specification.
Node grouping is expressed through labels on the resources. In the example above, 3 pods are labeled with `group-a`, and the 3 other pods with `group-b`.