
use topology to achieve HA #2366

Merged
merged 3 commits into pingcap:master on May 8, 2020

Conversation

@PengJi
Contributor

PengJi commented May 2, 2020

What problem does this PR solve?

#2110

What is changed and how does it work?

Labels are set on nodes, for example kubernetes.io/hostname=host1,zone=zone1. One of these labels is then chosen as the topology key, and the scheduler spreads pods across the different values of that topology to achieve HA.
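For illustration, a minimal sketch of a node carrying such labels; the names and values are only examples, not part of this PR:

apiVersion: v1
kind: Node
metadata:
  name: host1
  labels:
    kubernetes.io/hostname: host1   # default topology: every node is its own topology value
    zone: zone1                     # example custom label that can serve as the topology key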

Check List

Tests

  • Unit test
  • E2E test
  • Manual test (add detailed scripts or steps below)

Code changes
Mainly changed ha.go

Side effects

NONE

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?:

Previously, tidb-operator implemented node-based HA only; now HA can be based on other topologies (through the ha-topology-key defined via tidb-cluster.yaml annotations), such as zones.
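For illustration only, a sketch of how the topology key might be supplied on a TidbCluster; the annotation key pingcap.com/ha-topology-key follows the reviewer's suggestion below, so verify the exact key and value format against the released documentation:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  annotations:
    pingcap.com/ha-topology-key: zone   # assumed key; falls back to kubernetes.io/hostname when omitted
spec:
  # pd/tikv/tidb specs unchanged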

@CLAassistant

CLAassistant commented May 2, 2020

CLA assistant check
All committers have signed the CLA.

@Yisaer
Contributor

Yisaer commented May 2, 2020

Thanks for your contribution. Could you provide more detail and some examples to demonstrate the scenario and how it works?

@@ -7,6 +7,7 @@ metadata:
pingcap.com/pd.{{ template "cluster.name" . }}-pd.sha: {{ include "pd-configmap.data-digest" . | quote }}
pingcap.com/tikv.{{ template "cluster.name" . }}-tikv.sha: {{ include "tikv-configmap.data-digest" . | quote }}
pingcap.com/tidb.{{ template "cluster.name" . }}-tidb.sha: {{ include "tidb-configmap.data-digest" . | quote }}
topologyKey: {{ .Values.topologyKey }}
Contributor

What about changing topologyKey to pingcap.com/ha-topology-key?

and define it as a const like https://github.com/pingcap/tidb-operator/blob/master/pkg/label/label.go#L43 does.

Contributor Author

updated

@@ -64,6 +64,10 @@ discovery:
# if the ConfigMap was not changed.
enableConfigMapRollout: true

# The scheduler achieves HA scheduling based on the topologyKey.
# You can change it to another node label.
topologyKey: kubernetes.io/hostname
Contributor

Suggested change
topologyKey: kubernetes.io/hostname
haTopologyKey: kubernetes.io/hostname

Contributor Author

updated

@PengJi
Contributor Author

PengJi commented May 4, 2020

Thanks for your contribution. Could you provide more detail and some examples to demonstrate the scenario and how it works?

tidb-operator's current scheduling rule provides highly available scheduling in the node dimension: it guarantees that component instances are scheduled to different nodes to ensure high availability. Sometimes the nodes and pods are distributed as follows:

| zone | zone1        | zone2        | zone3        | zone3        | zone3        |
| node | kind-worker  | kind-worker2 | kind-worker3 | kind-worker4 | kind-worker5 |
| pd   | basic-pd-0   |              | basic-pd-1   | basic-pd-2   |              |
| tikv | basic-tikv-0 | basic-tikv-1 |              |              | basic-tikv-2 |

The zones here are assumed to be different IDCs. kind-worker is located in zone1, kind-worker2 in zone2, and kind-worker3, kind-worker4, and kind-worker5 are all in zone3. Here basic-pd-1 and basic-pd-2 end up in the same zone (zone3). We sometimes want to schedule pods to different zones, that is, to make the zone dimension highly available, as follows:

| zone | zone1        | zone2        | zone3        | zone3        | zone3        |
| node | kind-worker  | kind-worker2 | kind-worker3 | kind-worker4 | kind-worker5 |
| pd   | basic-pd-0   | basic-pd-1   |              | basic-pd-2   |              |
| tikv | basic-tikv-0 | basic-tikv-1 |              |              | basic-tikv-2 |

This ensures that at least two pods are still available if zone3 suffers a power outage, a network outage, etc.

The code implements highly available scheduling in the zone dimension. If the topology key is not configured (I'm currently changing it to be configurable through tidb-cluster.yaml annotations), it defaults to the node, which keeps compatibility with tidb-operator's current scheduling rules.

@Yisaer
Contributor

Yisaer commented May 5, 2020

@PengJi Thanks for clarifying. That's very helpful.

@Yisaer
Contributor

Yisaer commented May 6, 2020

Please gofmt pkg/scheduler/predicates/test_helper.go and pkg/scheduler/predicates/predicate_test.go.

@DanielZhangQD
Contributor

@PengJi Thanks for the contribution!
Can this be done via pod anti-affinity? @PengJi @weekface

@PengJi
Contributor Author

PengJi commented May 7, 2020

@PengJi Thanks for the contribution!
Can this be done via pod anti-affinity? @PengJi @weekface

@DanielZhangQD One of the requirements is to implement HA in the zone dimension (such as an IDC; currently it is the node dimension). If using podAntiAffinity, a possible configuration is as follows:

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        operator: In
        values:
        - zone1
        - zone2
        - zone3
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/component
          operator: In
          values:
          - pd
      topologyKey: failure-domain.beta.kubernetes.io/zone

This method does not guarantee that pods are evenly distributed among the three zones; is there another configuration that would?
In fact, what we want is similar to TopologySpreadConstraints, but our Kubernetes version is relatively old (< 1.18), so we cannot use that feature in the production environment.
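For reference, a minimal sketch of the topologySpreadConstraints feature mentioned above (beta since Kubernetes 1.18, hence not usable here); the pd component label is reused from the example above:

spec:
  topologySpreadConstraints:
  - maxSkew: 1                      # at most one pod of difference between zones
    topologyKey: failure-domain.beta.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: app.kubernetes.io/component
        operator: In
        values:
        - pd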

@weekface
Contributor

weekface commented May 7, 2020

@cofyc PTAL

@weekface
Contributor

weekface commented May 7, 2020

@PengJi Can you add a release note (user-facing change)?

@cofyc
Contributor

cofyc commented May 8, 2020

/run-e2e-tests

@cofyc
Contributor

cofyc commented May 8, 2020

tidb-operator's current scheduling rule provides highly available scheduling in the node dimension: it guarantees that component instances are scheduled to different nodes to ensure high availability. …

This LGTM and is an enhancement to the current HA implementation.

It does not support more than one topology, so pods in a particular zone (topology-key=zone) will not be spread evenly across nodes. Do you plan to implement this feature, like Multiple TopologySpreadConstraints?
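For reference, multiple TopologySpreadConstraints would combine zone-level and node-level spreading roughly as in this sketch (Kubernetes 1.18+ only; not something this PR implements):

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: failure-domain.beta.kubernetes.io/zone   # spread pd pods evenly across zones
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/component: pd
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname                    # and across nodes within each zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/component: pd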

Contributor

@weekface weekface left a comment

LGTM

@cofyc cofyc merged commit 7d7ce06 into pingcap:master May 8, 2020
@cofyc
Contributor

cofyc commented May 8, 2020

/run-cherry-picker

sre-bot pushed a commit to sre-bot/tidb-operator that referenced this pull request May 8, 2020
@sre-bot
Contributor

sre-bot commented May 8, 2020

cherry pick to release-1.1 in PR #2399

sre-bot added a commit that referenced this pull request May 8, 2020
@PengJi
Contributor Author

PengJi commented May 8, 2020

This LGTM and is an enhancement to the current HA implementation.

It does not support more than one topology, so pods in a particular zone (topology-key=zone) will not be spread evenly across nodes. Do you plan to implement this feature, like Multiple TopologySpreadConstraints?

There is no plan to implement this feature for now. The current approach is to use the Kubernetes SelectorSpreadPriority and InterPodAffinityPriority priority functions and increase their weight.
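As an illustration only, a kube-scheduler policy fragment that raises the weight of those two priority functions; the weight values, and whether the policy file is supplied as JSON or YAML and how it is wired into the scheduler, are deployment-specific assumptions:

kind: Policy
apiVersion: v1
# predicates and the other default priorities are omitted in this sketch
priorities:
- name: SelectorSpreadPriority      # prefers spreading pods of the same service/StatefulSet
  weight: 5                         # example weight, higher than the default of 1
- name: InterPodAffinityPriority    # scores nodes by preferred pod (anti-)affinity terms
  weight: 5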
