
use topology to achieve HA #2366

Merged
merged 3 commits into pingcap:master on May 8, 2020

Conversation

@PengJi
Contributor

PengJi commented May 2, 2020

What problem does this PR solve?

#2110

What is changed and how does it work?

Labels are set on nodes, for example kubernetes.io/hostname=host1,zone=zone1. One of these labels is then chosen as the topology key, and the scheduler spreads pods across the different values of that topology to achieve HA.
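For illustration, a minimal sketch of a node carrying such labels; the names and values are only examples, not part of this PR:

apiVersion: v1
kind: Node
metadata:
  name: host1
  labels:
    kubernetes.io/hostname: host1   # default topology: every node is its own topology value
    zone: zone1                     # example custom label that can serve as the topology key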

Check List

Tests

  • Unit test
  • E2E test
  • Manual test (add detailed scripts or steps below)

Code changes
Mainly changed ha.go

Side effects

NONE

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Does this PR introduce a user-facing change?:

Previously, tidb-operator implemented node-based HA only; now HA can be based on other topologies (through the ha-topology-key defined via tidb-cluster.yaml annotations), such as zones.
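For illustration only, a sketch of how the topology key might be supplied on a TidbCluster; the annotation key pingcap.com/ha-topology-key follows the reviewer's suggestion below, so verify the exact key and value format against the released documentation:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  annotations:
    pingcap.com/ha-topology-key: zone   # assumed key; falls back to kubernetes.io/hostname when omitted
spec:
  # pd/tikv/tidb specs unchanged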

@CLAassistant

CLAassistant commented May 2, 2020

CLA assistant check
All committers have signed the CLA.

@Yisaer
Contributor

Yisaer commented May 2, 2020

Thanks for your contribution. Could you provide more detail and some examples to demonstrate the scenario and how it works?

@@ -7,6 +7,7 @@ metadata:
pingcap.com/pd.{{ template "cluster.name" . }}-pd.sha: {{ include "pd-configmap.data-digest" . | quote }}
pingcap.com/tikv.{{ template "cluster.name" . }}-tikv.sha: {{ include "tikv-configmap.data-digest" . | quote }}
pingcap.com/tidb.{{ template "cluster.name" . }}-tidb.sha: {{ include "tidb-configmap.data-digest" . | quote }}
topologyKey: {{ .Values.topologyKey }}
Contributor

What about changing topologyKey to pingcap.com/ha-topology-key?

and define it as a const like https://github.com/pingcap/tidb-operator/blob/master/pkg/label/label.go#L43 does.

Contributor Author

updated

@@ -64,6 +64,10 @@ discovery:
# if the ConfigMap was not changed.
enableConfigMapRollout: true

# The scheduler achieves HA scheduling based on the topologyKey.
# You can change it to another node label.
topologyKey: kubernetes.io/hostname
Contributor

Suggested change
topologyKey: kubernetes.io/hostname
haTopologyKey: kubernetes.io/hostname

Contributor Author

updated

@PengJi
Contributor Author

PengJi commented May 4, 2020

Thanks for your contribution. Could you provide more detail and some examples to demonstrate the scenario and how it works?

tidb-operator's current scheduling rule provides highly available scheduling in the node dimension: it guarantees that component instances are scheduled to different nodes to ensure high availability. Sometimes the nodes and pods are distributed as follows:

| zone | zone1        | zone2        | zone3        | zone3        | zone3        |
| node | kind-worker  | kind-worker2 | kind-worker3 | kind-worker4 | kind-worker5 |
| pd   | basic-pd-0   |              | basic-pd-1   | basic-pd-2   |              |
| tikv | basic-tikv-0 | basic-tikv-1 |              |              | basic-tikv-2 |

The zones here are assumed to be different IDCs. kind-worker is located in zone1, kind-worker2 in zone2, and kind-worker3, kind-worker4, and kind-worker5 are all in zone3. Here basic-pd-1 and basic-pd-2 end up in the same zone (zone3). We sometimes want to schedule pods to different zones, that is, to make the zone dimension highly available, as follows:

| zone | zone1        | zone2        | zone3        | zone3        | zone3        |
| node | kind-worker  | kind-worker2 | kind-worker3 | kind-worker4 | kind-worker5 |
| pd   | basic-pd-0   | basic-pd-1   |              | basic-pd-2   |              |
| tikv | basic-tikv-0 | basic-tikv-1 |              |              | basic-tikv-2 |

This ensures that at least two pods are still available if zone3 suffers a power outage, a network outage, etc.

The code implements highly available scheduling in the zone dimension. If the topology key is not configured (I'm currently changing it to be configurable through tidb-cluster.yaml annotations), it defaults to the node, which keeps compatibility with tidb-operator's current scheduling rules.

@Yisaer
Contributor

Yisaer commented May 5, 2020

@PengJi Thanks for clarifying. That's very helpful.

@Yisaer
Contributor

Yisaer commented May 6, 2020

Please gofmt pkg/scheduler/predicates/test_helper.go and pkg/scheduler/predicates/predicate_test.go.

@DanielZhangQD
Contributor

@PengJi Thanks for the contribution!
Can this be done via pod anti-affinity? @PengJi @weekface

@PengJi
Contributor Author

PengJi commented May 7, 2020

@PengJi Thanks for the contribution!
Can this be done via pod anti-affinity? @PengJi @weekface

@DanielZhangQD One of the requirements is to implement HA in the zone dimension (such as an IDC; currently it is the node dimension). If using podAntiAffinity, a possible configuration is as follows:

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        operator: In
        values:
        - zone1
        - zone2
        - zone3
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/component
          operator: In
          values:
          - pd
      topologyKey: failure-domain.beta.kubernetes.io/zone

This method does not guarantee that pods are evenly distributed among the three zones; is there another configuration that would?
In fact, what we want is similar to TopologySpreadConstraints, but our Kubernetes version is relatively old (< 1.18), so we cannot use that feature in the production environment.
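For reference, a minimal sketch of the topologySpreadConstraints feature mentioned above (beta since Kubernetes 1.18, hence not usable here); the pd component label is reused from the example above:

spec:
  topologySpreadConstraints:
  - maxSkew: 1                      # at most one pod of difference between zones
    topologyKey: failure-domain.beta.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: app.kubernetes.io/component
        operator: In
        values:
        - pd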

@weekface
Contributor

weekface commented May 7, 2020

@cofyc PTAL

@weekface
Contributor

weekface commented May 7, 2020

@PengJi Can you add a release note (user-facing change)?

@cofyc
Contributor

cofyc commented May 8, 2020

/run-e2e-tests

@cofyc
Contributor

cofyc commented May 8, 2020

tidb-operator's current scheduling rule provides highly available scheduling in the node dimension: it guarantees that component instances are scheduled to different nodes to ensure high availability. …

This LGTM and is an enhancement to the current HA implementation.

It does not support more than one topology, so pods in a particular zone (topology-key=zone) will not be spread evenly across nodes. Do you plan to implement this feature, like Multiple TopologySpreadConstraints?
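For reference, multiple TopologySpreadConstraints would combine zone-level and node-level spreading roughly as in this sketch (Kubernetes 1.18+ only; not something this PR implements):

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: failure-domain.beta.kubernetes.io/zone   # spread pd pods evenly across zones
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/component: pd
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname                    # and across nodes within each zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/component: pd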

Contributor

@weekface weekface left a comment

LGTM

@cofyc cofyc merged commit 7d7ce06 into pingcap:master May 8, 2020
@cofyc
Contributor

cofyc commented May 8, 2020

/run-cherry-picker

sre-bot pushed a commit to sre-bot/tidb-operator that referenced this pull request May 8, 2020
@sre-bot
Contributor

sre-bot commented May 8, 2020

cherry pick to release-1.1 in PR #2399

sre-bot added a commit that referenced this pull request May 8, 2020
@PengJi
Contributor Author

PengJi commented May 8, 2020

This LGTM and is an enhancement to the current HA implementation.

It does not support more than one topology, so pods in a particular zone (topology-key=zone) will not be spread evenly across nodes. Do you plan to implement this feature, like Multiple TopologySpreadConstraints?

There is no plan to implement this feature for now. The current approach is to use the Kubernetes SelectorSpreadPriority and InterPodAffinityPriority priority functions and increase their weight.
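As an illustration only, a kube-scheduler policy fragment that raises the weight of those two priority functions; the weight values, and whether the policy file is supplied as JSON or YAML and how it is wired into the scheduler, are deployment-specific assumptions:

kind: Policy
apiVersion: v1
# predicates and the other default priorities are omitted in this sketch
priorities:
- name: SelectorSpreadPriority      # prefers spreading pods of the same service/StatefulSet
  weight: 5                         # example weight, higher than the default of 1
- name: InterPodAffinityPriority    # scores nodes by preferred pod (anti-)affinity terms
  weight: 5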
