Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add multiple statefulsets support to TiDB Operator #240

Merged
merged 25 commits into from
Dec 29, 2018

Conversation

weekface
Copy link
Contributor

replicas: 3
image: pingcap/tikv:v2.1.0
extra:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be renamed since extra is a potentially common key name. sets would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a default set and zero or more extra sets, so what about extraSets or otherSets

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is better. I wan't sure if we would support having just sets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is the problem we just discussed. Removing the default statefulset would impact all the other sets that inherit the property from the default set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just sets +1

pd:
  - name: name-1
    replicas: 3
    ...
  - name: name-2
    replicas: 2
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

type TiKVStatus struct {
StatefulSet []apps.StatefulSetStatus `json:"statefulSet,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatefulSet -> StatefulSets


``` go
type TidbClusterSpec struct {
TiKV []TiKVSpec `json:"tikv,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiKV is a unreasonable name when its type is an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiKVSets or TiKVDomains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer TiKVSets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiKVSets is ok for me. Do we need rename TiKVSpec to TiKVSetSpec?

@gregwebs
Copy link
Contributor

I am still very concerned that we are creating a highly-manual UX that doesn't actually match the user's workflow. To scale vertically, I just want to tell the operator to scale vertically. Actually change the yaml to create multiple stateful sets isn't what I want to do.

The advantage seems to be that

  1. It makes it apparent to me that I need to increase the cluster capacity
  2. I can write programs that can use this to automatically do what I want


## Background

Now, TiDB Operator create only one statefulset for one PD/TiKV/TiDB component separately. This works very well in most cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/create/creates/

* When using local volume, only horizontal scaling is guaranteed, vertical scaling may fail if there are not enough resources on the node. We need vertical scaling anyway
* Users can't specify an exact number of pods in a given AZ. For example deploy 3 TiKV in AZ-1, and 2 TiKV in AZ-2
* For on-premise deployment, k8s nodes may not have the same size. To make most of the resources, it's better to allow different profiles for a single cluster.
* One TiKV peer use different resources(CPU/Memory/Storage) from other peers in the same TiDB cluster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same case as above for on-premise deployment.

* For on-premise deployment, k8s nodes may not have the same size. To make most of the resources, it's better to allow different profiles for a single cluster.
* One TiKV peer use different resources(CPU/Memory/Storage) from other peers in the same TiDB cluster
* No isolation between different tidb-servers, a big AP query may affect TP query. The worst case is that AP query cause a tidb-server OOM, and this tidb-server is also serving TP request
* No support offline the specific peers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No support for offline arbitrary peers due to ordinal limitations of statefulset.


Using local persistent volume, once deployed the PD and TiKV pod has to be on the node all its lifecycle, migrating it to other nodes with larger resources would be hard. However if we use multiple statefulsets, we can do the migration on the fly. This is useful when there is a hot region on a TiKV that can hardly split.

So the TiDB Operator should support creating multiple statefulsets for one component to achieve all these requests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we can achieve all the above issues by using multiple statefulsets:
...

So this proposal suggests extending one statefulset with multiple statefulsets for a single component.

* No support offline the specific peers
* No support deploy TiDB Operator on multiple K8s clusters

Using local persistent volume, once deployed the PD and TiKV pod has to be on the node all its lifecycle, migrating it to other nodes with larger resources would be hard. However if we use multiple statefulsets, we can do the migration on the fly. This is useful when there is a hot region on a TiKV that can hardly split.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph should be removed.

@tennix
Copy link
Member

tennix commented Dec 19, 2018

@gregwebs Yes, the vertical scaling requires many manual interventions. But I don't think it's proper to do this in tidb-operator. Vertical scaling may involve migrating all data to a new host, this may be time-consuming and there is no clear way to tell users the progress. We should add another proposal of how to implement it.

@gregwebs
Copy link
Contributor

What would be very helpful to understand how this is designed to be used is to alter the user guide according to the proposal. Or otherwise list out all the different scenarios this is designed for and how it will be used in them.

@weekface weekface force-pushed the multiple-statefulsets-proposal branch from f545a5c to e608b07 Compare December 21, 2018 10:07
@weekface
Copy link
Contributor Author

I made some changes based on @tennix @onlymellb @xiaojingchen 's suggestions: e608b07

@weekface
Copy link
Contributor Author

According to K8s api compatibility:

An API change is considered compatible if it:

  • adds new functionality that is not required for correct behavior (e.g., does not add a new required field)
  • does not change existing semantics, including:
  • the semantic meaning of default values and behavior
  • interpretation of existing API types, fields, and values
  • which fields are required and which are not
  • mutable fields do not become immutable
  • valid values do not become invalid
  • explicitly invalid values do not become valid

Put another way:

  • Any API call (e.g. a structure POSTed to a REST endpoint) that succeeded before your change must succeed after your change.
  • Any API call that does not use your change must behave the same as it did before your change.
  • Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when issued against an API servers that do not include your change.
  • It must be possible to round-trip your change (convert to different API versions and back) with no loss of information.
  • Existing clients need not be aware of your change in order for them to continue to function as they did previously, even when your change is in use.
  • It must be possible to rollback to a previous version of API server that does not include your change and have no impact on API objects which do not use your change. API objects that use your change will be impacted in case of a rollback.
  • If your change does not meet these criteria, it is not considered compatible, and may break older clients, or result in newer clients causing undefined behavior. Such changes are generally disallowed, though exceptions have been made in extreme cases (e.g. security or obvious bugs).

So add an extra field to existing may be a better solution. For example: changevalues.yaml from figure 1 to figure 2, and change the TidbCluster object from figure 3 to figure 4:

Figure 1:

tikv:
  replicas: 3
  image: pingcap/tikv:v2.1.0
  

Figure 2:

tikv:
  replicas: 3
  image: pingcap/tikv:v2.1.0
  

tikvs:
  - name: name-1
    replicas: 3
    image: pingcap/tikv:v2.1.0
    ...
  - name: name-2
    replicas: 2
    image: pingcap/tikv:v2.1.0
    ...

Figure 3:

type TidbClusterSpec struct {
  TiKV            TiKVSpec            `json:"tikv,omitempty"`
  …
}

Figure 4:

type TidbClusterSpec struct {
  TiKV            TiKVSpec            `json:"tikv,omitempty"`
  TiKVs           []TiKVSpec            `json:"tikvs,omitempty"`
  …
}

@tennix
Copy link
Member

tennix commented Dec 25, 2018

How do we handle the status of TidbCluster, should we merge all the statefulsets status? And how do we handle the order of all the statefulsets, the first PD acts as the seed and it's different from the rest.

@weekface
Copy link
Contributor Author

How do we handle the status of TidbCluster, should we merge all the statefulsets status?

No. Add a new attribute SetStatuses.

And how do we handle the order of all the statefulsets, the first PD acts as the seed and it's different from the rest.

Yes, the pd is the first order. Others are ordered by their index.

@weekface
Copy link
Contributor Author

I change the API according to K8s api compatibility and add some use cases including vertical scaling and multiple zones.

@tennix @onlymellb @xiaojingchen @gregwebs PTAL

tennix
tennix previously approved these changes Dec 26, 2018
Copy link
Member

@tennix tennix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

TiKV TiKVSpec `json:"tikv,omitempty"`
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the definition of type TiKVSpec as shown in Figure 4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


type TiKVStatus struct {
StatefulSet *apps.StatefulSetStatus `json:"statefulSet,omitempty"`
StatefulSets []apps.StatefulSetStatus `json:"statefulSets,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatefulSets type change to map[string]apps.StatefulSetStatus

```

The operator does not guarantee the order of multiple statefulsets’ upgrade. If the user care about the order indeed. Just: modify one statefulset, wait it done, and then modify the other one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can have a default upgrade order, such as we can upgrade the main statefulset first, and then upgrade the statefulset in the order of statefulset slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

```

The operator does not guarantee the order of multiple statefulsets’ scale in/out. If the user care the order indeed. Just: modify one statefulset, wait it done, and then modify the other one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

To do vertical scaling, the user must:

- Add a new statefulset in the `tikvs` section, using more `requests` or `limits` and wait it done. Add a new `name-3`, change from figure 9 to figure 10;
- Scale down the old statefulset(`name-1`) in the `tikvs` section to zero and wait it done. See figure 11;
Copy link
Contributor

@onlymellb onlymellb Dec 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to wait for the new statefulset name-3 to start completely before scale down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user must ensure that.

Co-Authored-By: weekface <weekface@gmail.com>
@weekface weekface dismissed stale reviews from tennix via e845c63 December 26, 2018 05:51
@tennix
Copy link
Member

tennix commented Dec 28, 2018

  • What if the name filed in tikvs are the same? How do we handle this case?
  • For API compatibility, do you mean we should use different apiVersion to access the two different form of resources
  • For the pd/tikv service, you mentioned there is no change. Do you mean that different statefulsets using the same headless and non-headless service as the main statefulset, or service per statefulset?


### Vertical scaling

We implement this feature in two phases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can achieve vertical scaling in two phases.

We do not implement this in tidb-operator. But users can achieve this.

@weekface
Copy link
Contributor Author

  • What if the name filed in tikvs are the same? How do we handle this case?

Validation will fail. Addressed.

  • For API compatibility, do you mean we should use different apiVersion to access the two different form of resources

I mean the API changes in this proposal are compatible.

  • For the pd/tikv service, you mentioned there is no change. Do you mean that different statefulsets using the same headless and non-headless service as the main statefulset, or service per statefulset?

No.

  • For PD: N+1 Services should be created, N headless services and 1 client service
  • For TiKV: N headless services should be created

Addressed.

Copy link
Member

@tennix tennix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@onlymellb onlymellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tennix tennix merged commit 8392a81 into pingcap:master Dec 29, 2018
@weekface weekface deleted the multiple-statefulsets-proposal branch December 29, 2018 11:52
@tennix tennix mentioned this pull request Dec 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using multiple statefulsets for each component
5 participants