-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: add multiple statefulsets support to TiDB Operator #240
Proposal: add multiple statefulsets support to TiDB Operator #240
Conversation
replicas: 3 | ||
image: pingcap/tikv:v2.1.0 | ||
… | ||
extra: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be renamed since extra
is a potentially common key name. sets
would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a default
set and zero or more extra sets, so what about extraSets
or otherSets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is better. I wan't sure if we would support having just sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is the problem we just discussed. Removing the default statefulset would impact all the other sets that inherit the property from the default set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just sets +1
pd:
- name: name-1
replicas: 3
...
- name: name-2
replicas: 2
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
type TiKVStatus struct { | ||
StatefulSet []apps.StatefulSetStatus `json:"statefulSet,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatefulSet -> StatefulSets
|
||
``` go | ||
type TidbClusterSpec struct { | ||
TiKV []TiKVSpec `json:"tikv,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TiKV
is a unreasonable name when its type is an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TiKVSets
or TiKVDomains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tennix @onlymellb @gregwebs Is this ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer TiKVSets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TiKVSets
is ok for me. Do we need rename TiKVSpec
to TiKVSetSpec
?
I am still very concerned that we are creating a highly-manual UX that doesn't actually match the user's workflow. To scale vertically, I just want to tell the operator to scale vertically. Actually change the yaml to create multiple stateful sets isn't what I want to do. The advantage seems to be that
|
|
||
## Background | ||
|
||
Now, TiDB Operator create only one statefulset for one PD/TiKV/TiDB component separately. This works very well in most cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/create/creates/
* When using local volume, only horizontal scaling is guaranteed, vertical scaling may fail if there are not enough resources on the node. We need vertical scaling anyway | ||
* Users can't specify an exact number of pods in a given AZ. For example deploy 3 TiKV in AZ-1, and 2 TiKV in AZ-2 | ||
* For on-premise deployment, k8s nodes may not have the same size. To make most of the resources, it's better to allow different profiles for a single cluster. | ||
* One TiKV peer use different resources(CPU/Memory/Storage) from other peers in the same TiDB cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same case as above for on-premise deployment.
* For on-premise deployment, k8s nodes may not have the same size. To make most of the resources, it's better to allow different profiles for a single cluster. | ||
* One TiKV peer use different resources(CPU/Memory/Storage) from other peers in the same TiDB cluster | ||
* No isolation between different tidb-servers, a big AP query may affect TP query. The worst case is that AP query cause a tidb-server OOM, and this tidb-server is also serving TP request | ||
* No support offline the specific peers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No support for offline arbitrary peers due to ordinal limitations of statefulset.
|
||
Using local persistent volume, once deployed the PD and TiKV pod has to be on the node all its lifecycle, migrating it to other nodes with larger resources would be hard. However if we use multiple statefulsets, we can do the migration on the fly. This is useful when there is a hot region on a TiKV that can hardly split. | ||
|
||
So the TiDB Operator should support creating multiple statefulsets for one component to achieve all these requests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we can achieve all the above issues by using multiple statefulsets:
...
So this proposal suggests extending one statefulset with multiple statefulsets for a single component.
* No support offline the specific peers | ||
* No support deploy TiDB Operator on multiple K8s clusters | ||
|
||
Using local persistent volume, once deployed the PD and TiKV pod has to be on the node all its lifecycle, migrating it to other nodes with larger resources would be hard. However if we use multiple statefulsets, we can do the migration on the fly. This is useful when there is a hot region on a TiKV that can hardly split. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph should be removed.
@gregwebs Yes, the vertical scaling requires many manual interventions. But I don't think it's proper to do this in tidb-operator. Vertical scaling may involve migrating all data to a new host, this may be time-consuming and there is no clear way to tell users the progress. We should add another proposal of how to implement it. |
What would be very helpful to understand how this is designed to be used is to alter the user guide according to the proposal. Or otherwise list out all the different scenarios this is designed for and how it will be used in them. |
…eekface/tidb-operator into multiple-statefulsets-proposal
f545a5c
to
e608b07
Compare
I made some changes based on @tennix @onlymellb @xiaojingchen 's suggestions: e608b07 |
According to K8s api compatibility:
So add an extra field to existing may be a better solution. For example: change Figure 1: tikv:
replicas: 3
image: pingcap/tikv:v2.1.0
… Figure 2: tikv:
replicas: 3
image: pingcap/tikv:v2.1.0
…
tikvs:
- name: name-1
replicas: 3
image: pingcap/tikv:v2.1.0
...
- name: name-2
replicas: 2
image: pingcap/tikv:v2.1.0
... Figure 3: type TidbClusterSpec struct {
TiKV TiKVSpec `json:"tikv,omitempty"`
…
} Figure 4: type TidbClusterSpec struct {
TiKV TiKVSpec `json:"tikv,omitempty"`
TiKVs []TiKVSpec `json:"tikvs,omitempty"`
…
} |
How do we handle the status of |
No. Add a new attribute
Yes, the |
I change the API according to K8s api compatibility and add some use cases including vertical scaling and multiple zones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
TiKV TiKVSpec `json:"tikv,omitempty"` | ||
… | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the definition of type TiKVSpec
as shown in Figure 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
type TiKVStatus struct { | ||
StatefulSet *apps.StatefulSetStatus `json:"statefulSet,omitempty"` | ||
StatefulSets []apps.StatefulSetStatus `json:"statefulSets,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatefulSets
type change to map[string]apps.StatefulSetStatus
``` | ||
|
||
The operator does not guarantee the order of multiple statefulsets’ upgrade. If the user care about the order indeed. Just: modify one statefulset, wait it done, and then modify the other one. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have a default upgrade order, such as we can upgrade the main statefulset first, and then upgrade the statefulset in the order of statefulset slice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
``` | ||
|
||
The operator does not guarantee the order of multiple statefulsets’ scale in/out. If the user care the order indeed. Just: modify one statefulset, wait it done, and then modify the other one. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
To do vertical scaling, the user must: | ||
|
||
- Add a new statefulset in the `tikvs` section, using more `requests` or `limits` and wait it done. Add a new `name-3`, change from figure 9 to figure 10; | ||
- Scale down the old statefulset(`name-1`) in the `tikvs` section to zero and wait it done. See figure 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to wait for the new statefulset name-3 to start completely before scale down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user must ensure that.
Co-Authored-By: weekface <weekface@gmail.com>
…eekface/tidb-operator into multiple-statefulsets-proposal
…eekface/tidb-operator into multiple-statefulsets-proposal
|
|
||
### Vertical scaling | ||
|
||
We implement this feature in two phases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can achieve vertical scaling in two phases.
We do not implement this in tidb-operator. But users can achieve this.
Validation will fail. Addressed.
I mean the API changes in this proposal are compatible.
No.
Addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fixes: #228
@tennix @onlymellb @xiaojingchen @gregwebs PTAL