Add pods disaster tolerance and data regions disaster tolerance test cases #497
/run-e2e-test
tests/cmd/e2e/main.go
Outdated
@@ -103,6 +102,46 @@ func main() {
BatchSize: 1,
RawSize: 1,
},
SubValues: `pd:
For better readability, can we move this string literal to a private const value?
tests/cmd/e2e/main.go
Outdated
@@ -146,6 +185,46 @@ func main() {
BatchSize: 1,
RawSize: 1,
},
SubValues: `pd:
Same here.
tests/cmd/stability/main.go
Outdated
@@ -100,6 +100,46 @@ func main() {
},
Monitor: true,
BlockWriteConfig: conf.BlockWriter,
SubValues: `pd:
  affinity:
I recommend abstracting the template and rendering the `namespaces` field; maintaining similar YAML literals in 4 places is error-prone.
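One way to act on this suggestion is a single `text/template` constant that is rendered per component. The sketch below is illustrative only: the YAML body, the `rack` topology key, and the names `affinityTemplate`, `affinityParams`, and `renderAffinity` are assumptions, not the literal used in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// affinityTemplate is a hypothetical, shortened stand-in for the YAML literal
// passed to SubValues. Only the component name, topology key, and namespaces vary.
const affinityTemplate = `{{.Component}}:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          topologyKey: {{.TopologyKey}}
          namespaces:
{{- range .Namespaces}}
          - {{.}}
{{- end}}
`

// Parse once at start-up; template.Must panics on a malformed template.
var affinityTmpl = template.Must(template.New("affinity").Parse(affinityTemplate))

// affinityParams holds the values substituted into the template.
type affinityParams struct {
	Component   string
	TopologyKey string
	Namespaces  []string
}

// renderAffinity renders the affinity block for one component.
func renderAffinity(component, topologyKey string, namespaces []string) (string, error) {
	var buf bytes.Buffer
	params := affinityParams{Component: component, TopologyKey: topologyKey, Namespaces: namespaces}
	if err := affinityTmpl.Execute(&buf, params); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Render one block per component and concatenate them into a values string.
	for _, component := range []string{"pd", "tikv", "tidb"} {
		out, err := renderAffinity(component, "rack", []string{"e2e-cluster1"})
		if err != nil {
			panic(err)
		}
		fmt.Print(out)
	}
}
```

With this shape, each call site passes only its namespace list, so the four near-identical literals collapse into one definition.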
tests/dt.go
Outdated
for i, node := range nodes.Items {
index := i % RackNum
node.Labels[RackLabel] = fmt.Sprintf("rack%d", index)
oa.kubeCli.CoreV1().Nodes().Update(&node)
Please handle the error returned by `Update`.
tests/dt.go
Outdated
return oa.checkDisasterTolerance(tidbs.Items, nodeMap)
}

func (oa *operatorActions) checkDisasterTolerance(allPods []corev1.Pod, nodeMap map[string]corev1.Node) error {
`revive` warns about function/method names that differ only in capitalization; I think we should avoid this.
good, got it
tests/dt.go
Outdated
}

for rack, pods := range rackPods {
if len(pods) > maxPodsOneRack {
I think we should also check the minimum number of pods per rack.
Consider 4 pods on 3 racks with a maximum of 2 pods per rack: [2, 2, 0] meets the criterion but is not the most fault-tolerant arrangement. Adding a minimum check of `>=1` will help.
nice catch! it's a bug
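The combined minimum and maximum check discussed in this thread can be sketched as follows. This is an assumption about how the bounds might be derived (floor and ceiling of pods per rack), which yields the suggested `>=1` minimum for 4 pods on 3 racks; `checkRackSpread` and its parameters are hypothetical names. The list of racks is passed explicitly because an empty rack never appears as a key in the pods-by-rack map.

```go
package main

import "fmt"

// checkRackSpread verifies that pods are spread as evenly as possible:
// every rack must hold between floor(total/racks) and ceil(total/racks) pods.
func checkRackSpread(racks []string, rackPods map[string][]string) error {
	if len(racks) == 0 {
		return fmt.Errorf("no racks to check")
	}

	total := 0
	for _, pods := range rackPods {
		total += len(pods)
	}

	minPods := total / len(racks)
	maxPods := minPods
	if total%len(racks) != 0 {
		maxPods++
	}

	// Iterate over the known racks, not the map, so empty racks are counted.
	for _, rack := range racks {
		n := len(rackPods[rack])
		if n < minPods || n > maxPods {
			return fmt.Errorf("rack %s has %d pods, want between %d and %d", rack, n, minPods, maxPods)
		}
	}
	return nil
}

func main() {
	racks := []string{"rack0", "rack1", "rack2"}

	// The [2, 2, 0] arrangement from the review comment: passes a max-only check, fails here.
	uneven := map[string][]string{
		"rack0": {"pd-0", "pd-1"},
		"rack1": {"pd-2", "pd-3"},
	}
	fmt.Println(checkRackSpread(racks, uneven))

	// A [2, 1, 1] arrangement satisfies both bounds.
	even := map[string][]string{
		"rack0": {"pd-0", "pd-1"},
		"rack1": {"pd-2"},
		"rack2": {"pd-3"},
	}
	fmt.Println(checkRackSpread(racks, even))
}
```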
pdClient := http.Client{
Timeout: 10 * time.Second,
}
url := fmt.Sprintf("http://%s-pd.%s:2379/pd/api/v1/regions", cluster.ClusterName, cluster.Namespace)
Could we add this to `PdControlInterface` and use `oa.pdControl.GetPDClient().GetRegions()` instead? That would be better for future maintainability.
I considered that. At present only the test code uses this interface, and the interface has changed in PD 3.0 and later versions, so this is a temporary implementation.
return err
}

for _, region := range regions.Regions {
Could you please add a comment for this? I find it hard to understand at first glance.
LGTM
tests/manifests/e2e/e2e.yaml
Outdated
@@ -42,13 +42,14 @@ spec:
serviceAccount: tidb-operator-e2e
containers:
- name: tidb-operator-e2e
image: 127.0.0.1:5000/pingcap/tidb-operator-e2e:latest
image: hub.pingcap.net/chenxiaojing/tidb-operator-e2e:latest
image name
/run-e2e-test
LGTM
What problem does this PR solve?
What is changed and how it works?
Check List
Tests
Code changes
tests/
Side effects
Related changes
base PR: #475
should be merged after #499
Does this PR introduce a user-facing change?: