🌱 test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec #11145

chrischdi · 2024-09-06T11:29:23Z

What this PR does / why we need it:

Implements additional checks to ensure the cluster stays operational during an upgrade.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area e2e-testing

chrischdi · 2024-09-06T11:29:41Z

/test help

k8s-ci-robot · 2024-09-06T11:29:44Z

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test pull-cluster-api-build-main
/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-31-1-32-main
/test pull-cluster-api-test-main
/test pull-cluster-api-test-mink8s-main
/test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

/test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-apidiff-main
pull-cluster-api-build-main
pull-cluster-api-e2e-blocking-main
pull-cluster-api-test-main
pull-cluster-api-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

chrischdi · 2024-09-06T11:30:11Z

/test pull-cluster-api-e2e-main

chrischdi · 2024-09-09T07:05:54Z

/assign @sbueringer @fabriziopandini

fabriziopandini

Thanks for implementing this test! Just a few nits from my side.

test/e2e/cluster_upgrade_test.go

fabriziopandini · 2024-09-09T07:43:56Z

test/e2e/cluster_upgrade_test.go

+				Expect(managementClusterProxy.GetClient().Get(ctx, client.ObjectKeyFromObject(cluster), cluster)).To(Succeed())
+
+				// This replaces the WaitForControlPlaneMachinesToBeUpgraded function and additionally:
+				// * checks that kube-proxy is healthy


What about adding a note explaining that we are doing this test to ensure that non-static pods remain healthy on CP machines during the upgrade?

fabriziopandini · 2024-09-09T07:46:25Z

test/e2e/cluster_upgrade_test.go

+					var upgraded int64
+					deletingMachinesWithPreDrainHook := []clusterv1.Machine{}
+					for _, m := range machines {
+						if *m.Spec.Version == cluster.Spec.Topology.Version && conditions.IsTrue(&m, clusterv1.MachineNodeHealthyCondition) {


q: why are we checking clusterv1.MachineNodeHealthyCondition? As far as I remember, it only checks that a dummy condition does not exist, so if I'm not wrong it doesn't really add value 🤔

Wait what. For a dummy condition to not exist?

// MachineNodeHealthyCondition provides info about the operational state of the Kubernetes node hosted on the machine by summarizing node conditions.
// If the conditions defined in a Kubernetes node (i.e., NodeReady, NodeMemoryPressure, NodeDiskPressure, NodePIDPressure, and NodeNetworkUnavailable) are in a healthy state, it will be set to True.
MachineNodeHealthyCondition ConditionType = "NodeHealthy"

Yep, I was expecting it to check what the comment describes.

This is how we are configuring MHC in E2E tests:

machineHealthCheck:
  maxUnhealthy: 100%
  unhealthyConditions:
  - type: e2e.remediation.condition
    status: "False"
    timeout: 20s

So (in E2E tests only) MHC is testing for a dummy e2e.remediation.condition, not for NodeReady, NodeMemoryPressure, etc.

But this is the MachineNodeHealthyCondition, not the MachineHealthCheckSucceeded condition.

It is set to true here:

cluster-api/internal/controllers/machine/machine_controller_noderef.go

Line 160 in d7db259

conditions.MarkTrue(machine, clusterv1.MachineNodeHealthyCondition)

This should have nothing to do with MHCs.
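For readers following the thread, the counting logic under review can be sketched in isolation roughly as follows. This is a simplified illustration, not the PR's code: `Machine` is a hypothetical trimmed-down stand-in for `clusterv1.Machine`, and the `NodeHealthy` field stands in for `conditions.IsTrue(&m, clusterv1.MachineNodeHealthyCondition)`.

```go
package main

import "fmt"

// Machine is a hypothetical, trimmed-down stand-in for clusterv1.Machine.
type Machine struct {
	Name    string
	Version *string
	// NodeHealthy stands in for the MachineNodeHealthyCondition being true.
	NodeHealthy bool
}

// countUpgraded returns how many machines run the target version and
// also report a healthy node.
func countUpgraded(machines []Machine, targetVersion string) int64 {
	var upgraded int64
	for _, m := range machines {
		// Guard against a nil version pointer before dereferencing.
		if m.Version != nil && *m.Version == targetVersion && m.NodeHealthy {
			upgraded++
		}
	}
	return upgraded
}

func main() {
	oldV, newV := "v1.30.0", "v1.31.0" // example versions, chosen arbitrarily
	machines := []Machine{
		{Name: "cp-0", Version: &newV, NodeHealthy: true},
		{Name: "cp-1", Version: &newV, NodeHealthy: false},
		{Name: "cp-2", Version: &oldV, NodeHealthy: true},
	}
	fmt.Println(countUpgraded(machines, newV)) // prints 1
}
```

In this sketch a machine that is already on the new version but whose node is not yet healthy is not counted as upgraded, which is why the meaning of the condition matters for the check.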

fabriziopandini · 2024-09-09T07:49:21Z

test/e2e/cluster_upgrade_test.go

+						}
+					}
+
+					// Check if the expected number of kube-proxy pods exist and all of them are healthy.


Maybe let's add a note specifying that we are checking kube-proxy on both old and new CP nodes, as well as on workers (across the entire cluster).
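A cluster-wide kube-proxy check of the kind described here could look roughly like the sketch below. It assumes one kube-proxy pod per node (old CP, new CP, and worker alike); `Pod` and `checkKubeProxy` are hypothetical names for illustration, not the helpers used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, simplified view of a kube-proxy pod.
type Pod struct {
	Name     string
	NodeName string
	Ready    bool
}

// checkKubeProxy verifies that every node in the cluster has exactly one
// kube-proxy pod and that each of those pods is ready.
func checkKubeProxy(pods []Pod, nodeNames []string) error {
	if len(pods) != len(nodeNames) {
		return fmt.Errorf("expected %d kube-proxy pods, got %d", len(nodeNames), len(pods))
	}
	byNode := make(map[string]Pod, len(pods))
	for _, p := range pods {
		byNode[p.NodeName] = p
	}
	var errs []error
	for _, n := range nodeNames {
		p, ok := byNode[n]
		switch {
		case !ok:
			errs = append(errs, fmt.Errorf("node %s has no kube-proxy pod", n))
		case !p.Ready:
			errs = append(errs, fmt.Errorf("kube-proxy pod %s on node %s is not ready", p.Name, n))
		}
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	nodes := []string{"cp-old", "cp-new", "worker-0"}
	pods := []Pod{
		{Name: "kube-proxy-a", NodeName: "cp-old", Ready: true},
		{Name: "kube-proxy-b", NodeName: "cp-new", Ready: false},
		{Name: "kube-proxy-c", NodeName: "worker-0", Ready: true},
	}
	if err := checkKubeProxy(pods, nodes); err != nil {
		fmt.Println("unhealthy:", err)
	}
}
```

In an e2e test such a function would typically be polled until it returns nil or a timeout expires, so that transient unreadiness during a rollout does not immediately fail the test.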

test/e2e/cluster_upgrade_test.go

chrischdi · 2024-09-10T12:14:03Z

/test pull-cluster-api-e2e-main

chrischdi · 2024-09-10T13:40:41Z

/test pull-cluster-api-e2e-main

rebase

test/e2e/cluster_upgrade_test.go

chrischdi · 2024-09-11T11:52:37Z

/test pull-cluster-api-e2e-main

sbueringer

Last nits from my side

test/e2e/cluster_upgrade_test.go

sbueringer · 2024-09-12T10:30:11Z

/assign @fabriziopandini
(for a final review)

chrischdi · 2024-09-12T11:20:13Z

/test pull-cluster-api-e2e-main

chrischdi · 2024-09-12T13:34:15Z

flake

/retest

sbueringer · 2024-09-12T15:05:31Z

Thx!!

Really nice improvement

/lgtm

k8s-ci-robot · 2024-09-12T15:05:37Z

LGTM label has been added.

Git tree hash: 216aa8bf9643340fc73cd3b02f603ca74ebb80ab

fabriziopandini

Great work!
/lgtm
/approve

k8s-ci-robot · 2024-09-19T11:01:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [fabriziopandini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sbueringer · 2024-10-18T10:08:40Z

/cherry-pick release-1.8

sbueringer · 2024-10-18T10:09:03Z

This additional validation found this issue: #11296

Let's also add it to release-1.8

k8s-infra-cherrypick-robot · 2024-10-18T10:09:16Z

@sbueringer: #11145 failed to apply on top of branch "release-1.8":

Applying: test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec
Applying: test: add template for kcp-pre-drain
Using index info to reconstruct a base tree...
M	Makefile
M	test/e2e/config/docker.yaml
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/config/docker.yaml
CONFLICT (content): Merge conflict in test/e2e/config/docker.yaml
Auto-merging Makefile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 test: add template for kcp-pre-drain

In response to this:

/cherry-pick release-1.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sbueringer · 2024-10-18T10:09:28Z

I'll do a manual cherry-pick

…rmanceSpec (kubernetes-sigs#11145)

* test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec
* test: add template for kcp-pre-drain
* test: adjust multi-controlplane quickstart test to check for all nodes and kube-proxy being healthy via a pre-drain hook
* lint fix
* Review fixes
* review fixes
* review fixes
* review fix

k8s-ci-robot added the area/e2e-testing (issues or PRs related to e2e testing) and cncf-cla: yes (the PR's author has signed the CNCF CLA) labels Sep 6, 2024

k8s-ci-robot requested review from elmiko and killianmuldoon September 6, 2024 11:29

k8s-ci-robot added the size/L label (a PR that changes 100-499 lines, ignoring generated files) Sep 6, 2024

chrischdi changed the title ~~test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec~~ 🌱 test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec Sep 6, 2024

k8s-ci-robot assigned fabriziopandini and sbueringer Sep 9, 2024

fabriziopandini reviewed Sep 9, 2024

sbueringer reviewed Sep 9, 2024

k8s-ci-robot added the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) Sep 9, 2024

chrischdi added 6 commits September 10, 2024 15:38

test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec

ddd2e07

test: add template for kcp-pre-drain

0d841e6

test: adjust multi-controlplane quickstart test to check for all nodes and kube-proxy being healthy via a pre-drain hook

d9279e0

lint fix

07a59e3

Review fixes

5f744e3

review fixes

32152a3

chrischdi force-pushed the pr-test-rollout-health branch from ad2be97 to 32152a3 on September 10, 2024 13:40

k8s-ci-robot removed the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) Sep 10, 2024

sbueringer added the tide/merge-method-squash label (the PR should be squashed by tide when it merges) Sep 10, 2024

sbueringer reviewed Sep 10, 2024

review fixes

0270bfc

sbueringer reviewed Sep 12, 2024

test/e2e/cluster_upgrade_test.go (outdated, resolved)

test/e2e/cluster_upgrade_test.go (resolved)

review fix

b8c5953

k8s-ci-robot added the lgtm label ("Looks good to me"; the PR is ready to be merged) Sep 12, 2024

sbueringer mentioned this pull request Sep 12, 2024

🌱 condition: fix godoc for MachineNodeHealthyCondition #11178

Merged

fabriziopandini reviewed Sep 19, 2024

k8s-ci-robot added the approved label (the PR has been approved by an approver from all required OWNERS files) Sep 19, 2024

k8s-ci-robot merged commit 8a634a7 into kubernetes-sigs:main Sep 19, 2024
19 checks passed

k8s-ci-robot added this to the v1.9 milestone Sep 19, 2024

chrischdi mentioned this pull request Sep 26, 2024

Improve validation during upgrade e2e tests #10956

Closed

sbueringer mentioned this pull request Oct 18, 2024

[release-1.8] 🌱 test: add PreWaitForControlplaneToBeUpgraded to ClusterUpgradeConformanceSpec #11303

Merged
