🐛 KCP should defer remediation when a control plane machine is still provisioning #9734
cc @sbueringer
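```go
// HasHealthyMachineStillProvisioning returns true if any control plane machine
// has no unhealthy condition and does not have a node yet.
func (c *ControlPlane) HasHealthyMachineStillProvisioning() bool {
	return len(c.Machines.Filter(collections.Not(collections.HasUnhealthyCondition), collections.And(collections.Not(collections.HasNode())))) > 0
}
```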
Just to confirm: this checks whether any of the available machines has no node associated AND is not (yet) marked as unhealthy?
Yes
The idea is that we want to wait for all the machines to be provisioned before remediating, but we should preserve the ability to remediate a machine that fails to provision.
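A minimal, self-contained sketch of the check discussed above. It assumes a simplified, hypothetical `Machine` type with `Unhealthy` and `HasNode` fields standing in for the real Cluster API `Machine` object and `collections` filters: a healthy machine without a node counts as "still provisioning" and should make remediation wait, while an unhealthy machine that never got a node does not count, so it can still be remediated.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a control plane Machine.
// Unhealthy approximates "has an unhealthy condition" (e.g. a failed
// MachineHealthCheck); HasNode approximates "has a NodeRef".
type Machine struct {
	Name      string
	Unhealthy bool
	HasNode   bool
}

// hasHealthyMachineStillProvisioning mirrors the intent of the filter under
// review: true if any machine is NOT unhealthy AND has no node yet.
func hasHealthyMachineStillProvisioning(machines []Machine) bool {
	for _, m := range machines {
		if !m.Unhealthy && !m.HasNode {
			return true
		}
	}
	return false
}

func main() {
	// m3 is healthy but still provisioning, so remediation should wait.
	rolling := []Machine{
		{Name: "m1", Unhealthy: true, HasNode: true},
		{Name: "m2", HasNode: true},
		{Name: "m3", HasNode: false},
	}
	// m1 failed to provision and is unhealthy; it must not block its own remediation.
	failedToProvision := []Machine{
		{Name: "m1", Unhealthy: true, HasNode: false},
		{Name: "m2", HasNode: true},
		{Name: "m3", HasNode: true},
	}
	fmt.Println(hasHealthyMachineStillProvisioning(rolling))           // true
	fmt.Println(hasHealthyMachineStillProvisioning(failedToProvision)) // false
}
```

Excluding unhealthy machines from the predicate is what keeps a machine that fails to provision remediable instead of deadlocking behind itself.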
Improved the filter and added unit tests
/lgtm
LGTM label has been added. Git tree hash: e39a29b97024ffb55dbd0b8f9171e4bb509f4e28
/lgtm
LGTM label has been added. Git tree hash: a92f6d44d7b9c0cafd8241663442cba9e6d8e29a
/test pull-cluster-api-e2e-full-main
Just a few nits, otherwise lgtm
```go
t.Run("Remediation deletes unhealthy machine failed to provision - 4 CP (during 3 CP rolling upgrade)", func(t *testing.T) {
	g := NewWithT(t)

	m1 := createMachine(ctx, g, ns.Name, "m1-unhealthy-", withMachineHealthCheckFailed(), withWaitBeforeDeleteFinalizer(), withoutNodeRef())
	// ... rest of the test case not shown in the review excerpt ...
})
```
Just to clarify: does `withoutNodeRef()` make any actual difference compared to the test above, "Remediation deletes unhealthy machine - 4 CP (during 3 CP rolling upgrade)"?
I assume both run the same code path because of `withMachineHealthCheckFailed()`. I can still see the value of the test case, but wanted to confirm that's the case.
Yes, they are testing the same code path.
However, since this PR defers remediation when a healthy machine is still provisioning, I figured that a test guarding against regressions for a machine that is still provisioning (`withoutNodeRef()`) but unhealthy could help.
Thx!! /lgtm, merge/approve pending squash
LGTM label has been added. Git tree hash: 75959fb18b3255faeae3ab13540c7b013a225a0c
Branch updated from a2e25e8 to 2cb95f7
rebased + squashed
@fabriziopandini: The following test failed, say
Branch updated from 2cb95f7 to 34c09b4
Thank you! /lgtm
LGTM label has been added. Git tree hash: 8b2d3fd3f51ed46ce9495c7b02dc06aea85a4e7a
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sbueringer
What this PR does / why we need it:
With this PR, KCP defers remediation while another control plane machine is still provisioning.
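The behavior described above can be sketched as a single remediation decision. This is a hypothetical simplification, not KCP's actual reconcile code: it assumes a simplified `Machine` type with `Unhealthy`/`HasNode` fields, picks the first unhealthy machine as the candidate (the real controller applies its own selection rules), and defers while any healthy machine is still provisioning.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified view of a control plane Machine.
type Machine struct {
	Name      string
	Unhealthy bool // e.g. a failed MachineHealthCheck
	HasNode   bool // e.g. NodeRef is set
}

// decision is the outcome of one hypothetical remediation pass.
type decision struct {
	Remediate string // name of the machine to delete; empty if none
	Deferred  bool   // true if remediation must wait for provisioning to finish
}

// decideRemediation defers remediation while a healthy machine has no node,
// but still remediates an unhealthy machine that itself failed to provision.
func decideRemediation(machines []Machine) decision {
	candidate := ""
	provisioning := false
	for _, m := range machines {
		if m.Unhealthy {
			if candidate == "" {
				candidate = m.Name
			}
			continue // unhealthy machines never count as "still provisioning"
		}
		if !m.HasNode {
			provisioning = true // healthy, but no node yet
		}
	}
	if candidate == "" {
		return decision{} // nothing to remediate
	}
	if provisioning {
		return decision{Deferred: true}
	}
	return decision{Remediate: candidate}
}

func main() {
	// Another control plane machine (m4) is still provisioning: wait.
	fmt.Printf("%+v\n", decideRemediation([]Machine{
		{Name: "m1", Unhealthy: true, HasNode: true},
		{Name: "m2", HasNode: true},
		{Name: "m3", HasNode: true},
		{Name: "m4", HasNode: false},
	})) // {Remediate: Deferred:true}

	// The unhealthy machine itself failed to provision: remediate it.
	fmt.Printf("%+v\n", decideRemediation([]Machine{
		{Name: "m1", Unhealthy: true, HasNode: false},
		{Name: "m2", HasNode: true},
		{Name: "m3", HasNode: true},
	})) // {Remediate:m1 Deferred:false}
}
```

In a real controller, "deferred" would typically translate into requeueing the reconcile rather than returning a value.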
Which issue(s) this PR fixes:
Fixes #9398
/area control-plane