-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 Add Node related condition to Machine conditions #3670
🌱 Add Node related condition to Machine conditions #3670
Conversation
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first pass
6a2d6b7
to
eb2fff6
Compare
Unit test fails because CacheTracker panics in the tests that doesn't use testEnv. Probably will need to use testEnv for the tests using CacheTracker. |
eb2fff6
to
e2c6e86
Compare
/milestone v0.4.0 |
995c571
to
663a3eb
Compare
/test pull-cluster-api-test |
42b6249
to
52b6450
Compare
This was ready to go, just solved conflicts. |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to take a bit more time to review but it seems we're potentially changing some behaviors that we should first chat about
} | ||
conditions.MarkFalse(m, clusterv1.MachineNodeHealthyCondition, clusterv1.DeletingReason, clusterv1.ConditionSeverityInfo, "") | ||
if err := patchMachine(ctx, patchHelper, m); err != nil { | ||
conditions.MarkFalse(m, clusterv1.MachineNodeHealthyCondition, clusterv1.DeletionFailedReason, clusterv1.ConditionSeverityInfo, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to fail as well if the patch doesn't go through above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Machines are patched in Reconcile
after returning from reconcileDelete
, so setting this in case it won't fail there.
node, err := r.getNode(remoteClient, providerID) | ||
if err != nil { | ||
if err == ErrNodeNotFound { | ||
// While a NodeRef is set in the status, failing to get that node means the node is deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it's a transient error? The node might not be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are setting the condition and reconciling, if it is transient it will be fixed right away. Otherwise how to detect the transient error?
if err != nil { | ||
if err == ErrNodeNotFound { | ||
// While a NodeRef is set in the status, failing to get that node means the node is deleted. | ||
// If Status.NodeRef is not set before, node still can be in the provisioning state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a little confusing, don't we set the node reference a few lines down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR we were returning directly if the NodeRef is set.
if machine.Status.NodeRef != nil { |
Removed that check to be able to check the health of the node.
And here, if node cannot be found when the NodeRef is already assigned, that means the node is deleted. But if NodeRef is not yet set, that means there was never a node before as NodeRef is nil, and hence it is not a NodeNotFoundReason
issue.
Kind: node.Kind, | ||
APIVersion: node.APIVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these properly populated? Usually they're not, and they need to be filled in manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is same with what was used before:
cluster-api/controllers/machine_controller_noderef.go
Lines 103 to 107 in fe28e9d
return &apicorev1.ObjectReference{ | |
Kind: node.Kind, | |
APIVersion: node.APIVersion, | |
Name: node.Name, | |
UID: node.UID, |
I had renamed cc @vincepri |
@sedefsavas: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/hold cancel |
@sedefsavas if you squash commits, then this one can be merged |
MachineNodeHealthyCondition ConditionType = "NodeHealthy" | ||
|
||
// WaitingForNodeRefReason (Severity=Info) documents a machine.spec.providerId is not assigned yet. | ||
WaitingForNodeRefReason = "WaitingForNodeRef" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: what about
WaitingForNodeRefReason = "WaitingForNodeRef" | |
WaitingForNodeProvisioningReason = "WaitingForNodeProvisioning" |
NodeRef is a bit too technical for a user that does not know CAPI internals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a NodeProvisioningReason
, I think it will be confusing.
WaitingForNodeRefReason
is waiting for machine.Spec.ProviderID
and
NodeProvisioningReason
is waiting for machine.Status.NodeRef
when there is no node found.
394b4bb
to
9506423
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What this PR does / why we need it:
This PR adds a condition for Node health to Machine conditions that will be managed by Machine controller.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes # #3138