-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix race condition #52
Conversation
- cordon removed nodes - reattach terminated nodes
Hooray! All contributors have signed the CLA. |
@@ -303,8 +303,8 @@ func (t *CycleNodeRequestTransitioner) transitionScalingUp() (reconcile.Result, | |||
|
|||
// Check we have waited long enough - give the node some time to start up | |||
if time.Since(scaleUpStarted.Time) <= scaleUpWait { | |||
t.rm.LogEvent(t.cycleNodeRequest, "ScalingUpWaiting", "Waiting for new nodes to be ready") | |||
return reconcile.Result{Requeue: true, RequeueAfter: requeueDuration}, nil | |||
t.rm.LogEvent(t.cycleNodeRequest, "ScalingUpWaiting", "Waiting for new nodes to be warmed up") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduce the number of unnecessary requeue
@@ -160,7 +160,7 @@ func (t *CycleNodeRequestTransitioner) finalReapChildren() (shouldRequeue bool, | |||
} | |||
|
|||
switch t.cycleNodeRequest.Status.Phase { | |||
case v1.CycleNodeRequestInitialised: | |||
case v1.CycleNodeRequestInitialised, v1.CycleNodeRequestFailed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix Failed
CNR in infinity loop of re-queue
Change the version in the makefile to 1.8.2 |
We can wait to cut release in a separate PR to not mix this issue changes with the release cut trigger |
You are correct 🤦♂️ |
closing to investigate intermediate failure in pre-termination trigger |
- soft fail pretermination trigger and checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@vincentportella there're many other places where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
- improvement for race condition handling #52
- improvement for race condition handling #52
This PR fixes race conditions where nodes are terminated (i.e by cluster-autoscaler) while in the progress of cycling nodes, which is identify in:
Node "xxxx" not found
To simulate the senario we can insert below to each of the transition phases