Handle pods that have been evicted via the eviction API. #1560
Conversation
`KubernetesPipelineTest.podDeadlineExceeded` and other failures. Might need to convert some text literals to constants. (`Messages` is not so useful since localized text in build logs does not really work.)
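One hedged way to address the literals-vs-constants point (the class and field names below are assumptions, not necessarily what the PR ended up using) is a shared constant that both the message builder and the test reference:

```java
// Illustrative sketch only: class and constant names are made up for this example.
public final class PodTerminationMessages {
    /** Message appended to the termination description when the pod was evicted. */
    public static final String POD_EVICTED_MESSAGE =
            " Pod was evicted by the Kubernetes Eviction API.";

    private PodTerminationMessages() {}
}
```

Production code could then call `sb.append(PodTerminationMessages.POD_EVICTED_MESSAGE)` and the test could assert the same constant, for example with JenkinsRule's `r.assertLogContains(PodTerminationMessages.POD_EVICTED_MESSAGE, run)`, so the literal lives in one place.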
```java
                .filter(c -> "EvictionByEvictionAPI".equals(c.getReason()))
                .findFirst();
        if (evictionCondition.isPresent()) {
            sb.append(" Pod was evicted by the Kubernetes Eviction API.");
        }
```
"by the Kubernetes Eviction API" does not really add much here. Maybe explain that the node might have been drained, etc.?
No way to know precisely from this single condition; a drain is only one of the root causes that could lead to an eviction.
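Since the condition alone cannot pinpoint the root cause, one hedged option (not what this PR does, just a sketch building on the snippet above) would be to surface the condition's own message, which the component that requested the eviction may have filled in:

```java
// Sketch: append whatever detail Kubernetes put on the condition, if any.
// `evictionCondition` is the Optional<PodCondition> from the earlier sketch.
evictionCondition.ifPresent(c -> {
    sb.append(" Pod was evicted by the Kubernetes Eviction API.");
    if (c.getMessage() != null && !c.getMessage().isEmpty()) {
        // For example, a message set by the drain or preemption machinery.
        sb.append(" (").append(c.getMessage()).append(")");
    }
});
```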
The behaviour doesn't seem fully consistent, sometimes by the time
(only one actual timeout, causing other tests to fail)
Double-checked expected behaviour for evicted pods and they should not be deleted automatically, so this may be a race condition with another part of the plugin. Will check whether I can make it behave as I want for the test.
Not sure if a bug, but the message is displayed correctly while no retry was performed, and I'm not sure why.
I'm running a declarative pipeline with a Kubernetes agent; labels are added using the pod template. Thanks
Looks like a bug with
Thanks, I will check this. I was also confused not to see the pod template's unique label in the logs. In the meantime, I'm wondering if using the retry at the options level will catch this case.
I did some tests on my OKD cluster.
When draining the node, no retry is performed because of

```java
if (terminationReasons.stream().anyMatch(IGNORED_CONTAINER_TERMINATION_REASONS::contains)) {
    listener.getLogger()
            .println("Ignored termination reason(s) for " + node + " for purposes of retry: "
                    + terminationReasons);
    return false;
}
```

Looking also at the return code, not all containers return the same status code.
One fix would be to return directly true if the
Ah, nice catch. I think the condition should be changed to ignore if it contains only ignored reasons.
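A minimal sketch of that change, reusing the names from the snippet quoted above (whether #1582 implements it exactly this way is not confirmed here): skip the retry only when every termination reason is ignorable, rather than as soon as any single ignored reason appears.

```java
// Only treat the failure as non-retriable when *all* reasons are in the ignored set.
if (!terminationReasons.isEmpty()
        && IGNORED_CONTAINER_TERMINATION_REASONS.containsAll(terminationReasons)) {
    listener.getLogger()
            .println("Ignored termination reason(s) for " + node + " for purposes of retry: "
                    + terminationReasons);
    return false;
}
```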
I've opened a PR here: #1582
Adds an explicit message about an agent pod that has been evicted using the eviction API (cf. https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions)
Adds a test verifying retry behaviour when an agent pod has been evicted (such as what happens when it runs on a Kubernetes node that gets drained)
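For reference, a pod can be evicted programmatically through the fabric8 client, which is one way such a scenario can be triggered; this is an illustrative snippet with made-up namespace and pod names, not necessarily the exact code the new test uses:

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class EvictAgentPod {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // Calls the Kubernetes Eviction API subresource for the agent pod;
            // returns false if the eviction is blocked, e.g. by a PodDisruptionBudget.
            boolean evicted = client.pods()
                    .inNamespace("jenkins-agents")
                    .withName("my-agent-pod")
                    .evict();
            System.out.println("evicted=" + evicted);
        }
    }
}
```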
Testing done
Submitter checklist