-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce RequestEviction feature for evicting pods in background (KEP-1397) #1466
Introduce RequestEviction feature for evicting pods in background (KEP-1397) #1466
Conversation
1c05dac
to
d022bf4
Compare
Compared to virt-handler produced by my local kind these are the logs produced by the upstream CI:
Could be kubevirt does not like combination of running kind in pod with dind enabled. |
3210921
to
90a6b55
Compare
/retest-required |
@@ -45,6 +50,10 @@ type DeschedulerServer struct { | |||
SecureServing *apiserveroptions.SecureServingOptionsWithLoopback | |||
DisableMetrics bool | |||
EnableHTTP2 bool | |||
// FeatureGates enabled by the user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, the approach used by upstream is via ComponentGlobalsRegistry. Might be good to consider if it makes sense to introduce that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not applicable here as one may create multiple descheduler instances for simulation purposes.
eventRecorder, | ||
podInformer, | ||
rs.DefaultFeatureGates, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to pass the feature gates as an argument everywhere? Compared to the global variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was keeping in mind the case where the descheduling framework is run as a simulator. E.g. in case two or more framework instances are created in the same run/binary.
return | ||
} | ||
// Ignore any pod that does not have eviction in progress | ||
if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we get to this line and delete pods that lost their annotation, when we do
if newPod.Annotations == nil {
return
}
// Ignore pod's that are not subject to an eviction in background
if _, exists := newPod.Annotations[EvictionRequestAnnotationKey]; !exists {
return
}
just above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a pod looses its annotation there's no way of knowing whether the annotation was removed by accident or intentionally. E.g. a live migration is no longer needed since some scoring mechanism decided it's more costly to perform a live migration than to drop VM's state and re-create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, shouldn't we remove the check above, or move this check there instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I misunderstood. So it can happen a pod might still have an eviction in background correctly finished. Yet, if EvictionRequestAnnotationKey
is removed (e.g. by accident or on purpose) the pod is no longer considered for further processing. EvictionRequestAnnotationKey
is the referential one. We might still check for EvictionInProgressAnnotationKey
annotation as well just in case there's one. Yet, some implementations might decide to just remove EvictionRequestAnnotationKey
annotation to signal an eviction in background was aborted/canceled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet, if EvictionRequestAnnotationKey is removed (e.g. by accident or on purpose) the pod is no longer considered for further processing.
I am not sure if I see all the loose ends here :D
If the pod was in the cache when the Request
annotation was removed, then we skip evicting it (thus we process the InProgress
annotation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's important is to properly remove entry from the cache
which we would not do when evictionAssumed == false
and EvictionRequestAnnotationKey
is removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Updating the delete event handler to delete every pod that's in the cache: bb084d6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is much better. I still think it is unfortunate that we do not deschedule pods that had both EvictionRequestAnnotationKey
and EvictionInProgressAnnotationKey
removed at the same time. If the owner cancels the progress in this way and the pod is not terminating, then the pod will be immune to descheduling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the pod from the cache during update event just in case the pod is in the cache: 4425072
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thx!
@@ -108,6 +423,46 @@ func (pe *PodEvictor) SetClient(client clientset.Interface) { | |||
pe.client = client | |||
} | |||
|
|||
func (pe *PodEvictor) evictionRequestsTotal() uint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot what was the consensus last time, but do we want to introduce new metrics for this feature? https://github.com/kubernetes-sigs/descheduler/blob/master/keps/1397-evictions-in-background/README.md#metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do. Yet, not as part of this PR. I will have a follow up for all other changes.
ObjectMeta: metav1.ObjectMeta{ | ||
Name: fmt.Sprintf("kubevirtvmi-%v", idx), | ||
Annotations: map[string]string{ | ||
"descheduler.alpha.kubernetes.io/request-evict-only": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to confirm, the feature depends only on the key, and not on the value? Would be good to add a little a bit of the documentation above EvictionRequestAnnotationKey
and EvictionInProgressAnnotationKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. +1 for a documentation. I still have a blog post to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good. Let's not forget about the constants and update them after as well.
a2b5211
to
7d50a3b
Compare
7d50a3b
to
1ee0351
Compare
return | ||
} | ||
// Ignore any pod that does not have eviction in progress | ||
if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, shouldn't we remove the check above, or move this check there instead?
if oldPod.Annotations == nil { | ||
return | ||
} | ||
if _, exists := oldPod.Annotations[EvictionInProgressAnnotationKey]; exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can check the presence of a key on nil annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation value is currently empty. So oldPod.Annotations[EvictionInProgressAnnotationKey]
will always be empty. If, did you mean a different check?
if pod.Annotations != nil { | ||
// eviction in background requested | ||
if _, exists := pod.Annotations[EvictionRequestAnnotationKey]; exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could simplify this and other checks as well
ObjectMeta: metav1.ObjectMeta{ | ||
Name: fmt.Sprintf("kubevirtvmi-%v", idx), | ||
Annotations: map[string]string{ | ||
"descheduler.alpha.kubernetes.io/request-evict-only": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good. Let's not forget about the constants and update them after as well.
assumedRequestTimeoutSeconds uint | ||
} | ||
|
||
func newEvictionRequestsCache(assumedRequestTimeoutSeconds uint) *evictionRequestsCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to introduce unit tests for the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one in TestEvictionRequestsCacheCleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that one looks sufficient. Nevertheless from the long term perspective it would be good to test all the cache operations separated from the PodEvictor. We would not have to be concerned what operations the evictor uses and which does not - useful for example when we make changes to the cache in the future.
005f8ea
to
17d7518
Compare
assumedRequestTimeoutSeconds uint | ||
} | ||
|
||
func newEvictionRequestsCache(assumedRequestTimeoutSeconds uint) *evictionRequestsCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that one looks sufficient. Nevertheless from the long term perspective it would be good to test all the cache operations separated from the PodEvictor. We would not have to be concerned what operations the evictor uses and which does not - useful for example when we make changes to the cache in the future.
return | ||
} | ||
// Ignore any pod that does not have eviction in progress | ||
if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet, if EvictionRequestAnnotationKey is removed (e.g. by accident or on purpose) the pod is no longer considered for further processing.
I am not sure if I see all the loose ends here :D
If the pod was in the cache when the Request
annotation was removed, then we skip evicting it (thus we process the InProgress
annotation)
17d7518
to
6d40fdc
Compare
return | ||
} | ||
// Ignore any pod that does not have eviction in progress | ||
if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's important is to properly remove entry from the cache
which we would not do when evictionAssumed == false
and EvictionRequestAnnotationKey
is removed
func getPodKey(pod *v1.Pod) (string, error) { | ||
uid := string(pod.UID) | ||
if len(uid) == 0 { | ||
return "", errors.New("cannot get cache key for pod with empty UID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error is quite troublesome. It bubbles up to so many places. Can we remove it?
The UID should be guaranteed on already created pods. Do we ever operate on pods that have not been created yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming all pods have the UID set this error will never get returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for exploring this avenue and filtering out all pods that does not have their uid set from eviction in a separate PR/improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing the error for missing uid with a panic.
9a3544d
to
4425072
Compare
18c833f
to
cfbb740
Compare
LGTM, thanks! |
Squashing all the review commits now |
When the feature is enabled each pod with descheduler.alpha.kubernetes.io/request-evict-only annotation will have the eviction API error examined for a specific error code/reason and message. If matched eviction of such a pod will be interpreted as initiation of an eviction in background.
c45c8cb
to
7d4ec60
Compare
@atiratree thank you for all your reviews and patience. This really helped. |
Implementation of #1354.
When the feature is enabled each pod with
descheduler.alpha.kubernetes.io/request-evict-only
annotation will have the eviction API error examined for a specific error code/reason and message. If matched eviction of such a pod will be interpreted as initiation of an eviction in background.The first version of the feature does not provide any way to configure a timeout for pending evictions in background. Neither a timeout for how often a pending evictions in background are checked for timeouts. Both set to 10 minutes by default. The default values can be adjusted based on provided feedback.
TODO:
EvictionsInBackground
as a new feature gate)Refactorings before this PR can be merged: