Price Improvement Threshold #1440
Comments
Just brainstorming: I'm curious if we might want to reason about this as |
To check my understanding of If I'm understanding that correctly, it sounds to me like it would be better placed on an individual NodeClaim spec rather than in |
This is a good point. Similar line of reasoning to TGP and expireAfter. If we model it as |
Additionally, it could potentially support fixed or percent values via
The problem with fixed-value disruption costs is that Karpenter has no way to predict how long a node will stay around, so it cannot gauge whether a one-time $20 cost is worth it. Maybe it could use historical data on node age to make a prediction. |
I expect drift to not take this feature into account at all. If the node needs to be replaced, it needs to be replaced regardless of cost. When a new nodeclaim is provisioned to replace each node, the usual provisioning logic should still choose the cheapest available, which could result in minor cost improvements that are below the configured threshold. |
/triage accepted |
I've marked this as accepted, but I think we definitely need to talk this out. Excited to discuss more about this |
Description
One piece of implementation for RFC #1433 (Consolidation Policies)
What problem are you trying to solve?
Make it possible to balance the cost of disrupting pods against the cost of using a more expensive instance type.
Currently, single-node consolidation will replace any instance with a cheaper one that can house the same pods, resulting in very frequent pod disruption. Disrupting pods has its own cost, which for certain workloads can negate the savings or result in a net loss of money.
This concept has already been documented as an improvement for spot consolidation, but I see no reason why it wouldn't be useful for generalized consolidation.
https://github.com/kubernetes-sigs/karpenter/blob/main/designs/spot-consolidation.md#2-price-improvement-factor
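As a minimal sketch of what I have in mind, assuming a percentage-based threshold: the function and parameter names below are hypothetical and do not exist in Karpenter today. The idea is simply that a cheaper candidate is only accepted when the relative saving clears a configured minimum.

```go
package main

import "fmt"

// worthReplacing is a hypothetical check (not a Karpenter API). It reports
// whether moving from currentPrice to candidatePrice improves the price by at
// least thresholdPercent. A threshold of 0 accepts any candidate that is not
// more expensive, which roughly matches the behavior described above.
func worthReplacing(currentPrice, candidatePrice, thresholdPercent float64) bool {
	if currentPrice <= 0 {
		return false // no meaningful baseline to compare against
	}
	improvement := (currentPrice - candidatePrice) / currentPrice * 100
	return improvement >= thresholdPercent
}

func main() {
	// Illustrative hourly prices only.
	fmt.Println(worthReplacing(1.00, 0.98, 10)) // 2% saving, below a 10% threshold
	fmt.Println(worthReplacing(1.00, 0.80, 10)) // 20% saving, above a 10% threshold
}
```

A 2% saving would no longer trigger a replacement under a 10% threshold, while a 20% saving still would, so small price differences stop causing repeated pod disruption.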
How important is this feature to you?
Very important. We are currently running a fork of Karpenter with single-node consolidation completely disabled due to the frequent node disruption we are seeing for small price improvements (individual pods are frequently being disrupted multiple times an hour).