Feature: parallel scale up #5731
Conversation
Welcome @pmendelski!
Force-pushed from c4e76d5 to 9342963 (Compare)
If you're reviewing this PR, please take a look at this test. I wrote it to learn more about autoscaler but the result seems a bit strange to me.
	},
	expectedErr: errors.NewAutoscalerError(
		errors.CloudProviderError,
		"A ...other concurrent errors: [\"[cloudProviderError] B\", \"[internalError] A\"]"),
This seems non-intuitive that we skip the type (CloudProviderError) for the first error A, but print it for other errors.
The ... looks rather artificial.
Also this message is losing the information on which node group we encountered the given error, which I think is quite valuable, especially when debugging.
This seems non-intuitive that we skip the type (CloudProviderError) for the first error A, but print it for other errors.
It's not that the first error is skipped. The errors are:
- deduplicated - to keep the message short and increase the chance that it fits the log length limit.
- sorted - the error order is not meaningful in concurrent execution, and sorting makes log aggregation (and testing) easier.
(See the sketch below.)
I could go with a dedicated error that aggregates all the others, but it would require a rather big change in the API and error handlers.
The ... looks rather artificial.
What do you propose?
Also this message is losing the information on which node group we encountered the given error, which I think is quite valuable, especially when debugging.
The information mapping each error to the failed node group scale-up is preserved in the logs and in the clusterStateRegistry.
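To make the dedupe-and-sort behavior concrete, here is a minimal self-contained sketch; the scaleUpError type and function names are illustrative, not the PR's actual code:

package main

import (
	"fmt"
	"sort"
)

type scaleUpError struct {
	errType string // e.g. "cloudProviderError", "internalError"
	msg     string
}

func (e scaleUpError) String() string {
	return fmt.Sprintf("[%s] %s", e.errType, e.msg)
}

// combineConcurrentErrors assumes errs is non-empty.
func combineConcurrentErrors(errs []scaleUpError) string {
	// Deduplicate: identical concurrent errors add no information, and
	// dropping them helps the message fit typical log length limits.
	seen := map[scaleUpError]bool{}
	unique := []scaleUpError{}
	for _, err := range errs {
		if !seen[err] {
			seen[err] = true
			unique = append(unique, err)
		}
	}
	// Sort: ordering is meaningless for concurrent failures, and a stable
	// order makes log aggregation (and testing) easier.
	sort.Slice(unique, func(i, j int) bool {
		if unique[i].errType != unique[j].errType {
			return unique[i].errType < unique[j].errType
		}
		return unique[i].msg < unique[j].msg
	})
	out := unique[0].msg
	if len(unique) > 1 {
		out += " ...and other concurrent errors: ["
		for i, err := range unique[1:] {
			if i > 0 {
				out += ", "
			}
			out += fmt.Sprintf("%q", err.String())
		}
		out += "]"
	}
	return out
}

func main() {
	fmt.Println(combineConcurrentErrors([]scaleUpError{
		{"internalError", "A"},
		{"cloudProviderError", "B"},
		{"cloudProviderError", "A"},
		{"cloudProviderError", "A"}, // duplicate; dropped
	}))
	// Prints: A ...and other concurrent errors: ["[cloudProviderError] B", "[internalError] A"]
}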
// Asserts all groups are scaled only once.
// Scaling one group multiple times concurrently may cause problems.
func assertUniqueNodeGroups(scaleUpInfos []nodegroupset.ScaleUpInfo) errors.AutoscalerError {
I think we usually use assert in the context of unit tests; I would use check or something similar in this context.
Changed
	}
	return errA.Type() < errB.Type()
})
firstErrMsg := errs[0].Error()
Why can't we make two map[string]bool maps, one having types and the second messages? This way len(typesMap) > 1 should be equivalent to the singleType variable.
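A sketch of that suggestion, assuming errs is the slice of concurrent errors being combined and using the autoscaler's errors package (the errorSummary name is mine, not from the PR):

import "k8s.io/autoscaler/cluster-autoscaler/utils/errors"

// errorSummary derives both flags from map sizes instead of boolean
// bookkeeping inside the loop. AutoscalerErrorType is a string type,
// hence the string() conversion.
func errorSummary(errs []errors.AutoscalerError) (singleType, singleMessage bool) {
	types := map[string]bool{}
	msgs := map[string]bool{}
	for _, err := range errs {
		types[string(err.Type())] = true
		msgs[err.Error()] = true
	}
	return len(types) == 1, len(msgs) == 1
}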
Yes, that's right but we would simply replace two booleans with two maps, making it a bit less optimal. Is it a readability issue?
	availableGPUTypes map[string]struct{},
	now time.Time,
) errors.AutoscalerError {
	gpuConfig := e.autoscalingContext.CloudProvider.GetNodeGpuConfig(nodeInfo.Node())
This function calls a plethora of other objects - did we check that all of them support concurrent calls?
I did my own quick check and didn't find any obvious issues, but wanted to confirm, as any potential issues here will be hard to debug.
Good point, but the more critical line is info.Group.IncreaseSize(increase), as it modifies state. I did a thorough analysis and came to the conclusion that not all cloud providers are ready for concurrent operations. That's why it's behind a flag and the decision is left to the user. I've also just added a warning message to the flag description.
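For context, the pattern under discussion is roughly the following sketch (not the PR's actual implementation; ScaleUpInfo and IncreaseSize are the real autoscaler types, the rest is illustrative):

import (
	"sync"

	"k8s.io/autoscaler/cluster-autoscaler/processors/nodegroupset"
)

// Each IncreaseSize call mutates cloud-provider state concurrently,
// which is exactly why the behavior is opt-in.
func executeScaleUpsParallel(infos []nodegroupset.ScaleUpInfo) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, info := range infos {
		info := info // capture the loop variable for the goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := info.Group.IncreaseSize(info.NewSize - info.CurrentSize); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return errs
}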
Thanks for the change! Left a couple of comments.
	kube_client "k8s.io/client-go/kubernetes"
	kube_record "k8s.io/client-go/tools/record"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"

	"k8s.io/autoscaler/cluster-autoscaler/utils/backoff"
nit: I think it would make sense to group all k8s.io/autoscaler/cluster-autoscaler imports together. In general, I suggest the following order (sketched below):
- core libs (sorted lexicographically)
- 1 line space
- imports internal to the project (sorted lexicographically)
- 1 line space
- external imports (sorted lexicographically)
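For example, the snippet above rearranged in that order could look like this (the "time" import is added here only to illustrate the core-libs group):

import (
	"time" // core lib, illustrative

	// internal to the project
	"k8s.io/autoscaler/cluster-autoscaler/utils/backoff"

	// external imports
	kube_client "k8s.io/client-go/kubernetes"
	kube_record "k8s.io/client-go/tools/record"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)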
Changed
		return &option
	}
}
assert.Fail(r.t, "did not find expansionOptionToChoose %s", r.optionToChoose)
nit: %v (or even %+v) will give a better format
Changed
}

// NewMockRepotingStrategy creates an expander strategy with reporting and mocking capabilities.
func NewMockRepotingStrategy(t *testing.T, optionToChoose *GroupSizeChange) *MockReportingStrategy {
nit: I'd consider passing optionToChoose by value - we're not comparing pointers and the struct is quite small, so there won't be a lot of overhead anyway.
I agree, but it's passed by reference because it can be nil. If you don't care about the optionToChoose, one will be picked according to the default strategy.
Ah, good point, let's leave as is then.
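The distinction being made is the classic nil-as-absent pattern; a minimal sketch (field names assumed for illustration, not the PR's code):

package main

import "fmt"

type GroupSizeChange struct {
	GroupName string
	SizeDelta int
}

// pickOption treats a nil preference as "no preference, use the default".
// A value parameter could not express this absent-vs-zero distinction.
func pickOption(preferred *GroupSizeChange) GroupSizeChange {
	if preferred == nil {
		return GroupSizeChange{GroupName: "default-group", SizeDelta: 1}
	}
	return *preferred
}

func main() {
	fmt.Println(pickOption(nil))                         // falls back to the default
	fmt.Println(pickOption(&GroupSizeChange{"ng-1", 3})) // forced choice
}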
	return nil
}

func combineConcurrentScaleUpErrors(errs []errors.AutoscalerError) errors.AutoscalerError {
This is quite generic - WDYT about moving it to the errors package and just calling it Combine? I don't think there's anything scale-up specific here, apart from the current name.
I thought about it as well. I didn't go that path because:
- this function is dedicated to combining concurrent errors (errors are deduplicated and sorted)
- it's based on an assumption that in this particular place we're OK with losing some information about the aggregated errors (as it's reported in logs and state). Thanks to this we avoid changes in the API.
...so there are some assumptions we're aware of. Having them in the errors package could make this function the default way. WDYT?
I think this is something that could be addressed by documenting in the comment and perhaps reflecting better in the function name (Aggregate? Coalesce? Naming is hard...). Leaving up to you though.
// Asserts all groups are scaled only once.
// Scaling one group multiple times concurrently may cause problems.
func assertUniqueNodeGroups(scaleUpInfos []nodegroupset.ScaleUpInfo) errors.AutoscalerError {
That works, but I wonder if it would make sense to enforce this at compile time by creating type ScaleUpInfos map[string]ScaleUpInfo and using that instead? Obviously that's yet another refactor, but hopefully not a huge one (perhaps it could be carved out into its own commit though).
Ha, what stands in the way of putting the same object under different keys? 😉
We would still require some validation, therefore I left the array.
We could introduce a hashset type (with group.id as the hash key), but it felt like too much for this particular usage, as the array size is guaranteed to be small.
Setters do: the key can be derived from the group in a setter function. Agreed it might be overkill, so not blocking on this.
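A sketch of that setter idea (illustrative shape, not the PR's types): deriving the key from the group inside the setter closes the different-keys loophole, at the cost of the extra type.

package main

import "fmt"

type ScaleUpInfo struct {
	GroupID string
	NewSize int
}

// ScaleUpInfos derives the map key from the group itself, so the same
// group can never be registered twice under different keys.
type ScaleUpInfos map[string]ScaleUpInfo

func (s ScaleUpInfos) Add(info ScaleUpInfo) {
	s[info.GroupID] = info
}

func main() {
	infos := ScaleUpInfos{}
	infos.Add(ScaleUpInfo{GroupID: "ng-1", NewSize: 3})
	infos.Add(ScaleUpInfo{GroupID: "ng-1", NewSize: 5}) // overwrites, never duplicates
	fmt.Println(len(infos))                             // 1
}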
	}
	var builder strings.Builder
	builder.WriteString(firstErrMsg)
	builder.WriteString(" ...other concurrent errors: [")
super nit: s/...other/... and other/ ?
changed
Force-pushed from 213e649 to 7ee8070 (Compare)
cluster-autoscaler/main.go
Outdated
@@ -147,7 +147,8 @@ var (
 	maxGracefulTerminationFlag = flag.Int("max-graceful-termination-sec", 10*60, "Maximum number of seconds CA waits for pod termination when trying to scale down a node.")
 	maxTotalUnreadyPercentage = flag.Float64("max-total-unready-percentage", 45, "Maximum percentage of unready nodes in the cluster. After this is exceeded, CA halts operations")
 	okTotalUnreadyCount = flag.Int("ok-total-unready-count", 3, "Number of allowed unready nodes, irrespective of max-total-unready-percentage")
-	scaleUpFromZero = flag.Bool("scale-up-from-zero", true, "Should CA scale up when there 0 ready nodes.")
+	scaleUpFromZero = flag.Bool("scale-up-from-zero", true, "Should CA scale up when there are 0 ready nodes.")
+	parallelScaleUp = flag.Bool("parallel-scale-up", false, "Whether to allow parallel node groups scale up. Make sure you cloud provider supports parallel operations.")
typo: "you cloud provider" -> "your cloud provider"
Also, as a user, how do I do that? This seems a bit non-actionable; I'd rather just mark it as "experimental: may not work on some cloud providers, enable at your own risk" or something.
Good point. Your message is much better.
Force-pushed from e6c4ea0 to 1704ec6 (Compare)
One comment, otherwise LGTM.
builder.WriteString(firstErr.Error())
builder.WriteString(" ...and other concurrent errors: [")
formattedErrs := make(map[errors.AutoscalerError]bool)
formattedErrs[firstErr] = true
I would write it as one operation:
formattedErrs := map[errors.AutoscalerError]bool{
firstErr: true,
}
changed
Force-pushed from 1704ec6 to 51f1660 (Compare)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pmendelski, x13n. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
I tried to use that but it seems it always ends up over-scaling when using 1 asg per az.
Was unable to reproduce, so it must have been a one-off 👍
What type of PR is this?
/kind feature
/kind cleanup
What this PR does / why we need it:
This PR introduces an optional flag --parallel-scale-up=true/false that makes the scale-up process parallel. Thanks to this change, the scale-up action across multiple node groups is faster. The default behavior is unchanged - node groups are scaled synchronously, one by one.
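For example, enabling it (hypothetical invocation; other required flags elided):

cluster-autoscaler --parallel-scale-up=true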
Special notes for your reviewer:
Does this PR introduce a user-facing change?
/assign @x13n