-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(eks): spot support for managed nodegroups #11962
feat(eks): spot support for managed nodegroups #11962
Conversation
Title does not follow the guidelines of Conventional Commits. Please adjust title before merge. |
@iliapolo I'm ready for the first round. |
Hi @iliapolo Please check my comments before I continue revise it. Thanks. |
@pahud see my responses |
@iliapolo I am ready for the 2nd round. |
Hi @iliapolo All addressed. Please check it out again. |
)); | ||
test.done(); | ||
}, | ||
'create nodegroup with ondemand capacity type'(test: Test) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a duplicate of create nodegroup with on-demand capacity type
on line 230?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me change the name to create nodegroup with on-demand capacity type and multiple instance types
so we have two test cases:
- on-demand capacity type with one instance type in
instanceTypes
- on-demand capacity type with multiple instance types in
instanceTypes
, which is also accepted by CFN. Under the hood the CFN will create an ASG with launch template with multiple instance types but only the first one will be selected. It's not documented specificly in CFN but it's also accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pahud Wait, creating an on-demand
node group with multiple instance types will result in a launch template with just the first instance type being selected? ignoring all the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold on a few minutes. Let me paste some screenshots here later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we create the MNG like this
this.cluster.addNodegroupCapacity('MNG', {
capacityType: eks.CapacityType.ON_DEMAND,
desiredSize: 10,
instanceTypes: [
new ec2.InstanceType('t3.large'),
new ec2.InstanceType('m5.large'),
new ec2.InstanceType('c5.large'),
],
})
We actually synthesize it to this
ClusterNodegroupMNGF9B82D86:
Type: AWS::EKS::Nodegroup
Properties:
ClusterName:
Ref: Cluster9EE0221C
NodeRole:
Fn::GetAtt:
- ClusterNodegroupMNGNodeGroupRole3AB840AB
- Arn
Subnets:
- subnet-05b946a306d81274b
- subnet-0ae2846156ce0118c
- subnet-063ca7ed4aacdbad7
AmiType: AL2_x86_64
ForceUpdateEnabled: true
InstanceTypes:
- t3.large
- m5.large
- c5.large
ScalingConfig:
DesiredSize: 10
MaxSize: 10
MinSize: 1
CapacityType: ON_DEMAND
And the MNG will create an ASG with LT with 100% OD + 0% Spot like:
And all the instances will be the first instance type
I believe this is expected from ASG's perspective when ASG+LT for 100% on-demand instance types, ASG will just not span multiple on-demand instance types. It's worth a note here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pahud Its not that all the other instance types are ignored, the distribution of instance types across the ASG depends on the allocation strategy, which for on-demand is prioritized
.
To my understanding this means it will try to fulfill all the capacity by trying the first instance type, but will use other types if there is still a capacity shortage and the first type is depleted.
I kind of lost sight of why we got into this discussion honestly :) - does this make a difference to us somehow?
Co-authored-by: Eli Polonsky <Eli.polonsky@gmail.com>
Hi @iliapolo All fixed. Please check it out again. |
@pahud I'm good with merging i think once the conflicts are resolved. |
Hi @iliapolo All fixed now. :-) |
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
This PR adds the `CapacityType` support and allows users to create Spot managed node groups for Amazon EKS. 1. The `CapacityType` attribute is supported by cloudformation but not yet documented. We tentatively use addPropertyOverride() to enable it. 2. `instanceType` will be deprecated and we introduced the new `instanceTypes` 3. `instanceTypes` with different CPU architectures will throw an error. 4. `amiType` is still optional, however, when specified, incorrect `amiType` will throw the error. 5. According to the [document](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html#cfn-eks-nodegroup-instancetypes), we are allowed to specify instance type(s) in either `instanceTypes` property or launch template but not both. As we can't check the content of the launch template passed in, we allow `instanceTypes` and launch template both specified and encourage to use `instanceTypes` when possible. ## Sample ```ts cluster.addNodegroupCapacity('extra-ng-spot', { instanceTypes: [ new ec2.InstanceType('c5.large'), new ec2.InstanceType('c5a.large'), new ec2.InstanceType('c5d.large'), ], minSize: 3, capacityType: eks.CapacityType.SPOT, }); ``` Closes aws#11827 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
…that is not compatible to the default instance type (#12441) > Note: both issues here were introduced in #11962 ## Problem 1 When creating a `Nodegroup` without passing instance types, we currently default to use `t3.medium`: https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L294 This default is then used to calculate the expected AMI type, and assert that the configured AMI type is indeed as expected: https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L302-L304 However, a user might configure instance types on the launch template, and an AMI type on the nodegroup. In this scenario, we still use the default instance type to perform the validation, which will fail if the ami type is not compatible with it. To make things worse, we don't actually use the default instance type at all, apart from the validation: https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L329-L330 And in-fact, this default was only introduced in this [PR](#11962), which also added the problematic validation. ### Solution Drop the default instance type altogether, like it was before. The new validation will only take place if the user explicitly configured both `instanceTypes` and `amiType` on the nodegroup. Since the default value was never actually used, this doesn't incur any behavior change. ## Problem 2 When a launch template is used, we currently ignore the value of `amiType` explicitly passed by the user: https://github.com/aws/aws-cdk/blob/da1ed08a6a2de584f5ddf43dab4efbb530541419/packages/%40aws-cdk/aws-eks/lib/managed-nodegroup.ts#L324-L325 This behavior means that users who configured a launch template without a custom ami, and passing an `amiType` to the nodegroup, would now result in no ami specification at all, defaulting to whatever EKS does, which might not be what the user had in mind. There's no good reason to do this, we should either throw a validation error if both are used, or pass the explicit value nevertheless, even though it might cause problems. ### Solution When a user explicitly passes an AMI type, just use it and assume the user knows what he/she is doing. When a user does not explicitly pass it, only apply the default if a launch template is not used. > If we apply the default in the presence of a launch template, a user would not be able to escape if they also have a custom AMI in the launch template. This change means that users who previously "relied" on this override, might now experience a deployment failure if they are using a custom AMI in the launch template, those users can resolve the problem by removing the `amiType` property from the nodegroup (since it wasn't used, its not needed). I don't imagine many such users exist since this behavior is new and it doesn't make much sense to configure both a custom AMI and an `amiType`. -------------------- Fixes #12389 BREAKING CHANGE: Explicitly passing `amiType` to nodegroups will now take affect even if a launch template is configured as well. If your launch template contains a custom AMI, this will cause a deployment failure, to resolve, remove the explicit `amiType` from the nodegroup configuration. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
This PR adds the
CapacityType
support and allows users to create Spot managed node groups for Amazon EKS.CapacityType
attribute is supported by cloudformation but not yet documented. We tentatively use addPropertyOverride() to enable it.instanceType
will be deprecated and we introduced the newinstanceTypes
instanceTypes
with different CPU architectures will throw an error.amiType
is still optional, however, when specified, incorrectamiType
will throw the error.instanceTypes
property or launch template but not both. As we can't check the content of the launch template passed in, we allowinstanceTypes
and launch template both specified and encourage to useinstanceTypes
when possible.Sample
Closes #11827
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license