Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve managed node group bootstrap #1433

Closed
wants to merge 2 commits into from
Closed

feat: Improve managed node group bootstrap #1433

wants to merge 2 commits into from

Conversation

stevehipwell
Copy link
Contributor

@stevehipwell stevehipwell commented Jun 8, 2021

PR o'clock

Description

This PR could be replaced by #1577.

The managed node group bootstrap pattern has been modified to use the environment variables already defined in the bootstrap.sh script sourced from a new /etc/profile.d/bootstrap.sh file. The script still needs to be modified to use bootstrap.sh but this pattern can support additional bootstrap options not currently directly supported and other patterns (such as dynamic labels from instance metadata).

I've enabled support for managed node group --container-runtime via container_runtime and --use-max-pods via use_max_pods (use max pods can already be set via extra_bootstrap_args for un-managed node groups).

Checklist

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs

@ArchiFleKs
Copy link
Contributor

@stevehipwell I've never tried it this way, but it seems way cleaner :) Did you manage to test the extra args also (like taint etc) ?

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs I think native taint support has been merged ready to be released, which other arguments were you thinking of? I've tested an additional pre_userdata that sets the subnet-id in the /etc/profile.d/bootstrap.sh file and uses it in a custom label passed in to the module via kubelet_extra_args = "--node-labels=subnet=$${BOOTSTRAP_SUBNET_ID}".

@ArchiFleKs
Copy link
Contributor

Yes you're right. Nice.

@stevehipwell
Copy link
Contributor Author

@barryib could you take a look this would be a good to release with #1424.

@stevehipwell
Copy link
Contributor Author

@barryib could I get some feedback on this?

@stevehipwell
Copy link
Contributor Author

stevehipwell commented Jul 20, 2021

@barryib could you take a look a this? The pattern can be re-used with the new containerd support (via CONTAINER_RUNTIME="containerd") and I'd be happy to add this to the PR if you'd like. See #1433 (comment).

@stevehipwell
Copy link
Contributor Author

@barryib I've added support for --container-runtime now as I'd like to use this. Could you take a look at this PR?

@ArchiFleKs
Copy link
Contributor

Any chance of having this release soon ?

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs I'll rebase this shortly.

@daroga0002
Copy link
Contributor

this looks be a breaking change as user_data is changing

Probably we will need consolidate a breaking changes PRs and make some major release as there is currently a few in queue

@stevehipwell
Copy link
Contributor Author

@daroga0002 I'm not sure about where you're seeing the breaking change? I'd classify this as a feature update as the public API is being added to but no existing behaviour will change.

@daroga0002
Copy link
Contributor

maybe I put this in bad words, behaviour will not change so it will not break, but it will cause that everybody using module and updating it will be impacted as launch templates will be updated.

Does you maybe tested this how it will impact cluster created before this change and running apply after those changes? Does node groups will be recreated or this will just silently update a launch template?

@stevehipwell
Copy link
Contributor Author

@daroga0002 the launch templates will change, but that's not a breaking change in relation to SemVer and the module. I'd expect the launch templates to change on a major or minor module upgrade and if I didn't want additional churn I'd just delay taking the changes until the AMI was updated.

@ArchiFleKs
Copy link
Contributor

ArchiFleKs commented Sep 2, 2021

This PR is very useful to use new max pod feature in VPC CNI 1.9

@stevehipwell
Copy link
Contributor Author

This PR is very useful to use new max pod feature in VPC CNI 1.9

@ArchiFleKs that's one of the things I'm already using this pattern for, but it's not only useful for the new prefix behaviour but also the existing custom networking.

@ArchiFleKs
Copy link
Contributor

This PR is very useful to use new max pod feature in VPC CNI 1.9

@ArchiFleKs that's one of the things I'm already using this pattern for, but it's not only useful for the new prefix behaviour but also the existing custom networking.

Been using it for a couple of weeks with a fork on ARM micro with new network prefix and containerd, working like a charm, would love to see this merged

@@ -9,8 +9,10 @@ data "cloudinit_config" "workers_userdata" {
content_type = "text/x-shellscript"
content = templatefile("${path.module}/templates/userdata.sh.tpl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this userdata keeps coming up often, maybe its time to stop adding more options here and just have a generic "bring your own userdata script" option. so either you can use the standard userdata script provided here which only supports pre_userdata and kubelet_extra_args, or you can provide your own userdata script that it will use instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daroga0002 thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly was thinking about same, we have already few PRs which are changing/modifying. I think maybe we should:

  1. make some basic template
  2. allow overwrite whole user_data (leaving templating to users outside module)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryantbiggs & @daroga0002 I disagree, this PR solves a lot of the need for regular changes. It allows important parameters to be captured as strongly typed variables (container_runtime, use_max_pods &kubelet_extra_args) and enables any additional userdata via pre_userdata which can now also interact with the AMI boostrap.sh script by adding variables to the new /etc/profile.d/bootstrap.sh file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevehipwell it solves the here and now for yourself and maybe a handful of others, but it doesn't solve for tomorrow when someone else wants to add a new piece of functionality, and the next user, and the next ... etc.

this approach is common amongst our modules - we try to provide a good, default "out of box" experience that will hopefully cover the 80% of use cases, but the number of possible combinations and customizations can explode very quickly to meet individual use cases. we see this most often with IAM roles and policies and we have used this pattern successfully - provide a baseline default IAM role/policy that makes provisioning the module easy and seamless, but also provide the ability to override the role/policy that the module uses with one provided by the user of the module

we use this pattern in this module already as well:
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/local.tf#L3-L7

also, we are looking for ways to reduce the complexity of this module and abstract away some of these niche configurations. if we can do this, it allows for more focus on larger initiatives within the project and more agility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryantbiggs the current design needs the user data to be templated to support some very common use cases and this PR extends this capability to some additional very common use cases. If you wish to change this you'd need to release a breaking change and significantly reduce capability for the 80% use case.

@daroga0002 variables need to be persisted or else they don't work between init steps, this is why I've designed the user data in this way for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryantbiggs @daroga0002 the pattern in this PR simplifies both the implementation and maintenance of customising /etc/eks/bootstrap.sh. It is now easy to add new managed values, such as I've done here for container_runtime and use_max_pods, by updating the initial creation of /etc/profile.d/eks-bootstrap.sh. It's also easy for end users to set additional environment variables by appending to this file.

As there is nothing happening besides linking to /etc/profile.d/eks-bootstrap.sh from /etc/eks/bootstrap.sh before the pre_userdata is templated there is no reason to add a second variable. A power user could always clear /etc/profile.d/eks-bootstrap.sh from pre_userdata so there is no reason to even need to provide a userdata override variable.

An example of what this PR offers end-users can be seen by the following snippet setting an environment variable in /etc/eks/bootstrap.sh.

node_group = {
  pre_userdata = <<-EOT
    printf 'PAUSE_CONTAINER_VERSION="%s"
    ' "3.1-eksbuild.1"  >> /etc/profile.d/eks-bootstrap.sh
  EOT
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like sourcing source /etc/profile.d/eks-bootstrap.sh into bootstrap, but still I dont think that we dont need a seperate variables to customize bootstrap script.

Currently we have https://github.com/awslabs/amazon-eks-ami/blob/548b1650e63f3ab582d79418c7b3336943642fd1/files/bootstrap.sh#L108-L118

so I am still pushing to make this more generic and allow to push any variable into bootstrap than create variables in TF code for each of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daroga0002 we already have a strongly typed variable for KUBELET_EXTRA_ARGS so for consistency the approach in this PR makes sense as the alternative would be to document multiple ways of setting KUBELET_EXTRA_ARGS; which after further thought I could do pretty easily. If we're happy with a breaking change then it's even simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daroga0002 I've created an alternate PR #1577 to use a more generic pattern. The new PR still needs testing but I'm happy to close this PR in favour of the new one if that's the agreed direction.

@stevehipwell
Copy link
Contributor Author

@bryantbiggs @daroga0002 @antonbabenko could you take another look at this and the comments above? I really think that this PR makes the managed node group module more generic while supporting important parameters in a strongly typed way.

@stevehipwell
Copy link
Contributor Author

Closed in favour of #1577.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants