
Use kubernetes provider to manage aws auth #355

Conversation

stijndehaes
Contributor

PR o'clock

Description

This changes the aws auth configmap to be managed by the kubernetes provider. This is a breaking change as already created aws auth configmaps will have to be moved into the terraform state.

Checklist

  • terraform fmt and terraform validate both work from the root and examples/eks_test_fixture directories (look in CI for an example)
  • Tests for the changes have been added and passing (for bug fixes/features)
  • Test results are pasted in this PR (in lieu of CI)
  • I've added my change to CHANGELOG.md
  • Any breaking changes are highlighted above

@max-rocket-internet
Contributor

Interesting!

Previously this was avoided because there was no depends_on for providers.

How does this work now? How does it work with 5 separate clusters defined?

@stijndehaes
Contributor Author

I actually never tested that. We always have only one cluster per Terraform state. I will try to test this somewhere this week. I guess you should be able to do this using provider aliases: https://www.terraform.io/docs/configuration/providers.html#alias-multiple-provider-instances

But then I should add the possibility to pass one through a variable.
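
For what it's worth, a minimal sketch of a two-cluster setup with provider aliases could look like this (the module source and all names here are assumed for illustration, they are not part of this PR):

provider "kubernetes" {
  alias = "cluster1"
  # host, cluster_ca_certificate and token for cluster 1 go here
}

provider "kubernetes" {
  alias = "cluster2"
  # host, cluster_ca_certificate and token for cluster 2 go here
}

module "eks_cluster1" {
  source = "terraform-aws-modules/eks/aws"
  # ... inputs for cluster 1 ...

  providers = {
    kubernetes = kubernetes.cluster1
  }
}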

@stijndehaes
Contributor Author

stijndehaes commented May 4, 2019

I have done some testing and it is very easy to do with more than one cluster. For an example, see this comment:
hashicorp/terraform#18682 (comment)

I also always thought it was not possible to have depends_on for a provider, and the depends_on field is indeed still not supported. But you can pass values into the provider configuration that depend on other resources. I got the example from here: https://www.terraform.io/docs/providers/aws/d/eks_cluster_auth.html.
Since the data sources depend on the cluster creation, everything waits nicely until the whole cluster is created before creating any resources on it.

I added an example in the eks test fixtures. Is this enough or should we document this somewhere?
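
The pattern in that example is roughly the following (a minimal sketch, with the module name eks assumed):

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}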

@simonvanderveldt

Previously this was avoided because there was no depends_on for providers.

How does this work now? How does it work with 5 separate clusters defined?

@max-rocket-internet That's correct. We've worked around this by implementing the aws-auth part in a separate module; the module then ensures the dependency (in this case, that the Kubernetes/EKS cluster has been created and is available) is met by taking, for example, the cluster ID as a variable.

@max-rocket-internet
Contributor

That's correct. We've worked around this by implementing the aws-auth part in a separate module

So we would have to move the aws-auth part out of this module for it to work correctly then?

@stijndehaes
Contributor Author

That's correct. We've worked around this by implementing the aws-auth part in a separate module

So we would have to move the aws-auth part out of this module for it to work correctly then?

You don't have to move the aws-auth part outside of this module. Terraform is smart enough to notice that the provider depends on a data source, which depends on a resource, and it waits nicely until the resource is created. I have tested this branch with a setup I am using for a client.

stijndehaes force-pushed the feature/aws-auth-k8s branch from e5d63ea to 647de9e on May 7, 2019 17:27
@simonvanderveldt

So we would have to move the aws-auth part out of this module for it to work correctly then?

No, you can use a nested module for that :)

You don't have to move the aws-auth part outside of this module. Terraform is smart enough to notice that the provider depends on a data source, which depends on a resource, and it waits nicely until the resource is created. I have tested this branch with a setup I am using for a client.

AFAIK the issue with a provider depending on resource creation that's done by another provider hasn't been fixed. See the previously mentioned hashicorp/terraform#2430
Does it work if there are no resources and you just run a terraform plan?

@stijndehaes
Contributor Author

@simonvanderveldt A terraform plan works. You can easily try it by checking out my branch and doing a terraform plan of eks_test_fixture in the examples folder. Just to be sure I double checked and ran terraform plan locally and it works.

@max-rocket-internet
Contributor

A terraform plan works.

Sure but you have to apply it, from scratch. Just a plan is not enough 😄

We've worked around this by implementing the aws-auth part in a separate module

OK!

AFAIK the issue with a provider depending on resource creation that's done by another provider hasn't been fixed

Exactly, that's my point in my first comment.

you can use a nested module for that :)

If that works, then cool 👍 but should be part of this PR then?

already created aws auth configmaps will have to be moved into the terraform state.

Can you provide instructions on how to do this? Does it really matter anyway as it would just be applying the same configmap data again? i.e. nothing is lost.

@stijndehaes
Contributor Author

@max-rocket-internet No worries, I tested a terraform apply from scratch as well. I just hadn't tested only the terraform plan, so I double-checked it. But as I said earlier, I have tested the branch with a brand new terraform apply: not of the example, but of another setup where I used this module.

If you want we could have a google hangouts chat about this PR to make communication easier?

@stijndehaes
Contributor Author

To be 100% sure, I also just tested examples/eks_test_fixture. This one also works correctly; the configmap is nicely created after the cluster has been created.

@@ -28,11 +28,6 @@ output "cluster_security_group_id" {
value = "${local.cluster_security_group_id}"
}

output "config_map_aws_auth" {
Contributor

Can we just keep this as an output? I'm sure someone will still want it. e.g. so they can have the file in their version control or something.

Contributor Author

I added the configmap as an output, but it is a Terraform resource now, which is not the same as the rendered YAML. I can't think of an easy way to achieve that.

@max-rocket-internet
Contributor

Sorry for the delay!

OK let's get this merged but I'm still a little confused here. Do we have any issues at all with this PR? Here's my summary, correct me if I'm wrong:

  1. The depends_on-for-providers issue doesn't matter because it just works
  2. Multiple clusters in the same code base also work with this PR?
  3. Changes to the configmap are applied to the cluster correctly

Apart from rebase, what do we need to do to get this merged? What else needs to be tested?

@stijndehaes
Contributor Author

Let me check that applying the configmap from Terraform when it already exists is not a problem. That way no migration is needed.

And I will look into making the output work as before, but I am not sure if that is possible.

Other than that, I have tested this with multiple clusters, the examples folder and another setup, so I think it has been tested enough :)

I don't have time right now, but I should be able to do this sometime near the end of the week.

@stijndehaes
Contributor Author

Just tested applying when the aws-auth configmap already exists.
This does not work:

* kubernetes_config_map.aws_auth: configmaps "aws-auth" already exists

So we have to provide a migration path.
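
For what it's worth, the migration would presumably be a one-off state import along these lines (a sketch: the module instance name eks is assumed, and the [0] index assumes the resource is created with a manage_aws_auth count):

terraform import module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth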

@dpiddockcmp
Contributor

Fun story: providers initialized with data sources don't work with terraform import. If you merge this please add, or link to, documentation of how to use terraform import with this module.

May make the migration path interesting too.

https://www.terraform.io/docs/commands/import.html#provider-configuration
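
If that turns out to bite, one possible workaround (an assumption on my side, not something tested in this thread) is to point the kubernetes provider at a kubeconfig only for the one-off import run, so the provider configuration does not depend on data sources:

provider "kubernetes" {
  # Temporary configuration for the import run only; revert to the
  # data-source-based configuration afterwards.
  load_config_file = true
  config_path      = "~/.kube/config"
}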

@tristanpemble

tristanpemble commented Jun 30, 2019

Hey, I started to do something similar on my own branch before noticing this PR. With Terraform 0.12 you can use for loops to avoid template generation; not sure if that would be of interest. I prefer avoiding templates where possible.

Here's what I had put together, if you are interested:

locals {
  aws_auth_role_arns = [
    for role in compact(concat(
      coalescelist(
        aws_iam_instance_profile.workers_launch_template_mixed.*.role,
        data.aws_iam_instance_profile.custom_worker_group_launch_template_mixed_iam_instance_profile.*.role_name,
        [""]
      ),
      coalescelist(
        aws_iam_instance_profile.workers_launch_template.*.role,
        data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile.*.role_name,
        [""]
      ),
      coalescelist(
        aws_iam_instance_profile.workers.*.role,
        data.aws_iam_instance_profile.custom_worker_group_iam_instance_profile.*.role_name,
        [""]
      )
    )) : "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${role}"
  ]
}

resource "kubernetes_config_map" "aws_auth" {
  count = var.manage_aws_auth ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = jsonencode(concat([
      for role_arn in local.aws_auth_role_arns : {
        rolearn  = role_arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ], [
      for v in var.map_roles : {
        rolearn  = v.role_arn
        username = v.username
        groups   = [v.group]
      }
    ]))

    mapUsers = jsonencode([
      for v in var.map_users : {
        userarn  = v.user_arn
        username = v.username
        groups   = [v.group]
      }
    ])

    mapAccounts = jsonencode(var.map_accounts)
  }

  depends_on = [
    aws_eks_cluster.this
  ]
}

@Vlaaaaaaad

@tristanpemble I really like that approach and would like to see something like it merged. Maybe creating a new PR with that would help?

Right now we are using Atlantis to manage cross-account EKS clusters and we're forced to set manage_aws_auth = false.

I can help with testing next week.

@pierresteiner
Contributor

Is this one still a thing? I would have loved to see it coming in the v6 release.

@sppwf
Contributor

sppwf commented Sep 25, 2019

Hi guys,

I used this approach in a post-Kubernetes stage: I deployed Helm Tiller to prepare EKS for Helm app deployments, and it worked fine for me. So we can try using the Kubernetes provider to manage the configmap. That would be a huge improvement, as we would no longer need kubectl and aws-iam-authenticator on the box that creates the EKS cluster.

Do we have to do some additional testing, or what is the status?

Thanks
Sergiu

@barryib
Member

barryib commented Sep 26, 2019

@sppwf The problem here is described by #355 (comment) and #355 (comment)

So we need to provide a migration path for users who are already managing the aws-auth configmap with this module (i.e. with kubectl).

@erumble

erumble commented Sep 26, 2019

Recently I ran into an issue where having the AWS provider assume a role to create an EKS cluster would cause the local-exec kubectl to fail. The local-exec only had access to the env var credentials of the entity that assumed the role, not the assumed role's credentials. Because of this, I figured I'd take a stab at updating the vendored copy of this module (v6.0.1) to use the kubernetes_config_map resource. Turns out it was much easier than I thought.

Here are the snippets:

# aws_auth.tf
resource "kubernetes_config_map" "aws_auth" {
  count = var.manage_aws_auth ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapAccounts = yamlencode(var.map_accounts)
    mapRoles    = yamlencode(concat(local.workers_group_role_maps, var.map_roles))
    mapUsers    = yamlencode(var.map_users)
  }
}
# locals.tf
locals {
  # ... other local defs

  workers_group_role_maps = [
    for name in distinct(concat(
      aws_iam_instance_profile.workers_launch_template.*.role,
      data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile.*.role_name,
      aws_iam_instance_profile.workers.*.role,
      data.aws_iam_instance_profile.custom_worker_group_iam_instance_profile.*.role_name,
      )) : {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${name}"
      username = "system:node:{{EC2PrivateDNSName}}"
      groups = [
        "system:bootstrappers",
        "system:nodes",
      ]
    }
  ]
}

Still doesn't address the migration issue though.

name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {


It seems there is an inconsistency in the usage of aws_eks_cluster_auth.
The first run works perfectly, but subsequent runs fail with unauthorized.
My guess is that once Terraform has stored the token in the data source, it is already expired for the next runs. I couldn't find a way to force the token to refresh on each run.
I've opened this issue: hashicorp/terraform-provider-aws#10362

Contributor

Are you sure that it is not working in that context? We faced the same problem and it was caused by the depends_on in the aws_eks_cluster_auth data source, which is present in your sample but not in this code.

If the dependency is induced by referencing the output of the module, it seems to work.


The problem wasn't the depends_on; it was that the cluster name must be read from the aws_eks_cluster resource, not from a var or local.
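
A minimal sketch of the distinction being described (module name assumed):

# Works: the cluster name is read from the module output, i.e. from the
# aws_eks_cluster resource, so the token is only fetched once the cluster exists.
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

# Reported as problematic: the name comes from a plain var or local, so the
# data source is not tied to the cluster resource.
# data "aws_eks_cluster_auth" "cluster" {
#   name = var.cluster_name
# }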

@barryib
Member

barryib commented Oct 25, 2019

I would love to see this merged. How can we move forward on this PR? How can I help?

@mzupan

mzupan commented Oct 31, 2019

This would be a big help for those of us on Terraform Cloud who want to manage the aws-auth configmap.

@wperron

wperron commented Nov 1, 2019

Maybe I'm missing something, but it doesn't seem to work for me. I was trying to do something similar with my configuration (in my case, setting up a k8s service account) and found this PR. I tried setting up the provider with the aws_eks_cluster_auth data block just like the proposed change, and it failed. I then tried to simply apply the configuration as shown in the PR, using the fork as the source for the module, and it still crashed. This is how I set up my config:

data "aws_eks_cluster" "this" {
    name = module.tools_cluster.cluster_id
}

data "aws_eks_cluster_auth" "this" {
    name = module.tools_cluster.cluster_id
}

provider "kubernetes" {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.this.token
    load_config_file       = false
    alias                  = "my_alias"
    version                = "~> 1.9"
}

module "eks" {
    # source        = "terraform-aws-modules/eks/aws"
    source        = "git@github.com:stijndehaes/terraform-aws-eks.git?ref=feature/aws-auth-k8s"
    # version       = "7.0.0"
    cluster_name  = "foobar"
    vpc_id        = module.vpc.vpc_id
    subnets       = module.vpc.private_subnets
    worker_groups = [
        {
            instance_type        = "t3.medium"
            asg_max_size         = 6
            asg_min_size         = 3
            asg_desired_capacity = 3
            autoscaling_enabled  = true
        }
    ]

    tags = local.aws_tags

    providers = {
        "kubernetes" = "kubernetes.my_alias"
    }
}

When the apply gets to the Kubernetes resources I get this error:

Error: Post https://<cluster-id>.gr7.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps: net/http: TLS handshake timeout

  on .terraform/modules/my-cluster/aws_auth.tf line 35, in resource "kubernetes_config_map" "aws_auth":
  35: resource "kubernetes_config_map" "aws_auth" {

@CarpathianUA

@wperron,
Reading your code snippet, it looks like you gather the certificate from the provisioned EKS cluster via the aws_eks_cluster data source and then try to use it for the kubernetes provider?

stijndehaes force-pushed the feature/aws-auth-k8s branch 2 times, most recently from f85fa4a to 03a3686 on November 11, 2019 15:22
This commit changes the way aws-auth is managed. Before, a local file
was used to generate the template and a null resource to apply it. This
is now switched to the Terraform Kubernetes provider.
@stijndehaes
Contributor Author

@max-rocket-internet I added some documentation on the breaking change and how to upgrade. It is very short but should contain all the information. I also rebased and retested the PR; everything is working. I used the basic example to test everything.

@max-rocket-internet
Contributor

Really sorry for delays. Time is money 💸

Anyway, I've tested this and it works great! Love it. I imported the configmap like this:

terraform import module.cluster1.kubernetes_config_map.aws_auth[0] kube-system/aws-auth

And it worked perfectly. I also tested adding and removing a user. All good 🎉

@@ -18,9 +18,29 @@ Read the [AWS docs on EKS to get connected to the k8s dashboard](https://docs.aw

## Usage example

A full example leveraging other community modules is contained in the [examples/basic directory](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/basic). Here's the gist of using it via the Terraform registry:
A full example leveraging other community modules is contained in the [examples/basic directory](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/basic).
Contributor

Keep this line

@@ -18,9 +18,29 @@ Read the [AWS docs on EKS to get connected to the k8s dashboard](https://docs.aw

## Usage example

A full example leveraging other community modules is contained in the [examples/basic directory](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/basic). Here's the gist of using it via the Terraform registry:
A full example leveraging other community modules is contained in the [examples/basic directory](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/basic).
Please do not forget to set the provider to the EKS cluster. This is needed to provision the aws_auth configmap in
Contributor

But I think we should remove all of this and put it in the changelog. If we set a precedent of putting change-related notes in the README, it will get messy quite fast.

@@ -0,0 +1,14 @@
# Upgrading from version <= 7.x to 8.0.0
Contributor

I think we should move these notes to the changelog, as that will be the first place people look for upgrade notes.

Historically we haven't written much more than "breaking" for things like this, but I think we should start writing more detail in the changelog, and this change is the perfect example 😃

@@ -16,6 +16,7 @@ project adheres to [Semantic Versioning](http://semver.org/).
### Changed

- Updated instance_profile_names and instance_profile_arns outputs to also consider launch template as well as asg (by @ankitwal)
- **Breaking:** Configure the aws-auth configmap using the terraform kubernetes providers. Read the [docs](docs/upgrading-to-aws-auth-kubernetes-provider.md) for more info (by @sdehaes)
Contributor

Maybe add a new subheading under here like ### Notes and detail the import and other notes.

@max-rocket-internet
Contributor

I requested a few cosmetic changes just around where to place the upgrade notes. Let's resolve the conflicts and then we can merge ASAP 🙂

Thanks again @stijndehaes for your work on this, it's super appreciated 🙏

@max-rocket-internet
Contributor

Actually, in the interest of moving things forward quicker, I'll just merge this and fix the docs myself 🚀

max-rocket-internet merged commit 9363662 into terraform-aws-modules:master on Nov 28, 2019
@ivanich
Contributor

ivanich commented Dec 2, 2019

With this change I'm getting:

Error: Post https://FGFFFFFFFJDJKDNDHEJFBF.yl4.us-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps: dial tcp xx.xx.xx.xx:443: i/o timeout

  on .terraform/modules/eks/aws_auth.tf line 45, in resource "kubernetes_config_map" "aws_auth":
  45: resource "kubernetes_config_map" "aws_auth" {

Are you guys sure it was a good idea to merge this PR?

@Vlaaaaaaad

As an aside, I am using the Terraform Kubernetes provider to manage some CMs for very easy service discovery, and I can confirm that it's often timing out (cluster not yet ready, DNS not yet propagated, and so on) or failing with unauthorized (no idea why). A second plan and apply usually works.

@max-rocket-internet
Contributor

Interesting.

Whether you are using kubectl like before or the new provider config that is now merged, it wouldn't make any difference if you can't connect to the API endpoint or if DNS is not propagated.

BUT before there was a for i in $(seq 1 10); do ... retry loop, so it would retry 10 times. Perhaps that's the issue here?
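
For context, a rough sketch of the kind of retry wrapper the previous local-exec implementation used (illustrative only, not the exact module code):

resource "null_resource" "update_config_map_aws_auth" {
  provisioner "local-exec" {
    # Retry kubectl apply up to 10 times while the fresh EKS endpoint becomes reachable.
    command = "for i in $(seq 1 10); do kubectl apply -f aws-auth-configmap.yaml && break || sleep 10; done"
  }
}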

@ivanich
Contributor

ivanich commented Dec 2, 2019

BUT before there was a for i in $(seq 1 10); do ... retry loop, so it would retry 10 times. Perhaps that's the issue here?

I'm sure it is, and with this PR we can't do any retries anymore. In my opinion the previous implementation was ugly, but it worked as it should.

@max-rocket-internet
Contributor

@rchernobelskiy

Yeah, we often observe EKS master timeouts for a minute or two after creation.

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited the conversation to collaborators on Nov 20, 2022