IAM roles for AWS EKS service accounts not working #8926

Closed
serhatcetinkaya opened this issue May 5, 2020 · 11 comments
Labels
bug Used to indicate a potential bug core/storage

@serhatcetinkaya

Describe the bug
IAM roles for AWS EKS service accounts still don't work with Vault. I was expecting the issue to be resolved after #7450 and #7738. I suspect the problem is the custom credential chain implementation in Vault.

In EKS, our worker nodes have dummy IAM roles with almost no permissions attached; instead, we grant permissions directly to pods (either with the native IAM roles for service accounts solution or with third-party tools such as kube2iam). After seeing the changelog, we expected the latest Vault to work with IAM roles for service accounts, so we upgraded to 1.4.1. Instead of using the service account's IAM role, Vault tries to use the worker node's IAM role: it authenticates successfully but then fails due to lack of privileges.

To Reproduce
Steps to reproduce the behavior:

  1. Attach dummy IAM roles to the worker nodes
  2. Create a service account for Vault and attach an IAM role with the proper permissions to it (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html)
  3. Run the Vault container with that service account
  4. In the logs, see that it tries to use the worker instance role and fails

To reproduce this easily, use the dynamodb backend. Don't give the worker node IAM role any DynamoDB permissions, but give the required permissions to the IAM role attached to the service account. Vault will fail immediately with the following error:

vault Error initializing storage of type dynamodb: AccessDeniedException: User: arn:aws:sts::****:assumed-role/****005/***005 is not authorized to perform: dynamodb:DescribeTable on resource: arn:aws:dynamodb:us-east-1:***:table/****
 vault     status code: 400, request id: AVE****AJG
 vault stream closed
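
A quick way to tell which role the failing call used is to compare the role name in the error's assumed-role ARN with the role named in AWS_ROLE_ARN. The following is a minimal Python sketch, assuming the standard ARN layouts `arn:aws:sts::<account>:assumed-role/<role>/<session>` and `arn:aws:iam::<account>:role/<role>`. The function names are hypothetical helpers, not part of Vault or any AWS SDK, and the check compares role names only, not account IDs.

```python
def role_name_from_arn(arn: str) -> str:
    """Return the role name from an IAM role ARN or an STS assumed-role ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    service, resource = parts[2], parts[5]
    segments = resource.split("/")
    if service == "sts" and segments[0] == "assumed-role" and len(segments) >= 3:
        # assumed-role/<role-name>/<session-name>
        return segments[1]
    if service == "iam" and segments[0] == "role" and len(segments) >= 2:
        # role/<optional path/><role-name>: the name is the last segment
        return segments[-1]
    raise ValueError(f"not a role ARN: {arn!r}")


def used_expected_role(caller_arn: str, expected_role_arn: str) -> bool:
    """True if the caller ARN from an error message belongs to the expected role."""
    return role_name_from_arn(caller_arn) == role_name_from_arn(expected_role_arn)
```

Passing the `User:` ARN from the AccessDeniedException as `caller_arn` and the value of AWS_ROLE_ARN as `expected_role_arn` returns False when the worker node role was used instead of the service account role.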

Expected behavior
I expect it to work like the official SDK. With the same setup and a different container, when I make a call to the AWS API, the container uses the IAM role from the service account.

Environment:
I used the official vault:1.4.1 container.

  • vault status output:
vault status
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.4.1
Cluster Name             vault-cluster-6f587bab
Cluster ID               3d********ee
HA Enabled               true
HA Cluster               https://****:8201
HA Mode                  active
  • Vault CLI Version (retrieve with vault version): Vault v1.4.1
  • Server Operating System/Architecture:
Linux vault-0 4.14.165-133.209.amzn2.x86_64 #1 SMP Sun Feb 9 00:21:30 UTC 2020 x86_64 Linux
  • Necessary environment variables exist:
env | grep -i aws
AWS_ROLE_ARN=arn:aws:iam::****:role/kubernetes-dev-vault
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
  • Filepath from AWS_WEB_IDENTITY_TOKEN_FILE is readable by vault user:
/ $ whoami
vault
/ $ id vault
uid=100(vault) gid=1000(vault) groups=1000(vault),1000(vault)
/ $ cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token
eyJhb******VkN2E
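
The manual checks above (both variables set, token file readable by the container user) can be bundled into one script and run inside the pod. This is a minimal sketch using only the Python standard library; `check_irsa_env` is a hypothetical helper name, and it only inspects the local environment without contacting AWS.

```python
import os


def check_irsa_env(env=None):
    """Return a list of problems found with the service account role environment."""
    env = os.environ if env is None else env
    problems = []
    # Both variables are expected to be injected for the service account.
    for name in ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file:
        if not os.path.isfile(token_file):
            problems.append(f"{token_file} does not exist")
        elif not os.access(token_file, os.R_OK):
            problems.append(f"{token_file} is not readable by the current user")
    return problems
```

An empty list means the environment looks as expected, which matches the situation in this report: the environment was fine and the failure came from how Vault resolved credentials.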

Vault server configuration file(s):

disable_mlock = true
ui = true
listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file = "/vault/userconfig/vault-server-tls/tls.crt"
  tls_key_file = "/vault/userconfig/vault-server-tls/tls.key"
}
storage "dynamodb" {
  ha_enabled = "true"
  region     = "us-east-1"
  table      = "****"
}
seal "awskms" {
  kms_key_id = "0*****7"
}
telemetry {
  prometheus_retention_time = "30s",
  disable_hostname = true
}
max_lease_ttl = "87600h"

Additional context
In one of the related issues, someone mentioned that changing the security context solves the problem, but for me it didn't. Below is the securityContext used:

  securityContext:
    fsGroup: 1000
    runAsGroup: 1000
    runAsNonRoot: true
    runAsUser: 100

I tried different combinations mentioned in related issues.

@dogfish182

We are seeing the same issue.

  securityContext:
    fsGroup: 1000
    runAsGroup: 1000
    runAsNonRoot: true
    runAsUser: 100

This did not work for us either.

@ghost

ghost commented May 8, 2020

@serhatcetinkaya @dogfish182 this is working.
Everything after commit e82399a94fc08a0c9849822eb6ebb40210dae6ef works.
See line 65 of this file: https://github.com/hashicorp/vault/blob/master/sdk/helper/awsutil/generate_credentials.go
You also need to provide AWS_ROLE_SESSION_NAME as an environment variable for the container.
AWS EKS automatically injects the variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE based on the service account,
but AWS_ROLE_SESSION_NAME is required by Vault (as it doesn't generate a random value).
Try building from master and adding the AWS_ROLE_SESSION_NAME env variable, and it will work.
I'm wondering when this commit will be part of a release?

@dogfish182

Thanks for that, @papovyr! I would also love to know when the fix will be rolled into a release; I can't find any kind of release cycle description anywhere, though. Does anyone know?

We would prefer to wait for an official release and use the Helm chart for simplicity, and can probably just work around it until then.

@dgozalo
Contributor

dgozalo commented May 12, 2020

As far as I know, this change hasn't been added to any Vault release yet. The milestone for this change is 1.5.

@pbernal added this to the triaged milestone Jun 2, 2020
@chancez

chancez commented Jul 1, 2020

I saw it also got backported to 1.4.3 (not yet released)

@orirawlings
Contributor

orirawlings commented Jul 7, 2020

I think this is fixed in 1.4.3, but it looks like there might be another bug (#9415) that requires the otherwise optional parameter AWS_ROLE_SESSION_NAME to be set in the environment. To work around it, you'll just have to explicitly set a session name rather than let one be randomly generated.

@serhatcetinkaya
Author

I can confirm 1.4.3 works with the AWS_ROLE_SESSION_NAME environment variable.

@tvoran
Member

tvoran commented Jul 16, 2020

Now that #9416 has been merged to fix #9415, setting AWS_ROLE_SESSION_NAME should no longer be required in vault 1.4.4 and 1.5.0 (when they're released, that is).
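
To summarize the version statements made in this thread in one place, here is a small Python sketch. It encodes only what commenters reported here (1.4.1 fails, 1.4.3 works when AWS_ROLE_SESSION_NAME is set, 1.4.4 and 1.5.0 should not need it) and is not an official compatibility matrix. `irsa_status` is a hypothetical helper name, and versions before 1.4.3 are assumed not to contain the fix.

```python
def irsa_status(version: str) -> str:
    """Classify a Vault version by what this thread reports about service account roles."""
    # Plain MAJOR.MINOR.PATCH only; suffixes such as "-rc1" are not handled.
    parsed = tuple(int(part) for part in version.lstrip("v").split("."))
    if len(parsed) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    if parsed >= (1, 4, 4):
        # Reported as no longer requiring the variable once #9416 was merged.
        return "works without AWS_ROLE_SESSION_NAME"
    if parsed == (1, 4, 3):
        # Confirmed in this thread, with the variable set as a workaround.
        return "works if AWS_ROLE_SESSION_NAME is set"
    # Assumption: anything older than 1.4.3 lacks the backported fix.
    return "fix not included"
```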

@tvoran
Member

tvoran commented Jul 30, 2020

Closing since 1.5.0 is now released, and 1.4.3 was verified to be working with the AWS_ROLE_SESSION_NAME environment variable.

@tvoran closed this as completed Jul 30, 2020
@tvoran
Member

tvoran commented Jul 31, 2020

@janwillies Have you verified the k8s service account matches the IAM role trust relationship conditions for name and namespace? Discussed more in this thread: #9576 (comment)

@janwillies

Indeed, that was the issue. I found out by getting the token from the pod and decoding it. Thanks, and 1.5 works great so far!
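
For anyone repeating that check, the token payload can be decoded with a few lines of Python. This is a minimal sketch: it assumes the token file holds a standard three-part JWT and that the `sub` claim has the form `system:serviceaccount:<namespace>:<name>`, which is the value a trust relationship condition on name and namespace is compared against. It does not verify the signature, and both function names are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def service_account_from_sub(sub: str):
    """Split 'system:serviceaccount:<namespace>:<name>' into (namespace, name)."""
    prefix = "system:serviceaccount:"
    if not sub.startswith(prefix):
        raise ValueError(f"unexpected sub claim: {sub!r}")
    namespace, _, name = sub[len(prefix):].partition(":")
    return namespace, name
```

Reading the file named by AWS_WEB_IDENTITY_TOKEN_FILE, passing its contents to `decode_jwt_payload`, and comparing `service_account_from_sub(claims["sub"])` with the namespace and name in the role's trust policy shows whether the two match.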
