Web Identity Token / EKS IAM Role Service Account (IRSA) support #112

Closed
ahmad-hamade opened this issue Nov 20, 2020 · 23 comments

@ahmad-hamade

I was able to configure and run the provider successfully from my local machine, but running the same configuration from the CI server returns health check timeout: no Elasticsearch node available.

I tried to run a simple python script that connects and uploads a document into ES from the CI server and it works just fine which eliminates any issue related to ES IAM role policy or security groups rules.

Any idea what could be the issue here?

@phillbaker
Owner

Can you please share the provider configuration and resource configuration? If you're using AWS auth, please share any relevant config files on the machine in question.

@ahmad-hamade
Author

ahmad-hamade commented Nov 20, 2020

Hi @phillbaker,

I'm using TF0.13.

terraform {
  required_version = ">= 0.13"
  required_providers {
    elasticsearch = {
      source  = "phillbaker/elasticsearch"
      version = ">= 1.5"
    }
  }
}

Provider:

provider "elasticsearch" {
  url = var.es_endpoint
}

resource "elasticsearch_index_template" "template" {
  count = var.index_template != null ? 1 : 0
  name  = var.index_template.name
  body  = var.index_template.body
}

TF_Vars:

es_endpoint = format("https://%s", module.es.outputs.elasticsearch_endpoint)

index_template = {
  name = "logging_template"
  body = templatefile("index.json",
    {
      number_of_replicas = 0
      index_patterns     = format("%s-*", local.environment_name)
  })
}

I don't specify any AWS auth in my terraform AWS provider as I'm using saml2aws to assume and login to AWS using MFA.

As mentioned previously, everything works just fine on my laptop, but on the CI server I get health check timeout: no Elasticsearch node available.

PS. My CI is running in a Pod inside the EKS cluster and I'm using IRSA for authentication to access AWS resources.

All my other modules are working just fine inside the Pod except the module that has phillbaker/elasticsearch.
I tested connecting to the AWS ES instance from inside the Pod and I was able to upload dummy docs to a default index so my connectivity and IAM roles are not an issue.

@phillbaker
Owner

As mentioned previously, everything works just fine on my laptop, but on the CI server I get health check timeout

So seems like a networking or permissions issue from the CI environment 😀 . May be related to #89

I tested connecting to the AWS ES instance from inside the Pod and I was able to upload dummy docs to a default index so my connectivity and IAM roles are not an issue.

How did you do this test? If you used a curl can you share that?

Did you try setting healthcheck to false in the provider?

@ahmad-hamade
Author

If I set healthcheck to false, I get exactly the behavior described in #89: terraform plan/apply waits forever.

I tested connectivity from inside the CI pod by running the script below (which also returns the document that I added), and it works just fine:

from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

host = 'MY_ES_URL'
region = 'eu-west-1'

service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

es = Elasticsearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

document = {
    "title": "testing_tf"
}

es.index(index="dummy", doc_type="doc", id="2", body=document)

print(es.get(index="dummy", doc_type="doc", id="2"))

So I don't think it's related to a network connectivity or permissions.

@phillbaker
Owner

phillbaker commented Nov 25, 2020

In order to narrow down the issue, can you try the following steps:

  1. hardcode the ES url in the provider block, e.g.:
provider "elasticsearch" {
  url = "https://....:9200"
}
  2. explicitly pass aws access, secret keys and token, region if necessary via the provider block: https://registry.terraform.io/providers/phillbaker/elasticsearch/latest/docs#aws_access_key, similar to what was generated in your script
  3. run terraform with debug logs (Terraform writes them to stderr): TF_LOG=DEBUG terraform apply 2>&1 | grep provider-elasticsearch

My guess is that there are slight differences in how this provider handles AWS authentication versus the terraform AWS provider.
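When the debug output has been captured to a file rather than piped, the same filtering can be done in a script. This is a minimal sketch, assuming provider lines contain the substring provider-elasticsearch (as the plugin paths in Terraform's debug output do); the function name is hypothetical.

```python
from typing import Iterable, List


def provider_log_lines(
    lines: Iterable[str], needle: str = "provider-elasticsearch"
) -> List[str]:
    """Keep only the log lines that mention the Elasticsearch provider plugin."""
    return [line.rstrip("\n") for line in lines if needle in line]


# Usage (hypothetical file name):
#   with open("terraform-debug.log") as handle:
#       for line in provider_log_lines(handle):
#           print(line)
```

Matching on a plain substring keeps the helper equivalent to the grep; a regular expression is only needed if lines from several plugins have to be separated more precisely.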

@ahmad-hamade
Author

Thanks, @phillbaker for your feedback.

I ran aws sts assume-role --role-arn <MY_CI_ROLE> --role-session-name default and exported the resulting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables in my pod, and terraform plan worked just fine.

I've noticed you've added a new provider variable aws_assume_role_arn in release 1.5.0 so I tried the following configuration with no luck:

provider "elasticsearch" {
  url = "https://vpc-<MASKED>.eu-west-1.es.amazonaws.com"

  aws_region          = "eu-west-1"
  aws_assume_role_arn = "arn:aws:iam::******:role/*******"
}

Here you can find the terraform logs:
2020-11-29T11:47:29.263Z [INFO]  plugin: configuring client automatic mTLS
2020-11-29T11:47:29.292Z [DEBUG] plugin: starting plugin: path=.terraform/plugins/registry.terraform.io/hashicorp/aws/3.18.0/linux_amd64/terraform-provider-aws_v3.18.0_x5 args=[.terraform/plugins/registry.terraform.io/hashicorp/aws/3.18.0/linux_amd64/terraform-provider-aws_v3.18.0_x5]
2020-11-29T11:47:29.293Z [DEBUG] plugin: plugin started: path=.terraform/plugins/registry.terraform.io/hashicorp/aws/3.18.0/linux_amd64/terraform-provider-aws_v3.18.0_x5 pid=8441
2020-11-29T11:47:29.293Z [DEBUG] plugin: waiting for RPC address: path=.terraform/plugins/registry.terraform.io/hashicorp/aws/3.18.0/linux_amd64/terraform-provider-aws_v3.18.0_x5
2020-11-29T11:47:29.326Z [INFO]  plugin.terraform-provider-aws_v3.18.0_x5: configuring server automatic mTLS: timestamp=2020-11-29T11:47:29.326Z
2020-11-29T11:47:29.359Z [DEBUG] plugin.terraform-provider-aws_v3.18.0_x5: plugin address: address=/tmp/plugin770720847 network=unix timestamp=2020-11-29T11:47:29.359Z
2020-11-29T11:47:29.359Z [DEBUG] plugin: using plugin: version=5
2020-11-29T11:47:29.509Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2020-11-29T11:47:29.512Z [DEBUG] plugin: plugin process exited: path=.terraform/plugins/registry.terraform.io/hashicorp/aws/3.18.0/linux_amd64/terraform-provider-aws_v3.18.0_x5 pid=8441
2020-11-29T11:47:29.512Z [DEBUG] plugin: plugin exited
2020-11-29T11:47:29.512Z [INFO]  plugin: configuring client automatic mTLS
2020-11-29T11:47:29.541Z [DEBUG] plugin: starting plugin: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 args=[.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0]
2020-11-29T11:47:29.543Z [DEBUG] plugin: plugin started: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8450
2020-11-29T11:47:29.543Z [DEBUG] plugin: waiting for RPC address: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0
2020-11-29T11:47:29.551Z [INFO]  plugin.terraform-provider-elasticsearch_v1.5.0: configuring server automatic mTLS: timestamp=2020-11-29T11:47:29.550Z
2020-11-29T11:47:29.580Z [DEBUG] plugin.terraform-provider-elasticsearch_v1.5.0: plugin address: address=/tmp/plugin657421152 network=unix timestamp=2020-11-29T11:47:29.580Z
2020-11-29T11:47:29.580Z [DEBUG] plugin: using plugin: version=5
2020-11-29T11:47:29.644Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2020-11-29T11:47:29.647Z [DEBUG] plugin: plugin process exited: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8450
2020-11-29T11:47:29.647Z [DEBUG] plugin: plugin exited
2020/11/29 11:47:29 [INFO] terraform: building graph: GraphTypeValidate
2020/11/29 11:47:29 [DEBUG] ProviderTransformer: "elasticsearch_index_template.template" (*terraform.NodeValidatableResource) needs provider["registry.terraform.io/phillbaker/elasticsearch"]
2020/11/29 11:47:29 [DEBUG] ProviderTransformer: "elasticsearch_opendistro_ism_policy.ism" (*terraform.NodeValidatableResource) needs provider["registry.terraform.io/phillbaker/elasticsearch"]
2020/11/29 11:47:29 [DEBUG] pruning unused provider["registry.terraform.io/hashicorp/aws"]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "elasticsearch_opendistro_ism_policy.ism" references: [var.ism_template var.ism_template var.ism_template]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_ism_policy_id (expand)" references: [elasticsearch_opendistro_ism_policy.ism]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.es_endpoint" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.ism_template" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.index_template" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "elasticsearch_index_template.template" references: [var.index_template var.index_template var.index_template]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_index_template_id (expand)" references: [elasticsearch_index_template.template]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "provider[\"registry.terraform.io/phillbaker/elasticsearch\"]" references: []
2020/11/29 11:47:29 [DEBUG] Starting graph walk: walkValidate
2020-11-29T11:47:29.650Z [INFO]  plugin: configuring client automatic mTLS
2020-11-29T11:47:29.679Z [DEBUG] plugin: starting plugin: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 args=[.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0]
2020-11-29T11:47:29.680Z [DEBUG] plugin: plugin started: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8459
2020-11-29T11:47:29.680Z [DEBUG] plugin: waiting for RPC address: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0
2020-11-29T11:47:29.689Z [INFO]  plugin.terraform-provider-elasticsearch_v1.5.0: configuring server automatic mTLS: timestamp=2020-11-29T11:47:29.689Z
2020-11-29T11:47:29.718Z [DEBUG] plugin.terraform-provider-elasticsearch_v1.5.0: plugin address: address=/tmp/plugin875629567 network=unix timestamp=2020-11-29T11:47:29.718Z
2020-11-29T11:47:29.718Z [DEBUG] plugin: using plugin: version=5
2020-11-29T11:47:29.780Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2020-11-29T11:47:29.785Z [DEBUG] plugin: plugin process exited: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8459
2020-11-29T11:47:29.785Z [DEBUG] plugin: plugin exited
2020/11/29 11:47:29 [INFO] backend/local: apply calling Refresh
2020/11/29 11:47:29 [INFO] terraform: building graph: GraphTypeRefresh
2020/11/29 11:47:29 [DEBUG] pruning unused provider["registry.terraform.io/phillbaker/elasticsearch"]
2020/11/29 11:47:29 [DEBUG] pruning unused provider["registry.terraform.io/hashicorp/aws"]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.es_endpoint" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.ism_template" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.index_template" references: []
2020/11/29 11:47:29 [WARN] ReferenceTransformer: reference not found: "elasticsearch_index_template.template"
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_index_template_id (expand)" references: []
2020/11/29 11:47:29 [WARN] ReferenceTransformer: reference not found: "elasticsearch_opendistro_ism_policy.ism"
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_ism_policy_id (expand)" references: []
2020/11/29 11:47:29 [DEBUG] Starting graph walk: walkRefresh
2020/11/29 11:47:29 [INFO] backend/local: apply calling Plan
2020/11/29 11:47:29 [INFO] terraform: building graph: GraphTypePlan
2020/11/29 11:47:29 [DEBUG] ProviderTransformer: "elasticsearch_opendistro_ism_policy.ism (expand)" (*terraform.nodeExpandPlannableResource) needs provider["registry.terraform.io/phillbaker/elasticsearch"]
2020/11/29 11:47:29 [DEBUG] ProviderTransformer: "elasticsearch_index_template.template (expand)" (*terraform.nodeExpandPlannableResource) needs provider["registry.terraform.io/phillbaker/elasticsearch"]
2020/11/29 11:47:29 [DEBUG] pruning unused provider["registry.terraform.io/hashicorp/aws"]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_index_template_id (expand)" references: [elasticsearch_index_template.template (expand)]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "output.es_ism_policy_id (expand)" references: [elasticsearch_opendistro_ism_policy.ism (expand)]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.index_template" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "elasticsearch_index_template.template (expand)" references: [var.index_template var.index_template var.index_template]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "elasticsearch_opendistro_ism_policy.ism (expand)" references: [var.ism_template var.ism_template var.ism_template]
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.es_endpoint" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "var.ism_template" references: []
2020/11/29 11:47:29 [DEBUG] ReferenceTransformer: "provider[\"registry.terraform.io/phillbaker/elasticsearch\"]" references: []
2020/11/29 11:47:29 [DEBUG] Starting graph walk: walkPlan
2020-11-29T11:47:29.788Z [INFO]  plugin: configuring client automatic mTLS
2020-11-29T11:47:29.818Z [DEBUG] plugin: starting plugin: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 args=[.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0]
2020-11-29T11:47:29.818Z [DEBUG] plugin: plugin started: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8469
2020-11-29T11:47:29.818Z [DEBUG] plugin: waiting for RPC address: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0
2020-11-29T11:47:29.827Z [INFO]  plugin.terraform-provider-elasticsearch_v1.5.0: configuring server automatic mTLS: timestamp=2020-11-29T11:47:29.827Z
2020-11-29T11:47:29.857Z [DEBUG] plugin.terraform-provider-elasticsearch_v1.5.0: plugin address: address=/tmp/plugin184895428 network=unix timestamp=2020-11-29T11:47:29.857Z
2020-11-29T11:47:29.857Z [DEBUG] plugin: using plugin: version=5
2020-11-29T11:47:29.919Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2020-11-29T11:47:29.921Z [DEBUG] plugin.terraform-provider-elasticsearch_v1.5.0: 2020/11/29 11:47:29 [INFO] Using AWS: eu-west-1
2020/11/29 11:47:35 [ERROR] eval: *terraform.EvalConfigProvider, err: health check timeout: Head "https://<<<MASKED>>>.eu-west-1.es.amazonaws.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available
2020/11/29 11:47:35 [ERROR] eval: *terraform.EvalSequence, err: health check timeout: Head "https://<<<MASKED>>>.eu-west-1.es.amazonaws.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available
2020/11/29 11:47:35 [ERROR] eval: *terraform.EvalOpFilter, err: health check timeout: Head "https://<<<MASKED>>>.eu-west-1.es.amazonaws.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available
2020/11/29 11:47:35 [ERROR] eval: *terraform.EvalSequence, err: health check timeout: Head "https://<<<MASKED>>>.eu-west-1.es.amazonaws.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available

2020/11/29 11:47:35 [DEBUG] [aws-sdk-go] DEBUG: Request dynamodb/GetItem Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: dynamodb.eu-west-1.amazonaws.com
User-Agent: aws-sdk-go/1.31.9 (go1.14.7; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.13.5
Content-Length: 241
Accept-Encoding: identity
Authorization: <<<MASKED>>>
Content-Type: application/x-amz-json-1.0
X-Amz-Date: 20201129T114735Z
X-Amz-Security-Token: <<<MASKED>>>
X-Amz-Target: DynamoDB_20120810.GetItem

{"ConsistentRead":true,"Key":{"LockID":{"S":"804335263071-terraform-state/non-prod/dev/base-infra/logging/logging-es-config/es-config-logs/terraform.tfstate"}},"ProjectionExpression":"LockID, Info","TableName":"804335263071-terraform-locks"}
-----------------------------------------------------
Error: health check timeout: Head "https://<<<MASKED>>>.eu-west-1.es.amazonaws.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available

  on main.tf line 1, in provider "elasticsearch":
   1: provider "elasticsearch" {


2020/11/29 11:47:35 [DEBUG] [aws-sdk-go] DEBUG: Response dynamodb/GetItem Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 494
Content-Type: application/x-amz-json-1.0
Date: Sun, 29 Nov 2020 11:47:35 GMT
Server: Server
X-Amz-Crc32: 1092377990
X-Amzn-Requestid: <<<MASKED>>>


-----------------------------------------------------
2020/11/29 11:47:35 [DEBUG] [aws-sdk-go] {"Item":{"LockID":{"S":"804335263071-terraform-state/non-prod/dev/base-infra/logging/logging-es-config/es-config-logs/terraform.tfstate"},"Info":{"S":"{\"ID\":\"0fcc478c-63bd-e52e-cbd7-ad64b0623ab1\",\"Operation\":\"OperationTypeApply\",\"Info\":\"\",\"Who\":\"runner@runner-infra-kfg25-f62sj\",\"Version\":\"0.13.5\",\"Created\":\"2020-11-29T11:47:29.116770918Z\",\"Path\":\"804335263071-terraform-state/non-prod/dev/base-infra/logging/logging-es-config/es-config-logs/terraform.tfstate\"}"}}}
2020/11/29 11:47:35 [DEBUG] [aws-sdk-go] DEBUG: Request dynamodb/DeleteItem Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: dynamodb.eu-west-1.amazonaws.com
User-Agent: aws-sdk-go/1.31.9 (go1.14.7; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.13.5
Content-Length: 181
Accept-Encoding: identity
Authorization: <<<MASKED>>>
Content-Type: application/x-amz-json-1.0
X-Amz-Date: 20201129T114735Z
X-Amz-Security-Token: <<<MASKED>>>
X-Amz-Target: DynamoDB_20120810.DeleteItem

{"Key":{"LockID":{"S":"804335263071-terraform-state/non-prod/dev/base-infra/logging/logging-es-config/es-config-logs/terraform.tfstate"}},"TableName":"804335263071-terraform-locks"}
-----------------------------------------------------
2020/11/29 11:47:36 [DEBUG] [aws-sdk-go] DEBUG: Response dynamodb/DeleteItem Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 2
Content-Type: application/x-amz-json-1.0
Date: Sun, 29 Nov 2020 11:47:36 GMT
Server: Server
X-Amz-Crc32: 2745614147
X-Amzn-Requestid: <<<MASKED>>>


-----------------------------------------------------
2020/11/29 11:47:36 [DEBUG] [aws-sdk-go] {}
2020-11-29T11:47:36.022Z [DEBUG] plugin: plugin process exited: path=.terraform/plugins/registry.terraform.io/phillbaker/elasticsearch/1.5.0/linux_amd64/terraform-provider-elasticsearch_v1.5.0 pid=8469
2020-11-29T11:47:36.022Z [DEBUG] plugin: plugin exited

@ahmad-hamade
Author

As a workaround, I have to run the following script before terraform, which resolves the issue:

# Exchange the IRSA web identity token for temporary credentials.
aws sts assume-role-with-web-identity \
  --role-arn "$AWS_ROLE_ARN" \
  --role-session-name tmp_es \
  --web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE" \
  --duration-seconds 1000 > /tmp/irp-cred.txt
# Export the temporary credentials so the provider picks them up.
export AWS_ACCESS_KEY_ID="$(jq -r '.Credentials.AccessKeyId' /tmp/irp-cred.txt)"
export AWS_SECRET_ACCESS_KEY="$(jq -r '.Credentials.SecretAccessKey' /tmp/irp-cred.txt)"
export AWS_SESSION_TOKEN="$(jq -r '.Credentials.SessionToken' /tmp/irp-cred.txt)"
# Remove the file that holds the credentials.
rm /tmp/irp-cred.txt

terraform plan
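For CI wrappers that are already written in Python, the jq step of the workaround might look like the sketch below. It assumes the JSON printed by aws sts assume-role-with-web-identity has a top-level Credentials object with AccessKeyId, SecretAccessKey and SessionToken fields; the function name and the sample values are placeholders, not real credentials.

```python
import json
from typing import Dict


def credentials_to_env(sts_response_json: str) -> Dict[str, str]:
    """Map the Credentials object of an STS response to AWS_* variables."""
    creds = json.loads(sts_response_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Placeholder response; a real one comes from the aws CLI call above.
example = json.dumps(
    {
        "Credentials": {
            "AccessKeyId": "ASIAEXAMPLE",
            "SecretAccessKey": "secretEXAMPLE",
            "SessionToken": "tokenEXAMPLE",
            "Expiration": "2020-11-29T12:04:09Z",
        }
    }
)
for name, value in credentials_to_env(example).items():
    print(f"export {name}={value}")
```

Passing the resulting mapping to the terraform subprocess environment avoids writing the credentials to a temporary file at all.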

@phillbaker
Owner

Looks very similar to the workaround suggested in this comment: hashicorp/terraform#22992 (comment).

That issue was closed by hashicorp/aws-sdk-go-base#33 and hashicorp/terraform#25134 and references aws/aws-sdk-go#3101.

I'll have to review the changes there, for now I would suggest the workaround posted above.

@phillbaker
Owner

@ahmad-hamade can you confirm that you've set the following environment variables in your pod:

  • AWS_WEB_IDENTITY_TOKEN_FILE
  • AWS_SDK_LOAD_CONFIG=1

@ahmad-hamade
Author

ahmad-hamade commented Nov 30, 2020

@phillbaker yes, the variables below exist and are added by default since I'm using IAM Roles for Service Accounts (IRSA) in EKS, except for AWS_SDK_LOAD_CONFIG:

AWS_ROLE_ARN=arn:aws:iam::MASKED:role/infra-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token

@ahmad-hamade
Author

Do you suggest that I export AWS_SDK_LOAD_CONFIG and try again?

@phillbaker
Owner

phillbaker commented Dec 1, 2020 via email

@ahmad-hamade
Author

As per your suggestion and the reported issue aws/aws-sdk-go#2828, exporting AWS_SDK_LOAD_CONFIG should make awssession.NewSessionWithOptions support reading from web_identity_provider and so solve the issue.

Unfortunately, I've tested that in my environment but it didn't help.

Perhaps something is being overridden in the line below, so that web_identity_provider is still not read even after setting AWS_SDK_LOAD_CONFIG to 1?

return awssession.Must(awssession.NewSessionWithOptions(sessOpts))

ahmad-hamade changed the title from "Error: health check timeout: no Elasticsearch node available" to "Web Identity Token / EKS IAM Role Service Account (IRSA) support" on Dec 4, 2020
@ahmad-hamade
Author

ahmad-hamade commented Dec 4, 2020

So the Terraform AWS provider fixed this issue by reading the shared credentials file, which includes the current session token, so maybe we can implement the same?

https://github.com/hashicorp/terraform-provider-aws/blob/bd828729b19030b366a64e4225eeb71e6d5eb0c2/vendor/github.com/hashicorp/aws-sdk-go-base/awsauth.go#L203

@phillbaker
Owner

Thanks for the link, I'll take a look at the current session token.

@phillbaker
Owner

After reading the code, it looks like one difference in configuration is that we're not sending a SharedCredentialsProvider, but that doesn't seem related to web identity credentials which are handled by the underlying AWS SDK Go.

@ahmad-hamade to clarify, based on the cluster URL you provided, it looks like you're using a VPC cluster. Can you confirm whether your access policies specify IAM users or roles? If so, requests would need to be signed with credentials, and so the provider would need sign_aws_requests set. Since we have a recent version of the AWS SDK, a configuration like the following should work:

provider "elasticsearch" {
  url = "https://vpc-<MASKED>.eu-west-1.es.amazonaws.com"

  aws_region        = "eu-west-1" # must be set if the `url` is not of the form <region>.es.amazonaws.com
  sign_aws_requests = true
}
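
To illustrate the comment on aws_region, here is a rough sketch of how a region could be read from an endpoint of the form <region>.es.amazonaws.com. The pattern and function name are assumptions made for this example; they are not the provider's own awsUrlRegexp.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern for hosts ending in <region>.es.amazonaws.com.
_ES_HOST = re.compile(r"\.([a-z]{2}(?:-[a-z]+)+-\d+)\.es\.amazonaws\.com$")


def region_from_url(url: str) -> Optional[str]:
    """Return the AWS region embedded in an ES endpoint URL, or None."""
    host = urlparse(url).hostname or ""
    match = _ES_HOST.search(host)
    return match.group(1) if match else None


print(region_from_url("https://vpc-example.eu-west-1.es.amazonaws.com"))
print(region_from_url("https://es.internal.example.com"))
```

When the function returns None, as it would for a custom domain name, the region cannot be inferred from the URL and has to be supplied explicitly.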

In your example script where you generate credentials with aws sts assume-role-with-web-identity before running terraform plan, were you setting sign_aws_requests?

@ahmad-hamade
Author

ahmad-hamade commented Dec 6, 2020

Thanks @phillbaker for your further investigation.

I'm using AWS private ES cluster (accessible within my VPC resources) and my IAM role is allowed to connect, upload, and access indices.

To summarize what testing I've done so far:

  1. My IAM role used in Pod has full access to my AWS ES private cluster
  2. I'm using Web Identity Token in my K8s pod that is running in EKS (in the VPC where my ES is running)
  3. I was able to successfully write a dummy document to ES using a simple python script (shared above) running from inside my pod (and the SDK was able to get the web identity token in boto3.Session().get_credentials()!)
  4. Terraform plan/apply automation running just fine in my Pod for all other AWS resources
  5. I've tried to set values to aws_region and sign_aws_requests (which is true by default) with no success

The only way to get the provider to connect to my private ES was to export the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN variables with values obtained from assume-role-with-web-identity. With those set, terraform-provider-elasticsearch works just fine without even setting any value for aws_region or sign_aws_requests.

@phillbaker
Owner

I've tried to set values to aws_region and sign_aws_requests (which is true by default) with no success

Ah, quite right, I forgot about that.

Can you confirm if there's a ~/.aws/config file on your pod that specifies a web_identity_token_file parameter?

@ahmad-hamade
Author

No, there is no ~/.aws/config in the pod.

The only environment variables injected by the EKS Pod Identity Webhook are AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE; the latter holds the path of the file that contains the token value, /var/run/secrets/eks.amazonaws.com/serviceaccount/token.
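
A quick way to confirm from inside a pod that both injected variables are present and that the token file exists is sketched below. The function name is hypothetical, and the check only looks at the two variables named above.

```python
import os
from typing import Callable, List, Mapping

IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_settings(
    env: Mapping[str, str],
    file_exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return a list of problems; an empty list means the IRSA inputs look present."""
    problems = [f"{name} is not set" for name in IRSA_VARS if not env.get(name)]
    token_path = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_path and not file_exists(token_path):
        problems.append(f"token file {token_path} does not exist")
    return problems


# Inspect the current process environment.
print(missing_irsa_settings(os.environ))
```

The file_exists parameter is only there so the check can be exercised without a real token file.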

@phillbaker
Owner

Well, here's what I see the code doing, which looks correct:

ES provider:

if m := awsUrlRegexp.FindStringSubmatch(parsedUrl.Hostname()); m != nil && signAWSRequests {
log.Printf("[INFO] Using AWS: %+v", m[1])
opts = append(opts, elastic7.SetHttpClient(awsHttpClient(m[1], d)), elastic7.SetSniff(false))

signer := awssigv4.NewSigner(awsSession(region, d).Config.Credentials)

return awssession.Must(awssession.NewSessionWithOptions(sessOpts))

In the SDK, AWS_WEB_IDENTITY_TOKEN_FILE environment variable is evaluated only in resolveCredentials, which in turn is only invoked in mergeConfigSrcs, which in turn is only invoked in newSession, which is invoked in NewSessionWithOptions.

https://github.com/aws/aws-sdk-go/blob/38c74caea1398949b67da14dfaa79cabe704a57f/aws/session/session.go#L333

https://github.com/aws/aws-sdk-go/blob/38c74caea1398949b67da14dfaa79cabe704a57f/aws/session/session.go#L459-L461

https://github.com/aws/aws-sdk-go/blob/38c74caea1398949b67da14dfaa79cabe704a57f/aws/session/session.go#L629-L630

https://github.com/aws/aws-sdk-go/blob/b6ab7f8d2ef9cce9ffe55475af0aae9445e4ec98/aws/session/credentials.go#L35-L41

https://github.com/aws/aws-sdk-go/blob/b6ab7f8d2ef9cce9ffe55475af0aae9445e4ec98/aws/session/credentials.go#L71-L78
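
The order in which the linked resolveCredentials appears to pick a credential source can be modelled roughly as below. This is a simplified reading of the SDK code, not a port of it: the function and return labels are made up for this sketch, and only the environment variables discussed in this thread are considered.

```python
from typing import Mapping


def credential_source(env: Mapping[str, str], profile: str = "") -> str:
    """Name the credential source a session would likely use (simplified)."""
    if profile:
        # An explicitly requested profile is consulted first.
        return "profile"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        # Static keys in the environment come before web identity.
        return "static-env"
    if env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        # Web identity token from the environment; a role ARN is also required.
        return "web-identity"
    return "default-chain"
```

Under this reading, the workaround of exporting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN succeeds because the static environment branch is taken before web identity is considered.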

@idallas456

I too am seeing this issue with EKS roles. It would be nice to get this addressed so the workaround mentioned here (#112 (comment)) isn't necessary. That workaround does fix the issue but is not an ideal solution.

@phillbaker
Owner

phillbaker commented Sep 17, 2021

I believe this actually is working. I've managed to test this on AWS using the following setup:

  • an AWS ES cluster (Opensearch 1.0)
  • a (self hosted) Kubernetes using IRSA
  • terraform v0.15.5
  • provider version v2.0.0-beta.1

a terraform file of the following:

terraform {
  required_providers {
    elasticsearch = {
      source = "phillbaker/elasticsearch"
      version = "2.0.0-beta.1"
    }
  }
}

provider "elasticsearch" {
  url = "https://vpc-terraform-XXX.us-east-2.es.amazonaws.com:443"
  aws_assume_role_arn = "arn:aws:iam::XXX:role/terraform-elasticsearch"
}

resource "elasticsearch_index" "test" {
  name = "terraform-test"
  number_of_shards = 1
  number_of_replicas = 1
}

The ES cluster has the following IAM access policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::XXX:role/terraform-elasticsearch"
        ]
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-2:XXX:domain/terraform/*"
    }
  ]
}

with the role having the attached policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "es:*",
            "Resource": "*"
        }
    ]
}

and a trust relationship to the OIDC endpoint, e.g.:

    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxx:oidc-provider/oidc-xxxx-xxxx-xxxx-xxxx-xxxxxxxxxx.s3.amazonaws.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity"
    }
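
When comparing a role's trust relationship against the statement above, a small check like the following may help. It is a sketch with a hypothetical function name, and it only tests the three fields shown in the statement (Effect, Principal.Federated and Action), not any conditions.

```python
from typing import Any, Dict


def allows_web_identity(statement: Dict[str, Any]) -> bool:
    """True if a trust statement lets a federated principal assume the role via web identity."""
    actions = statement.get("Action", [])
    if isinstance(actions, str):
        actions = [actions]
    principal = statement.get("Principal", {})
    return (
        statement.get("Effect") == "Allow"
        and "sts:AssumeRoleWithWebIdentity" in actions
        and isinstance(principal, dict)
        and bool(principal.get("Federated"))
    )


# Placeholder statement shaped like the trust relationship above.
trust_statement = {
    "Effect": "Allow",
    "Principal": {"Federated": "arn:aws:iam::xxx:oidc-provider/example"},
    "Action": "sts:AssumeRoleWithWebIdentity",
}
print(allows_web_identity(trust_statement))
```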

Note: one difference between the script in #112 (comment) and this provider is that boto (and the AWS terraform provider) appear to respect the AWS_ROLE_ARN environment variable. This provider currently does not, so it's required to set aws_assume_role_arn in the provider config.

I don't have access to an EKS cluster to verify this there, but if this does not work for you, please include:

  • confirm whether the workaround above addresses the issue
  • provider version
  • elasticsearch version (and opendistro version if relevant, including whether fine grained access control is enabled)
  • redacted version of the terraform provider and resource configuration
  • terraform provider logs by setting TF_LOG_CORE=INFO TF_LOG_PROVIDER=TRACE
  • simplified AWS IAM roles and ES cluster access policy

@phillbaker
Owner

I believe the issue with environment variables has been fixed in 64f21df; please see some of the discussion in #124 (comment).

I'm going to close this issue for now, please let me know if there are further issues with IRSA.
