
Regression issue on 1.5.1 #124

Closed
michelzanini opened this issue Dec 28, 2020 · 30 comments

@michelzanini

After upgrading to 1.5.1 I am getting the following error:

Error: health check timeout: Head "https://elasticsearch.mydomain.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available

It could be related to aws_assume_role_arn, as I use it in my provider config:

provider "elasticsearch" {
  url                 = "https://elasticsearch.mydomain.com"
  aws_region          = "eu-west-1"
  aws_profile         = ""
  aws_assume_role_arn = "arn:aws:iam::111111111:role/Role"
  sign_aws_requests   = true
}

It only seems to happen when I use aws_assume_role_arn; it does not happen when I use aws_profile.
I am using Elasticsearch 7.9.

Reverting back to 1.5.0 makes the error disappear.

I see there are significant changes in PR #119; maybe it's related.

Thanks.

@phillbaker
Owner

phillbaker commented Dec 29, 2020

Hello, sorry to hear you're having issues. It sounds like this might be related to f924ab6 (#114)

Thanks for providing details and an example provider config.

It only seems to happen when I use aws_assume_role_arn; it does not happen when I use aws_profile.

I'm not quite following here. Are you saying that a different provider config does work in v1.5.1? (Can you share/clarify examples?)

@michelzanini
Author

I use Terragrunt to write a different Terraform file depending on whether I am in a CI environment or on a laptop.

When on a laptop, this is the config I use:

provider "elasticsearch" {
  url                 = "https://elasticsearch.mydomain.com"
  aws_region          = "eu-west-1"
  aws_profile         = "my_profile"
  aws_assume_role_arn = ""
  sign_aws_requests   = true
}

When on CI env, this is the one I use:

provider "elasticsearch" {
  url                 = "https://elasticsearch.mydomain.com"
  aws_region          = "eu-west-1"
  aws_profile         = ""
  aws_assume_role_arn = "arn:aws:iam::111111111:role/Role"
  sign_aws_requests   = true
}

On a laptop, it uses aws_profile. On the CI server, it uses aws_assume_role_arn.
On 1.5.0, both config files work.
On 1.5.1, it seems only the laptop config with aws_profile works.

@phillbaker
Owner

Thanks @michelzanini. Any chance the CI is running on EKS (#112)?

@michelzanini
Author

No, it's running on a standard EC2 instance.

@Delorien84

I can confirm that aws_assume_role_arn is not working on 1.5.1. It is running on an EC2 instance with an IAM role attached to that instance.

When I turn off the healthcheck, the execution blocks indefinitely.

My configuration is very similar:

provider "elasticsearch" {
  url                 = "https://custom.domain.com"
  aws_region          = "eu-west-1"
  aws_assume_role_arn = "arn:aws:iam::111111111:role/Role"
  sign_aws_requests   = true
}

@lifeofguenter

We use aws_profile, but it stopped working with 1.5.1:

provider "elasticsearch" {
  url               = "https://${module.logs_elasticsearch_remote.outputs.elasticsearch_endpoint}"
  aws_profile       = var.aws_profile
  sign_aws_requests = true
}

however, our profile looks like this:

[our-profile]
region            = us-east-1
credential_source = Ec2InstanceMetadata
role_arn          = arn:aws:iam::111111111111:role/ROLE_NAME

It works fine on 1.5.0.

@phillbaker
Owner

Sorry for the delay here. I've reverted part of f924ab6 and tagged a v1.5.2-beta (https://github.com/phillbaker/terraform-provider-elasticsearch/tree/v1.5.2-beta). That should get pushed to the Terraform registry shortly. Can you all please give that a try and let me know if this is resolved?
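
If it helps, here's a minimal sketch of pinning the pre-release (assuming Terraform 0.13+ required_providers syntax; pre-release versions are only selected when pinned to an exact version):

terraform {
  required_providers {
    elasticsearch = {
      source  = "phillbaker/elasticsearch"
      # A pre-release tag like this needs an exact version match.
      version = "1.5.2-beta"
    }
  }
}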

@phillbaker
Owner

Hello, following up on this. Has anyone been able to try v1.5.2-beta?

@lifeofguenter

lifeofguenter commented Jan 12, 2021

Unfortunately, on our side it did not fix the issue:

[2021-01-12T09:50:31.485Z] - Using phillbaker/elasticsearch v1.5.2-beta from the shared cache directory

[2021-01-12T09:51:07.021Z] Error: health check timeout: Head "https://sssss.us-east-1.es.amazonaws.com": RequestCanceled: request context canceled
[2021-01-12T09:51:07.021Z] caused by: context deadline exceeded: no Elasticsearch node available
[2021-01-12T09:51:07.021Z] 
[2021-01-12T09:51:07.021Z] 
[2021-01-12T09:51:07.021Z] 
[2021-01-12T09:51:07.021Z] Error: no active connection found: no Elasticsearch node available

Reverting to 1.5.0 still works.

@phillbaker
Owner

Thanks. I reverted the upgrade of the AWS client and released v1.5.2-beta1. Can folks on this thread give that a try and update here?

@phillbaker
Owner

Hi all, following up on this: has this been fixed in 1.5.2-beta1?

@phillbaker
Owner

Hi all, 1.5.2 has been released, so I'm going to close this as fixed. I don't have a way to reproduce, so I can't test directly. Please re-open if there are further issues.

@michelzanini
Author

Sorry, I did not have time to test this before. I tested with 1.5.4 and it seems it is still not working.

@michelzanini
Author

I can confirm the change that introduced this regression was #119.
I built binaries for every commit, and it broke starting with that one.

I am going to have a deeper look now to see if I can spot the issue, but it was definitely introduced there.
@phillbaker

@phillbaker
Owner

Thanks @michelzanini, that's very helpful. That strikes me as very odd, as #119 primarily changes the timing of calls, as opposed to which calls are made.

In order to narrow down the issue, could you try the following:

  • try setting sniff to false in the provider config
  • try setting elasticsearch_version to the correct Elasticsearch version to skip pinging the cluster when creating a client (see the sketch below)
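
For example, combining both suggestions (a minimal sketch based on the provider config above; the exact elasticsearch_version string, taken from the ES 7.9 mentioned earlier, is an assumption):

provider "elasticsearch" {
  url                   = "https://elasticsearch.mydomain.com"
  aws_region            = "eu-west-1"
  aws_assume_role_arn   = "arn:aws:iam::111111111:role/Role"
  sign_aws_requests     = true
  # Disable sniffing of cluster nodes.
  sniff                 = false
  # Skip pinging the cluster for its version when the client is created.
  elasticsearch_version = "7.9.0"
}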

@michelzanini
Author

Even with sniff set to false and elasticsearch_version set, I still get the errors:

Error: health check timeout: Head "https://elasticsearch.mydomain.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available

  on main.tf line 8, in resource "elasticsearch_opendistro_role" "read_indexes_role":
   8: resource "elasticsearch_opendistro_role" "read_indexes_role" {



Error: health check timeout: Head "https://elasticsearch.mydomain.com": RequestCanceled: request context canceled
caused by: context deadline exceeded: no Elasticsearch node available

  on main.tf line 59, in resource "elasticsearch_opendistro_user" "developer_users":
  59: resource "elasticsearch_opendistro_user" "developer_users" {



Error: no active connection found: no Elasticsearch node available

  on main.tf line 72, in resource "elasticsearch_opendistro_ism_policy" "ism_policy":
  72: resource "elasticsearch_opendistro_ism_policy" "ism_policy" {

If I also set healthcheck to false, then there's no error, but the resources are never created and Terraform keeps running indefinitely. All resources keep printing Still creating... [100...s elapsed], etc.

This leads me to believe that there's some sort of race condition. I can't find the problem myself, as I do not have enough Go or Elasticsearch knowledge to track it down.

I will park this for now and stay locked to 1.5.0.
Would you consider reverting PR #119?

Alternatively, you could test this by creating an AWS instance and an Elasticsearch cluster, assigning an IAM role to the box, and running Terraform from there...

@michelzanini
Author

Not sure this will help, but these are the logs, which keep repeating like this forever:

(...)
elasticsearch_opendistro_role.read_indexes_role: Still creating... [40s elapsed]
2021/04/08 12:42:33 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:36 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/phillbaker/elasticsearch\"] (close)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:37 [TRACE] dag/walk: vertex "root" is waiting for "meta.count-boundary (EachMode fixup)"
2021/04/08 12:42:38 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:41 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/phillbaker/elasticsearch\"] (close)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:42 [TRACE] dag/walk: vertex "root" is waiting for "meta.count-boundary (EachMode fixup)"
elasticsearch_opendistro_role.read_indexes_role: Still creating... [50s elapsed]
2021/04/08 12:42:43 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:46 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/phillbaker/elasticsearch\"] (close)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:47 [TRACE] dag/walk: vertex "root" is waiting for "meta.count-boundary (EachMode fixup)"
2021/04/08 12:42:48 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:51 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/phillbaker/elasticsearch\"] (close)" is waiting for "elasticsearch_opendistro_role.read_indexes_role"
2021/04/08 12:42:52 [TRACE] dag/walk: vertex "root" is waiting for "meta.count-boundary (EachMode fixup)"
elasticsearch_opendistro_role.read_indexes_role: Still creating... [60s elapsed]
(...)

@phillbaker
Owner

phillbaker commented Apr 9, 2021

Would you consider reverting PR #119?

Unfortunately, #119 touches too many pieces of code to revert now.

Alternatively, you could test this by creating an AWS instance and an Elasticsearch cluster, assigning an IAM role to the box, and running Terraform from there...

Unfortunately, I don't currently have access to an AWS environment where I can test this.

@phillbaker
Owner

Here's one guess I have: the deferred instantiation of the client means that the client is initialized once per resource, versus once at provider instantiation. This may be a problem if there are many resources (which also require reads to prepare a plan) and the AWS client needs to query resources like the EC2 metadata API (which is rate limited).

@michelzanini @lifeofguenter approximately how many elasticsearch_* resources are being managed in terraform?

@michelzanini
Author

michelzanini commented Apr 12, 2021

Hi @phillbaker, that makes a whole lot of sense. I have around 10 resources, more or less. Even though you don't have an AWS environment to test in, you can probably still test this behaviour with debugging?

@lifeofguenter

We also did not have a lot of resources; maybe around 10 as well.

We heavily monitored IMDS and other rate limits, as this was indeed a general issue, but I don't think it was the cause in this case.

I don't think this can be tested easily, though...

I would most probably look into how other providers use the aws-sdk. I do know, though, that there are some additional quirks, especially for signed requests and ES.

I am not actively using this provider anymore, otherwise I would invest some time. I think using earlier versions is just fine for most use cases.

@michelzanini
Author

I can confirm this has been fixed in 1.5.7.

@marksumm

This may be fixed for aws_assume_role_arn, but transparent role-based authentication via EC2 metadata is also broken after 1.5.0. Unfortunately, I need to upgrade because of other bugs that are only fixed in later versions of the provider.

@phillbaker
Owner

transparent role-based authentication via EC2 metadata is broken after 1.5.0

Hi @marksumm, can you clarify exactly the method that's being used here? What environment variables are set? What EC2 metadata is being used?

@marksumm

marksumm commented Sep 23, 2021

@phillbaker I meant a situation where no authentication attributes or environment variables are passed to the provider, healthchecks are disabled, and AWS request signing is enabled. Running locally uses the AWS credentials file as expected, but running on an EC2 instance now hangs indefinitely because state refreshes for resources created using the provider never return. The EC2 instance has an assumed role and so a session token is available via the metadata endpoint. Everything described was working in 1.5.0.

@phillbaker
Owner

phillbaker commented Sep 24, 2021

@marksumm please share the elasticsearch provider config that is working on 1.5.0 and not working in more recent versions. What URL does the ES cluster have? And is it self-hosted or in the AWS Elasticsearch/OpenSearch service?

@marksumm

@phillbaker The provider is configured like this...

provider "elasticsearch" {
  url               = "https://********.us-east-1.es.amazonaws.com"
  sign_aws_requests = true
  healthcheck       = false
}

The endpoint is apparently Elasticsearch 7.7, but it seems that AWS has already started to make changes to the API following the switch to OpenSearch. For example, index patterns should now be nested inside ISM policies and not created as separate resources. By the way, I tried setting AWS_SDK_LOAD_CONFIG=1, but it didn't help.

@marksumm

marksumm commented Sep 24, 2021

@phillbaker I've noticed that if I log in to an affected EC2 instance and target an individual resource created by this provider during terraform plan (and there are no dependencies on other resources), then the state refresh operation no longer hangs. If I attempt to target more than one resource created by this provider, or run an unmodified terraform plan, then I see the hanging behaviour as before. This is true even for a configuration with a very small number of resources (3), which seems to point to an internal deadlock, rather than an API limiting issue. Interestingly, setting -parallelism 1 doesn't seem to help.

phillbaker added a commit that referenced this issue Sep 25, 2021
@phillbaker
Owner

Hi @marksumm, this should be addressed in 64f21df; it'll be released in 2.0.0-beta.2 (coming shortly).

@marksumm

@phillbaker It works! Thank you so much.
