Regression issue on 1.5.1 #124
Hello, sorry to hear you're having issues. It sounds like this might be related to f924ab6 (#114). Thanks for providing details and an example provider config.
I'm not quite following here: are you saying that a different provider config does work in v1.5.1? (Can you share or clarify examples?) |
I use Terragrunt to write a different Terraform file depending on whether I am in a CI environment or on a laptop. When on a laptop, this is the config I use:
When on CI env, this is the one I use:
On a laptop, it uses |
Thanks @michelzanini. Any chance the CI is running on EKS (#112)? |
No, it's running on a standard EC2 instance. |
I can confirm that When I turn off My configuration is very similar:
|
For us we use
however, our profile looks like this:
works fine on |
Sorry for the delay here. I've reverted part of f924ab6 and tagged a v1.5.2-beta (https://github.com/phillbaker/terraform-provider-elasticsearch/tree/v1.5.2-beta). That should get pushed to the Terraform registry shortly. Can you all please give that a try and let me know if this is resolved? |
Hello, following up on this. Has anyone been able to try |
On our side it did not fix the issue unfortunately:
reverting to |
Thanks. I reverted the upgrade of the AWS client and released |
Hi all, following up on this: has this been fixed in 1.5.2-beta1? |
Hi all, 1.5.2 has been released. I'm going to close this as fixed. I don't have a way to reproduce, so I can't test directly. Please re-open if there are further issues. |
Sorry I did not have time to test this before. I tested with 1.5.4 and it seems it is still not working. |
I can confirm the commit that introduced this regression issue was #119. I am going to have a deeper look now to see if I can spot the issue, but 100% it was there. |
Thanks @michelzanini that's very helpful. That strikes me as very odd, as #119 is primarily a change in timing of calls, as opposed to what calls are being made. In order to narrow down the issue, could you try the following:
|
Even with
If I also set This leads me to believe that there's some sort of race condition. I can't find the problem myself, and I do not have enough Go or Elasticsearch knowledge to find this on my own. I will park this for now and stay locked to 1.5.0. Alternatively, you can test this by creating one AWS instance and an Elasticsearch cluster, assigning an IAM role to the box, and running Terraform from there... |
Not sure this will help, but these are the logs, which keep repeating like this forever:
|
Unfortunately, #119 touches too many pieces of code to revert now.
I also don't currently have access to an AWS environment where I can test this. |
Here's one guess I have: the deferred instantiation of the client means that the client is initialized once per resource, versus once at provider instantiation. This may be a problem if there are many resources (which also require reads to prepare a plan) and the AWS client needs to query resources like the EC2 metadata API (which is rate limited). @michelzanini @lifeofguenter approximately how many |
Hi @phillbaker, that makes a whole lot of sense. I have around 10 resources, more or less. Although you don't have AWS resources to test, you can still probably test this behaviour with debugging? |
We also did not have a lot of resources, maybe around 10 as well. We heavily monitored IMDS and other rate limits, as this was indeed a general issue, but I don't think it was the cause in this case. I don't think this can be tested easily, though... I would most probably look into how other providers utilize aws-sdk. I do know, though, that there are some additional quirks, especially for signed requests and ES. I am not actively using this provider anymore, else I would invest some time. I think using earlier versions is just fine for most use cases. |
I can confirm this has been fixed on 1.5.7. |
This may be fixed for |
Hi @marksumm, can you clarify exactly the method that's being used here? What environment variables are set? What EC2 metadata is being used? |
@phillbaker I meant a situation where no authentication attributes or environment variables are passed to the provider, healthchecks are disabled, and AWS request signing is enabled. Running locally uses the AWS credentials file as expected, but running on an EC2 instance now hangs indefinitely because state refreshes for resources created using the provider never return. The EC2 instance has an assumed role and so a session token is available via the metadata endpoint. Everything described was working in 1.5.0. |
@marksumm please share the elasticsearch provider config that is working on 1.5.0 and not working in more recent versions. What URL does the ES cluster have? And is it self-hosted or in the AWS Elasticsearch/OpenSearch service? |
@phillbaker The provider is configured like this...
The endpoint is apparently Elasticsearch 7.7, but it seems that AWS have already started to make changes to the API following the switch to OpenSearch. For example, index patterns should now be nested inside ISM policies and not created as separate resources. By the way, I tried setting |
@phillbaker I've noticed that if I log in to an affected EC2 instance and target an individual resource created by this provider during |
Fixes additional issue raised in #124.
@phillbaker It works! Thank you so much. |
After upgrading to 1.5.1 I am getting the following error:
It could be related to aws_assume_role_arn, as I use it in my provider config:
It only seems to happen if I use aws_assume_role_arn; it does not when I use aws_profile.
I am using Elasticsearch 7.9.
After reverting back to 1.5.0, the error disappears.
I see there are significant changes in PR #119; maybe it's related.
Thanks.