
aws_emr_cluster and aws_emr_security_configuration destroy issues #6279

Closed
copumpkin opened this issue Oct 26, 2018 · 14 comments · Fixed by #12578
Labels
bug: Addresses a defect in current functionality. service/emr: Issues and PRs that pertain to the emr service.

@copumpkin

Terraform Version

0.11.8 with AWS provider 1.30

I realize these aren't the latest versions, but I don't think this module has changed recently and I haven't had a chance to test newer versions yet.

Affected Resource(s)

  • aws_emr_cluster
  • aws_emr_security_configuration

What I did

If I have Terraform manage my aws_emr_cluster and pass a Terraform-managed aws_emr_security_configuration into that cluster, terraform destroy consistently fails to destroy the security configuration.

Expected Behavior

terraform destroy successfully cleans up all the resources it created

Actual Behavior

Error: Error applying plan:

1 error(s) occurred:

* aws_emr_security_configuration.main (destroy): 1 error(s) occurred:

* aws_emr_security_configuration.main: InvalidRequestException: Security configuration 'tf-emr-sc-20181022205505705400000001' cannot be deleted because it is in use by active clusters.
	status code: 400, request id: e6338f23-d936-12e8-ad83-7bea3842861a

Steps to Reproduce

  1. Create a minimal EMR cluster with just aws_emr_cluster and give it an aws_emr_security_configuration (a rough sketch follows this list).
  2. terraform apply
  3. terraform destroy
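
A rough sketch of the kind of configuration involved (not the exact configuration from this report; the names, role, subnet, and instance settings are placeholders, written in the 0.11-era syntax used elsewhere in this thread):

# Hypothetical minimal reproduction: a managed security configuration
# referenced by a cluster. Role, subnet, and instance settings are placeholders.
resource "aws_emr_security_configuration" "main" {
  name = "example-emr-sc"

  configuration = <<EOF
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "S3EncryptionConfiguration": {
        "EncryptionMode": "SSE-S3"
      }
    }
  }
}
EOF
}

resource "aws_emr_cluster" "main" {
  name          = "example-emr-cluster"
  release_label = "emr-5.17.0"
  applications  = ["Spark"]
  service_role  = "${var.emr_service_role}"

  # Referencing the managed security configuration is what creates the
  # dependency that drives the destroy ordering described above.
  security_configuration = "${aws_emr_security_configuration.main.name}"

  master_instance_type = "m4.large"
  core_instance_type   = "m4.large"
  core_instance_count  = 1

  ec2_attributes {
    subnet_id        = "${var.subnet_id}"
    instance_profile = "${var.emr_instance_profile}"
  }
}

On destroy, Terraform removes the cluster first and then immediately attempts to delete the security configuration, which is where the InvalidRequestException above appears.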

Important Factoids

I think it's just not waiting long enough before attempting to destroy the security configuration. If I try again a minute or two later, the destroy works fine.

@bflad bflad added the service/emr Issues and PRs that pertain to the emr service. label Oct 26, 2018
@bflad
Contributor

bflad commented Oct 26, 2018

Quickly looking at https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_emr_security_configuration.go#L103, it doesn't look like the situation has changed.

To fix, we can probably just add a resource.Retry() handler to the Delete function there that retries for a minute or two:

input := &emr.DeleteSecurityConfigurationInput{
  Name: aws.String(d.Id()),
}
err := resource.Retry(1*time.Minute, func() *resource.RetryError {
  _, err := conn.DeleteSecurityConfiguration(input)

  if isAWSErr(err, "InvalidRequestException", "does not exist") {
    return nil
  }

  if isAWSErr(err, "InvalidRequestException", "cannot be deleted because it is in use by active clusters") {
    return resource.RetryableError(err)
  }

  if err != nil {
    return resource.NonRetryableError(err)
  }

  return nil
})

if err != nil {
  return fmt.Errorf("error deleting EMR Security Configuration (%s): %s", d.Id(), err)
}

Then to acceptance test, just create a test configuration that creates a security configuration and a cluster that utilizes that security configuration. 👍

I'm wondering if the problem is fairly inconsistent, though, because TestAccAWSEMRCluster_security_config already exercises something like the test configuration I mention, and looking through the last 6 months of our daily acceptance testing I can't find a failure with the above error.

@copumpkin
Author

Hmm, I can try to reduce my situation then for a proper regression test. I assumed it had nothing to do with the rest of my cluster configuration but maybe it does, if you don't get the issue. Definitely happening 100% of the time here. Given how long EMR clusters take to spin up and down, it'll probably take me a bit to find what's going wrong, but I'll try to post back with some actual terraform.

@jepma

jepma commented Dec 7, 2018

We notice the same behaviour; the versions we are using:

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "null" (1.0.0)...
- Downloading plugin for provider "aws" (1.51.0)...
- Downloading plugin for provider "template" (1.0.0)...

We ended up using a sleep 100 to mitigate the issue, which is not ideal, and we would also like to see it fixed 👍
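
One way to wire such a sleep in (a sketch only, and not necessarily the exact setup used here) is a destroy-time provisioner on the security configuration: it runs after the dependent cluster has already been destroyed and just before Terraform tries to delete the configuration itself:

# Hypothetical stop-gap: pause before Terraform attempts the delete that
# currently fails. The path passed to file() is a placeholder.
resource "aws_emr_security_configuration" "main" {
  name_prefix   = "tf-emr-sc-"
  configuration = "${file("security_configuration.json")}"

  provisioner "local-exec" {
    when    = "destroy"
    command = "sleep 100"
  }
}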

@tedleman

tedleman commented Feb 1, 2019

Would love to see this get fixed. Here is a non-sleep workaround for anyone who is interested. It requires the aws CLI and jq.

resource "aws_emr_cluster" "my_cluster" {

  ...

  provisioner "local-exec" {
    when     = "destroy"
    command  = "echo ${aws_emr_cluster.my_cluster.id} > cluster_id.txt"
  }
}

resource "aws_emr_security_configuration" "my_security" {

  ...

  provisioner "local-exec" {
    when     = "destroy"
    command  = "while [ ! `aws emr describe-cluster --cluster-id $(cat cluster_id.txt) | jq 'any(.Cluster.Status.State; contains(\"TERMINATED\"))' | grep true` ]; do sleep 5; done"
  }
}

@kdhunter

I don't know if this is related or not, but we're observing "terraform destroy" jobs involving EMR clusters returning as "completed" while the cluster is still in the "Terminating" (as opposed to "Terminated") state.

@aeschright aeschright added needs-triage Waiting for first response or review from a maintainer. bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Jun 24, 2019
@joelthompson
Contributor

Poking into this some more, I wonder if this is because the EMR cluster delete method only waits for there to be zero running instances in the cluster, not for AWS to report the cluster as terminated, which I think was introduced in f7405d0. This likely causes Terraform to think the cluster has been terminated and the security configuration can be deleted, when from the AWS side the cluster still exists and the security configuration cannot yet be deleted.

Any thoughts on changing the cluster deletion wait to wait for EMR to report the state as terminated, rather than for it to have zero running instances?

@bflad
Contributor

bflad commented Mar 11, 2020

Any thoughts on changing the cluster deletion wait to wait for EMR to report the state as terminated, rather than for it to have zero running instances?

Sounds like a great idea. 👍

@joelthompson
Contributor

FYI, we've been using a provider patched with the code in #12578 and it seems like it has fixed this issue for us.

@ashkan3

ashkan3 commented Jul 31, 2020

We actually had to get around this issue by adding a create_before_destroy lifecycle setting to the aws_emr_security_configuration resource.

resource "aws_emr_security_configuration" "my_config" {

  ...

  lifecycle {
    create_before_destroy = true
  }
}

@annyip

annyip commented Nov 16, 2020

We actually had to get around this issue by adding a create_before_destroy lifecycle setting to the aws_emr_security_configuration resource.

resource "aws_emr_security_configuration" "my_config" {

  ...

  lifecycle {
    create_before_destroy = true
  }
}

@ashkan3 this didn't seem to work for me? Still getting the dependency error when destroying....

@Sleepy-GH

Any progress on this by any chance? I'm still affected by this, and the workaround with create_before_destroy isn't working for me.

@breathingdust
Member

Hi all 👋 Just letting you know that this issue is featured on this quarter's roadmap. If a PR exists to close the issue, a maintainer will review and either make changes directly or work with the original author to get the contribution merged. If you have written a PR to resolve the issue, please ensure the "Allow edits from maintainers" box is checked. Thanks for your patience, and we are looking forward to getting this merged soon!

@breathingdust breathingdust added this to the Roadmap milestone Nov 10, 2021
@github-actions github-actions bot modified the milestones: Roadmap, v3.70.0 Dec 16, 2021
@github-actions

This functionality has been released in v3.70.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
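
For example, a version constraint along these lines (assuming Terraform 0.13 or later syntax; adjust to your own configuration) will pick up a release that contains the fix:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.70.0"
    }
  }
}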

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 24, 2022