aws_ecs_capacity_provider will not destroy properly when needing replacement. #14393

Open
bholzer opened this issue Jul 30, 2020 · 11 comments · May be fixed by #39720
Labels
bug Addresses a defect in current functionality. service/autoscaling Issues and PRs that pertain to the autoscaling service. service/ecs Issues and PRs that pertain to the ecs service.

Comments

@bholzer

bholzer commented Jul 30, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Issue description

When I have deployed an autoscaling group and an ECS capacity provider, any change that requires replacing the capacity provider fails and times out. This appears to be the issue that was reported as resolved in 2.67, per #11286.

When I use this version, however, I still appear to be unable to destroy a capacity provider.

Manually destroying the provider and running an apply again seems to be a decent workaround.

Terraform CLI and Terraform AWS Provider Version

Terraform version: 0.12.26
AWS provider version: 2.67

Affected Resource(s)

  • aws_ecs_capacity_provider

Terraform Configuration Files

resource "aws_launch_configuration" "ecs_container_instance" {
  name_prefix   = "${var.name}ECSContainerInstance"
  image_id      = data.aws_ami.ecs_ami.id
  instance_type = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.container_instance_profile.name
  security_groups = concat([aws_security_group.ssh.id], var.security_group_ids)

  root_block_device {
    encrypted = true
  }

  user_data = templatefile("${path.module}/container_instance_user_data.sh", {
    cluster_name = var.name,
    authorized_keys_base64 = filebase64("${path.module}/authorized_keys")
  })

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "container_instance_cluster" {
  name_prefix           = "${var.name}ContainerInstanceCluster"
  launch_configuration  = aws_launch_configuration.ecs_container_instance.name
  min_size              = 1
  max_size              = 5
  vpc_zone_identifier   = var.subnet_ids

  lifecycle {
    create_before_destroy = true
    ignore_changes = [tags]
  }

  tags = [
    { key = "Name", value = "${var.name}ECSContainerInstance", propagate_at_launch = true }
  ]
}

resource "random_string" "random" {
  length = 16
  special = false
}

resource "aws_ecs_capacity_provider" "ec2" {
  name = "${var.name}EC2Provider-${random_string.random.result}"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.container_instance_cluster.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 3
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}

Panic Output

Error: error waiting for ECS Capacity Provider (arn:aws:ecs:us-east-1:xxxxxxxxx:capacity-provider/alphaEC2Provider-w1tJAnjjax5r0kr9) to delete: timeout while waiting for state to become 'INACTIVE' (last state: 'ACTIVE', timeout: 20m0s)

Expected Behavior

The capacity provider should be replaced as the plan output suggests.

Actual Behavior

An apply times out.

Steps to Reproduce

  1. terraform apply
  2. Make a change that requires the capacity provider to be replaced, like changing the name.
  3. terraform apply
@ghost ghost added service/autoscaling Issues and PRs that pertain to the autoscaling service. service/ecs Issues and PRs that pertain to the ecs service. labels Jul 30, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Jul 30, 2020
@mikalai-t

Faced the same. Try adding a create_before_destroy lifecycle rule to the capacity provider resource. Logically, the sequence of API calls made by Terraform should then be:

  • create new capacity provider
  • modify capacity provider setting in the existing ECS cluster
  • destroy unused capacity provider

@apottere

apottere commented Aug 26, 2020

@mikalai-t did you actually get that working, or was that just a suggestion to try? We're running into this issue, but an ASG can apparently only have one capacity provider, so create_before_destroy doesn't fix it.

Error: error creating capacity provider: ClientException: The specified Auto Scaling group ARN is already being used by another capacity provider. Specify a unique Auto Scaling group ARN and try again.

@andy-codes

Did anyone make any progress with this? Facing the same issue.

@cageyv
Contributor

cageyv commented Oct 28, 2020

@andy-codes @apottere maybe this hack can solve some problems.
I use the ASG name as the capacity provider name, which makes it possible to use create_before_destroy and so on.
I'm on Terraform 0.13 and AWS provider 3.11.0.

resource "aws_ecs_capacity_provider" "this" {
  # Forcing new capacity provider name depends on ASG name
  name = aws_autoscaling_group.this.name
  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.this.arn
    managed_termination_protection = "ENABLED" # required protect_from_scale_in = true in ASG
    managed_scaling {
      maximum_scaling_step_size = 2
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
  lifecycle {
    create_before_destroy = true
  }
}

@tymik

tymik commented May 10, 2021

@andy-codes @apottere maybe this hack can solve some problems.
I use the ASG name as the capacity provider name, which makes it possible to use create_before_destroy and so on.
I'm on Terraform 0.13 and AWS provider 3.11.0.

resource "aws_ecs_capacity_provider" "this" {
  # Forcing new capacity provider name depends on ASG name
  name = aws_autoscaling_group.this.name
  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.this.arn
    managed_termination_protection = "ENABLED" # required protect_from_scale_in = true in ASG
    managed_scaling {
      maximum_scaling_step_size = 2
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
  lifecycle {
    create_before_destroy = true
  }
}

Nope, this does not resolve the issue. If you change something other than the name, e.g. managed_termination_protection, you end up with:

Error: error creating capacity provider: ClientException: The specified capacity provider already exists. To change the configuration of an existing capacity provider, update the capacity provider.

I haven't verified that this works when changing the name, though I believe it does. For everything else, where the name stays the same, I expect to get the very same error as I got for managed_termination_protection.

And honestly, I am not sure if we can do anything about that unless AWS allows the capacity provider to be changed without recreation from API, when it doesn't seem to be necessary.
Changing the very same parameter for capacity provider from AWS Console works flawlessly.

@Wyfy0107

Any update on this? I'm having the same issue

@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Sep 22, 2021
@mwkaufman

We saw this issue and are optimistic that updating our AWS provider to 3.47.0+ will provide a workaround due to this feature update: #16942. In older versions of the provider, changing almost anything forced a new resource. This bug probably still exists if you try to update the name or ASG ARN, but otherwise you can avoid it.
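
A minimal sketch of pinning the provider to the version range mentioned above, assuming Terraform 0.13 or later (the `source` syntax is not available in 0.12). Only the `3.47.0` lower bound comes from the comment; everything else is illustrative.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Lower bound taken from the comment above; an upper bound is
      # deliberately left out here and is up to your own upgrade policy.
      version = ">= 3.47.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` is needed for Terraform to select a newer provider release.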

@smokentar

This issue still exists with aws v4.21.0

An ugly workaround is to re-create your ASGs and CPs on every run:

  1. Add create_before_destroy lifecycle rule to all capacity providers and autoscaling groups you have
  2. Introduce a random string re-generated on every run
resource "random_id" "suffix" {
  keepers = {
    suffix = "${timestamp()}"
  }
  byte_length = 8
}
  3. Use this random string as a suffix to your ASG names and CP names
resource "aws_ecs_capacity_provider" "some_cp" {
  name = "${var.cp-name}-cp-${random_id.suffix.id}"
}  
resource "aws_autoscaling_group" "some_asg" {
  name  = "${var.asg-name}-${random_id.suffix.id}"
}

@michal-kosinski

michal-kosinski commented Jul 5, 2022

We were using ASG with name_prefix and then used that name for CP. It works fine till you want to assign CP to the ECS services. Destroying CP means it needs to be unassigned from every service first.
To overcome this we've switched to static naming for ASG and associated CP together with instance_refresh configuration block on ASG. After updating the launch template new EC2 instances are rolled out without downtime and without creating a new ASG. TF doesn't need to destroy CP anymore.
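
The static-naming approach described above might look roughly like the sketch below. All resource names, the `ami_id` variable, and the `min_healthy_percentage` value are hypothetical and not taken from the commenter's configuration; it also assumes a provider version that supports the `instance_refresh` block.

```hcl
# Hypothetical inputs; adapt to your own configuration.
variable "name" { type = string }
variable "ami_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_launch_template" "ecs" {
  name          = "${var.name}-ecs"
  image_id      = var.ami_id
  instance_type = "t3.medium"
}

resource "aws_autoscaling_group" "ecs" {
  # Static name: the ASG is updated in place instead of being replaced.
  name                = "${var.name}-ecs"
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.ecs.id
    version = aws_launch_template.ecs.latest_version
  }

  # Roll instances when the launch template version changes.
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
}

resource "aws_ecs_capacity_provider" "ecs" {
  # Static name and a stable ASG ARN, so nothing here forces replacement.
  name = "${var.name}-ecs"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn
  }
}
```

Because neither the ASG name nor its ARN changes, a launch template update should not show the capacity provider as needing replacement in the plan.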

@decentralgabe

Using a random string suffix does not work for the terraform ecs module since the capacity provider's name is a key value. If you try to make the key a dynamic string you get the following error on apply:

│ on .terraform/modules/ipfs.ecs_ipfs/main.tf line 90, in resource "aws_ecs_capacity_provider" "this":
│ 90: for_each = { for k, v in var.autoscaling_capacity_providers : k => v if var.create }
│ ├────────────────
│ │ var.autoscaling_capacity_providers will be known only after apply
│ │ var.create is true

│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.

│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the map values.

│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time to fully converge

Then back to the destroying/timeout death loop...
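
For a plain resource (outside the module), the advice in the error message can be sketched as below: the for_each keys stay static and the apply-time random suffix appears only in the map values. The names `ec2`, `asg_arn`, and `suffix` are hypothetical, and whether the ECS module exposes a separate name field that would allow the same pattern is not confirmed here.

```hcl
# Hypothetical inputs for illustration.
variable "name" { type = string }
variable "asg_arn" { type = string }

resource "random_string" "suffix" {
  length  = 8
  special = false
}

locals {
  capacity_providers = {
    # Static key known at plan time; the unknown value lives in the map value.
    ec2 = {
      name = "${var.name}-${random_string.suffix.result}"
    }
  }
}

resource "aws_ecs_capacity_provider" "this" {
  for_each = local.capacity_providers

  name = each.value.name

  auto_scaling_group_provider {
    auto_scaling_group_arn = var.asg_arn
  }
}
```

This only avoids the for_each planning error. It does not change the fact that replacing a capacity provider still requires deleting the old one.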

@SmashingQuasar

This issue is still present.
Generally speaking, I think there is a logical issue with terraform destroy.
This is a conscious action, which means that when it is run, the expected result is the effective destruction of the target resource.
We often get stuck in strange situations that Terraform does not handle, where AWS refuses to delete a resource because it is "in-use". The command is not terraform destroy-if-possible. I find it really dangerous that Terraform fails that often on destroy runs, and it is making me question the utility of the destroy command.
In my scenario, Terraform manages all resources that are tied to this capacity provider. This means Terraform is perfectly capable of first removing each dependent resource one by one, and then removing the capacity provider.

I feel like there are too many cases where the AWS provider simply assumes things are going to work on AWS's end when they clearly don't. I'm not going to stray from the capacity provider topic, but this is a rampant issue in the AWS provider.
