aws_ecs_capacity_provider will not destroy properly when needing replacement. #14393

bholzer · 2020-07-30T00:06:37Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Issue description

When I have deployed an Auto Scaling group and an ECS capacity provider, any change that requires replacing the capacity provider fails with a timeout. This appears to be the issue that was reported as resolved in 2.67, per #11286.

When I use this version, however, I still appear to be unable to destroy a capacity provider.

Manually destroying the provider and running an apply again seems to be a decent workaround.

Terraform CLI and Terraform AWS Provider Version

Terraform version: 0.12.26
AWS provider version: 2.67

Affected Resource(s)

aws_ecs_capacity_provider

Terraform Configuration Files

resource "aws_launch_configuration" "ecs_container_instance" {
  name_prefix   = "${var.name}ECSContainerInstance"
  image_id      = data.aws_ami.ecs_ami.id
  instance_type = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.container_instance_profile.name
  security_groups = concat([aws_security_group.ssh.id], var.security_group_ids)

  root_block_device {
    encrypted = true
  }

  user_data = templatefile("${path.module}/container_instance_user_data.sh", {
    cluster_name = var.name,
    authorized_keys_base64 = filebase64("${path.module}/authorized_keys")
  })

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "container_instance_cluster" {
  name_prefix           = "${var.name}ContainerInstanceCluster"
  launch_configuration  = aws_launch_configuration.ecs_container_instance.name
  min_size              = 1
  max_size              = 5
  vpc_zone_identifier   = var.subnet_ids

  lifecycle {
    create_before_destroy = true
    ignore_changes = [tags]
  }

  tags = [
    { key = "Name", value = "${var.name}ECSContainerInstance", propagate_at_launch = true }
  ]
}

resource "random_string" "random" {
  length = 16
  special = false
}

resource "aws_ecs_capacity_provider" "ec2" {
  name = "${var.name}EC2Provider-${random_string.random.result}"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.container_instance_cluster.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 3
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}

Panic Output

Error: error waiting for ECS Capacity Provider (arn:aws:ecs:us-east-1:xxxxxxxxx:capacity-provider/alphaEC2Provider-w1tJAnjjax5r0kr9) to delete: timeout while waiting for state to become 'INACTIVE' (last state: 'ACTIVE', timeout: 20m0s)

Expected Behavior

The capacity provider should be replaced as the plan output suggests.

Actual Behavior

An apply times out.

Steps to Reproduce

terraform apply
Make a change that requires the capacity provider to be replaced, like changing the name.
terraform apply

mikalai-t · 2020-08-16T23:33:25Z

Faced the same. Try adding a create_before_destroy lifecycle policy to the capacity provider resource. Logically, the sequence of API calls made by Terraform should then be:

create new capacity provider
modify capacity provider setting in the existing ECS cluster
destroy unused capacity provider
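
The sequence above can be sketched roughly as follows. This is a minimal illustration, not a confirmed fix: the resource names, the `asg_arn` variable, and the use of `aws_ecs_cluster_capacity_providers` (which assumes a provider version that includes that resource) are assumptions made for the example. Note that later comments in this thread report that creation of the replacement fails while the old provider still holds the same Auto Scaling group ARN.

```hcl
# Sketch only: resource and variable names below are hypothetical.
variable "name" {
  type = string
}

variable "asg_arn" {
  type        = string
  description = "ARN of an existing Auto Scaling group (assumed to exist)"
}

resource "aws_ecs_cluster" "example" {
  name = var.name
}

resource "aws_ecs_capacity_provider" "ec2" {
  name = "${var.name}EC2Provider"

  auto_scaling_group_provider {
    auto_scaling_group_arn = var.asg_arn
  }

  lifecycle {
    # Step 1: create the replacement before the old provider is destroyed.
    create_before_destroy = true
  }
}

# Step 2: the cluster association follows the new provider's name.
resource "aws_ecs_cluster_capacity_providers" "example" {
  cluster_name       = aws_ecs_cluster.example.name
  capacity_providers = [aws_ecs_capacity_provider.ec2.name]
}

# Step 3: Terraform destroys the old provider once dependents are updated.
```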

apottere · 2020-08-26T02:11:41Z

@mikalai-t did you actually get that working, or was that just a suggestion to try? We're running into this issue, but an ASG can apparently only have one capacity provider, so create_before_destroy doesn't fix it.

Error: error creating capacity provider: ClientException: The specified Auto Scaling group ARN is already being used by another capacity provider. Specify a unique Auto Scaling group ARN and try again.

andy-codes · 2020-09-14T09:47:42Z

Did anyone make any progress with this? Facing the same issue.

cageyv · 2020-10-28T20:32:52Z

@andy-codes @apottere maybe this hack can solve some problems.
I use the ASG name as the capacity provider name, which makes it possible to use create_before_destroy and so on.
I'm on Terraform 0.13 and AWS provider 3.11.0.

resource "aws_ecs_capacity_provider" "this" {
  # The name depends on the ASG name, so a new ASG forces a new capacity provider name
  name = aws_autoscaling_group.this.name
  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.this.arn
    managed_termination_protection = "ENABLED" # requires protect_from_scale_in = true on the ASG
    managed_scaling {
      maximum_scaling_step_size = 2
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
  lifecycle {
    create_before_destroy = true
  }
}

tymik · 2021-05-10T12:55:19Z

Replying to @cageyv's suggestion above (using the ASG name as the capacity provider name together with create_before_destroy):

Nope, this does not resolve the issue. If you change something other than the name, e.g. managed_termination_protection, you end up with:

Error: error creating capacity provider: ClientException: The specified capacity provider already exists. To change the configuration of an existing capacity provider, update the capacity provider.

I don't know for sure whether this works properly when changing the name, though I believe it does. For everything else, without a name change, I expect the very same error as I got for managed_termination_protection.

Honestly, I am not sure we can do anything about that unless the AWS API allows the capacity provider to be changed in place, since recreation doesn't seem to be necessary.
Changing the very same parameter for the capacity provider from the AWS Console works flawlessly.

Wyfy0107 · 2021-08-14T10:16:38Z

Any update on this? I'm having the same issue

mwkaufman · 2021-09-28T21:51:43Z

We saw this issue and are optimistic that updating our AWS provider to 3.47.0+ will provide a workaround due to this feature update: #16942. In older versions of the provider, changing almost anything forced a new resource. This bug probably still exists if you try to update the name or ASG ARN, but otherwise you can avoid it.
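
A version constraint along these lines would pin the provider to the release range mentioned above. It is a minimal sketch that assumes Terraform 0.13 or later (for the `source` syntax); the exact lower bound to use is a judgment call for each project.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # 3.47.0 is the version mentioned in the comment above;
      # adjust the constraint to your own upgrade policy.
      version = ">= 3.47.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` is needed for Terraform to select a newer provider release.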

smokentar · 2022-07-02T20:56:04Z

This issue still exists with aws v4.21.0

An ugly workaround is to re-create your ASGs and CPs on every run:

Add a create_before_destroy lifecycle rule to all of your capacity providers and Auto Scaling groups
Introduce a random string that is regenerated on every run

resource "random_id" "suffix" {
  keepers = {
    suffix = timestamp()
  }
  byte_length = 8
}

Use this random string as a suffix for your ASG names and CP names

resource "aws_ecs_capacity_provider" "some_cp" {
  name = "${var.cp-name}-cp-${random_id.suffix.id}"
  # ... remaining arguments omitted
}

resource "aws_autoscaling_group" "some_asg" {
  name = "${var.asg-name}-${random_id.suffix.id}"
  # ... remaining arguments omitted
}

michal-kosinski · 2022-07-05T05:08:31Z

We were using an ASG with name_prefix and then used that name for the CP. It works fine until you want to assign the CP to ECS services: destroying a CP means it has to be unassigned from every service first.
To overcome this, we've switched to static naming for the ASG and its associated CP, together with an instance_refresh configuration block on the ASG. After updating the launch template, new EC2 instances are rolled out without downtime and without creating a new ASG, so TF no longer needs to destroy the CP.
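
The approach described above might look roughly like this. It is a sketch under assumptions, not the commenter's actual configuration: the variables, resource names, instance type, sizes, and the 50% healthy threshold are all hypothetical, and it assumes a provider version that supports the `instance_refresh` block.

```hcl
# Sketch only: names, the AMI variable, and sizes are hypothetical.
variable "name" { type = string }
variable "ami_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_launch_template" "ecs" {
  name_prefix   = "${var.name}-"
  image_id      = var.ami_id
  instance_type = "t3.medium"
}

resource "aws_autoscaling_group" "ecs" {
  # Static name: the ASG is updated in place rather than replaced.
  name                = "${var.name}-ecs"
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.ecs.id
    version = aws_launch_template.ecs.latest_version
  }

  # Roll instances when the launch template changes.
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
}

resource "aws_ecs_capacity_provider" "ecs" {
  # Static name tied to the static ASG, so no replacement is triggered.
  name = "${var.name}-ecs"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn
  }
}
```

Because neither the ASG name nor its ARN changes, the capacity provider stays attached to the cluster and its services while instances are refreshed.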

decentralgabe · 2023-01-22T06:32:41Z

Using a random string suffix does not work with the Terraform ECS module, since the capacity provider's name is used as a map key. If you try to make the key a dynamic string, you get the following error on apply:

│ on .terraform/modules/ipfs.ecs_ipfs/main.tf line 90, in resource "aws_ecs_capacity_provider" "this":
│ 90: for_each = { for k, v in var.autoscaling_capacity_providers : k => v if var.create }
│ ├────────────────
│ │ var.autoscaling_capacity_providers will be known only after apply
│ │ var.create is true
│
│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.
│
│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the map values.
│
│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time to fully converge

Then back to the destroying/timeout death loop...
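
For a hand-written resource (not the ECS module, which derives the name from the key), the advice in the error message can be sketched as follows: keep the for_each keys static and put the apply-time value only in an attribute. The `asg_arns` variable and resource names here are hypothetical.

```hcl
# Sketch only: the map keys and resource names are hypothetical.
variable "asg_arns" {
  type        = map(string)
  description = "Static key => Auto Scaling group ARN"
}

resource "random_id" "suffix" {
  byte_length = 4
}

resource "aws_ecs_capacity_provider" "this" {
  # Keys come from the variable and are known at plan time.
  for_each = var.asg_arns

  # The apply-time value only appears in an attribute, never in a key.
  name = "${each.key}-${random_id.suffix.hex}"

  auto_scaling_group_provider {
    auto_scaling_group_arn = each.value
  }
}
```

This avoids the for_each error itself, but it does not change how the provider handles replacement of an in-use capacity provider.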

SmashingQuasar · 2024-05-14T12:45:27Z

This issue is still present.
Generally speaking, I think there is a logical issue with terraform destroy.
This is a conscious action, which means that when it is run, the expected result is the effective destruction of the target resource.
We often get stuck in strange situations that Terraform does not handle, where AWS refuses to delete a resource because it is "in-use". The command is not terraform destroy-if-possible. I find it really dangerous that Terraform fails that often on destroy runs, and it is making me question the utility of the destroy command.
In reality, in my scenario, Terraform manages all the resources that are tied to this capacity provider. This means Terraform is perfectly capable of first removing each of those resources one by one, and then removing the capacity provider.

I feel like there are too many cases where the AWS provider simply assumes things are going to work on AWS's end when they clearly don't. I'm not going to stray from the capacity provider topic, but this is a rampant issue in the AWS provider.

ghost added the service/autoscaling and service/ecs labels Jul 30, 2020

github-actions bot added the needs-triage label Jul 30, 2020

chris-malloy mentioned this issue Apr 7, 2021

Need to create capacity provider before destroying. 7Factor/terraform-ecs-cluster#26

Merged

justinretzolk added the bug label and removed the needs-triage label Sep 22, 2021

YukiMichishita linked a pull request Oct 15, 2024 that will close this issue

Fix issue preventing replacing when ASG capacity provider is associated with a cluster #39720

Open
