fails to destroy asg #176

tomdavidson · 2018-10-21T10:52:27Z

I have issues

More than what will be helped here ... The terrafrom-aws-eks module fails to destroy.

I'm submitting a...

bug report
feature request
support request
kudos, thank you, warm fuzzy

What is the current behavior?

terraform destroy exits non-zero :

* module.eks.aws_autoscaling_group.workers (destroy): 1 error(s) occurred:
* aws_autoscaling_group.workers: group still has 3 instances

If this is a bug, how to reproduce? Please include a code sample if relevant.

Apply a plan with the following module config:

module "eks" {
  source                 = "terraform-aws-modules/eks/aws"
  version                = "1.7.0"
  cluster_name           = "${local.cluster_name}"
  cluster_delete_timeout = "25m"
  config_output_path     = "${local.config_path}"
  subnets                = "${local.subnets}"
  tags                   = "${local.tags}"
  vpc_id                 = "${local.vpc_id}"

  workers_group_defaults = {
    additional_security_group_ids = ""          # A comma delimited list of additional security group ids to include in worker launch config
    asg_desired_capacity          = "3"         # Desired worker capacity in the autoscaling group.
    asg_max_size                  = "6"         # Maximum worker capacity in the autoscaling group.
    asg_min_size                  = "3"         # Minimum worker capacity in the autoscaling group.
    autoscaling_enabled           = true        # Sets whether policy and matching tags will be added to allow autoscaling.
    enable_monitoring             = true        # Enables/disables detailed monitoring.
    instance_type                 = "t3.medium" # Size of the workers instances.
    protect_from_scale_in         = true        # Prevent AWS from scaling in, so that cluster-autoscaler is solely responsible.
    public_ip                     = false       # Associate a public ip address with a worker
    root_volume_size              = "100"       # root volume size of workers instances.
    root_volume_type              = "gp2"       # root volume type of workers instances, can be 'standard', 'gp2', or 'io1'
    target_group_arns             = ""          # A comma delimited list of ALB target group ARNs to be associated to the ASG
  }
}

Then attempt to terraform destroy

What's the expected behavior?

That terraform destroy successfully deletes the asg and exits 0.

Are you able to fix this problem and submit a PR? Link here if you have already.

Willing to help, but could use some direction pinpointing the issue.

Environment details

Affected module version: 1.7.0
OS: ubuntu 18.04
Terraform version: 0.11.8

Any other relevant info

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

The text was updated successfully, but these errors were encountered:

max-rocket-internet · 2018-10-23T09:24:40Z

protect_from_scale_in = true seems to be the culprit. If I provision without it, I can successfully destroy.

Cool, then problem solved, right? If you want to enable protect_from_scale_in then what could we do in this module? You've already decided to protect your instances from termination. If you want to be able to destroy your ASG and instances then don't enable that setting 🙂

tomdavidson · 2018-10-25T07:07:16Z

That is not very satisfying. A plan that can not be destroyed seems defective. Deploying the cluster with it enable and then doing an update to disable and then trying the destroy also fails.

If this is as designed and not a bug, I think a more descriptive on the functionality and impact would be helpful.

Thanks.

Vlaaaaaaad · 2018-12-06T12:20:53Z

Just hit this and this is not a behavior I expected.
We are planning to use this module to create ephemeral clusters and being able to delete them is vital. At the same time, we very much want to run cluster-autoscaler so we do need to enable protect_from_scale_in.

What's the best way we could do this? How are you all handling this problem?

The best way I can think of is adding a local-exec or remote-exec provisioner with when = "destroy" that just deletes the remaining instances/ disables scale-in protection on the ASGs.
Do you have any other ideas? Would such a PR be accepted?

Vlaaaaaaad · 2019-01-08T16:32:12Z

Hi,

We have another module wrapping this module and @bmihaescu created the workaround I mentioned above:

resource "null_resource" "eks-predestroy" {
    provisioner "local-exec" {
    when = "destroy"

    interpreter = ["/bin/bash", "-c"]

    command = <<CMD
ASGName=$(aws autoscaling describe-auto-scaling-groups | grep "${local.cluster_name}" -A 100 -B 100 | grep AutoScalingGroupName | tr -d '" ,' | cut -f2 -d ":")

for asg in $ASGName; do
    InstanceID=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name "$asg" | grep InstanceId | tr -d '" ,' | cut -f2 -d ":")

    for instance in $InstanceID; do
        aws autoscaling set-instance-protection --instance-ids "$instance" --auto-scaling-group-name "$asg" --no-protected-from-scale-in

    done
done
CMD
    }
}

It's a bit crude and could be converted to use jq but it seems to work fine for us and the EKS cluster gets deleted properly on destroy.
Is there any value in upstreaming this script? Should we create a PR?

LE: aaaand due to hashicorp/terraform#13549 this only works on terraform destroy and not on tf apply when the module is deleted 😞

RothAndrew · 2019-01-08T16:43:58Z

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances. If I set protect-from-scale-in to true I am intentionally and explicitly saying I don't want anything to happen to those instances until I intentionally and explicitly tell the ASG otherwise.

RothAndrew · 2019-01-08T16:47:57Z

Changing the aws_autoscaling_group to fail fast if protect-from-scale-in is true should be the right choice here as proposed here

max-rocket-internet · 2019-01-09T10:54:10Z

This is totally 100% "working as intended". If you set protect-from-scale-in to true you have to manually terminate the instances.

Exactly. You can't have it both ways.

AnthonyFoiani-at · 2021-05-11T16:57:08Z

Given how many folks have encountered this, I'd like the developers to reconsider. I see at least three arguments for being able to destroy ASGs declared with protect_from_scale_in = true:

I deeply appreciate declarative configuration, but by its nature, it can only be a reflection of the steady state. The huge value to Terraform is its ability to take a stack in its current running state and see if there is any drift from the declared state; if differences are found, Terraform can safely evolve the stack to match the requested state.

This means that the live state will transiently not match the declared state: not all such operations are atomic.

Looking through that lens, it seems reasonable to have a mental model where moving from the "live" state to a "destroyed" state would require massaging some properties to temporarily not match the declared state.
Consider object-oriented programming. Every method should preserve class and instance invariants on entry and exit from the method -- but within the methods (and especially within the constructor and destructor!) those invariants don't hold. With this perspective, viewing protect_from_scale_in = true as an instance invariant makes sense, but the constructor might have to do multiple steps to get it there, and likewise the destructor might need to take multiple steps to destroy it.
This flag indicates that we don't want the instances destroyed by Amazon AutoScaling. Looking at the name, it has nothing to do with whether Terraform should be careful about them! The documentation even explicitly says (emphasis mine):

Indicates whether the instance is protected from termination by Amazon EC2 Auto Scaling when scaling in.

Destruction / tear-down is a very distinct operation from scaling in! (And if we can only implement graceful destroy through scaling down to 0, that's IMHO still a leaky abstraction that Terraform should be bridging, not exposing.)

Given these three arguments -- and the many duplicate requests / confusions in this and related threads -- I ask the authors to reconsider the viewpoint above.

The existing behavior might be matching the AWS semantics 1-to-1, but that's arguably less useful than adopting the viewpoints I give earlier -- and isn't Terraform all about making AWS more useful?

If the authors think that this kind of one-way switch / extra protection is valuable, maybe adopt some other mechanism?

github-actions · 2022-11-21T02:28:54Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tomdavidson closed this as completed Oct 25, 2018

Vlaaaaaaad mentioned this issue Dec 6, 2018

Terraform fails to destroy autoscaling group if scale in protection is enabled hashicorp/terraform-provider-aws#5278

Closed

max-rocket-internet mentioned this issue Mar 5, 2019

Destroy fails when protect_from_scale_in = true #288

Closed

dpiddockcmp mentioned this issue Mar 25, 2020

Destruction of worker ASG blocked by protect_from_scale_in = true #818

Closed

4 tasks

github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fails to destroy asg #176

fails to destroy asg #176

tomdavidson commented Oct 21, 2018

max-rocket-internet commented Oct 23, 2018

tomdavidson commented Oct 25, 2018 •

edited

Loading

Vlaaaaaaad commented Dec 6, 2018

Vlaaaaaaad commented Jan 8, 2019 •

edited

Loading

RothAndrew commented Jan 8, 2019

RothAndrew commented Jan 8, 2019

max-rocket-internet commented Jan 9, 2019

AnthonyFoiani-at commented May 11, 2021

github-actions bot commented Nov 21, 2022

fails to destroy asg #176

fails to destroy asg #176

Comments

tomdavidson commented Oct 21, 2018

I have issues

I'm submitting a...

What is the current behavior?

If this is a bug, how to reproduce? Please include a code sample if relevant.

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

Any other relevant info

max-rocket-internet commented Oct 23, 2018

tomdavidson commented Oct 25, 2018 • edited Loading

Vlaaaaaaad commented Dec 6, 2018

Vlaaaaaaad commented Jan 8, 2019 • edited Loading

RothAndrew commented Jan 8, 2019

RothAndrew commented Jan 8, 2019

max-rocket-internet commented Jan 9, 2019

AnthonyFoiani-at commented May 11, 2021

github-actions bot commented Nov 21, 2022

tomdavidson commented Oct 25, 2018 •

edited

Loading

Vlaaaaaaad commented Jan 8, 2019 •

edited

Loading