
provider/aws: Changing count of instances with volume attachments causes all attachments to be forced to new resources #5240

Closed
SpencerBrown opened this issue Feb 22, 2016 · 15 comments

Comments

@SpencerBrown
Contributor

Here's the scenario (using the latest 0.6.11):

I have a cluster of aws_instance resources with a count.

Each instance has an aws_ebs_volume and a corresponding aws_volume_attachment, each using the same count (obviously).

All is well for the initial plan/apply.

Now, increase the count by 1. I expect it to simply add another instance with its EBS volume and attachment.

Instead, Terraform wants to force a new resource for ALL the volume attachments. Not good!

Here's an example (I've removed some of the irrelevant detail, so this might not work as-is):

resource "aws_instance" "kube_worker" {
  count = "5"
  ami = "ami-something"
  instance_type = "t2.micro"
  availability_zone = "us-west-2a"
  subnet_id = "sn-something"
}

resource "aws_ebs_volume" "docker" {
  count = "5"
  availability_zone = "us-west-2a"
  type = "gp2"
  size = "10"
}

resource "aws_volume_attachment" "docker" {
  count = "5"
  device_name = "/dev/xvdd"
  volume_id = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"
}

If you plan/apply this, then change the 5s to 6s and re-plan, you get a plan that wants to force a new resource for the first 5 volume attachments, because it thinks the instance_id and volume_id have changed (which they have not, obviously).

(I unfortunately did not save the actual log.)

This of course fails, because the volumes are still there and attached and Terraform cannot re-attach them.

My only recourse was to taint the existing instances and rebuild them all. This is bad, as I would like to be able to non-disruptively add a new node to my Kubernetes cluster using Terraform. I used to be able to do this before I had these volume attachments on each node.

@SpencerBrown
Contributor Author

Another error case:

Build the instances as above. Then taint aws_instance.kube_worker.0 and plan. Plan shows this output:

-/+ aws_volume_attachment.docker.0
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b316bf6b" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-77d8fcb7" => "vol-77d8fcb7"

-/+ aws_volume_attachment.docker.1
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b216bf6a" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-49d8fc89" => "vol-49d8fc89"

Notice that it wants to rebuild the aws_volume_attachment for aws_instance.kube_worker.1, which it should not do.

Applying this plan causes:

Error applying plan:

1 error(s) occurred:

* aws_volume_attachment.docker.1: Error waiting for Volume (vol-49d8fc89) to detach from Instance: i-b216bf6a

because the instance at count 1 is still running and has the volume attached.

To make this work, I had to terminate all the aws_instance.kube_worker.* instances from the AWS console. Running terraform taint or terraform destroy on the instances does not work.

@SpencerBrown
Contributor Author

I think this issue is caused by the problem reported in #2957.

@miguelaferreira
Contributor

I'm seeing the same issue while using the cloudstack provider. Every time I increase the count, Terraform wants to update all resources (not just the new one).

@hsergei

hsergei commented Sep 21, 2016

I've had the same issue with EBS volumes, which I could work around by moving from a separate EBS resource definition to incorporating it into the aws_instance resource. Then it started to happen with EIPs, which can't be defined within aws_instance. The workaround that seems to work so far is adding ignore_changes for the attribute that appears in the "Mismatch reason". For me it was adding this block to the aws_eip definition:

  lifecycle {
    ignore_changes = [ "instance" ]
  }

@PaulCapestany

@hsergei thank you for posting your workaround (P.S. it works equally well for aws_volume_attachment if you do ignore_changes = [ "volume", "instance" ])
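Applied to the aws_volume_attachment from the original example, that workaround looks roughly like this (a sketch only, using the attribute names quoted above; the other arguments are unchanged):

resource "aws_volume_attachment" "docker" {
  count       = "5"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Suppress the spurious diff on existing attachments when the count changes.
  lifecycle {
    ignore_changes = ["volume", "instance"]
  }
}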

@derFunk

derFunk commented Mar 17, 2017

I also used user_data, but added a check for whether the device needs to be formatted, in order to avoid accidental data loss:

function format_if_necessary() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') format_if_necessary ${1}" >> ~/user-data.log 2>&1
    # return if $1 is not a block device path
    [ -b "${1}" ] || return 1
    # format the block device if it isn't already (blkid exits 0 only if it finds a filesystem)
    sudo blkid "${1}" > /dev/null 2>&1 || sudo mkfs -t ext4 "${1}" >> ~/user-data.log 2>&1
}
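For reference, a minimal sketch of wiring a script like this into the instance from the original example via user_data. The file name format-volume.sh is hypothetical; it would contain the function above plus a call such as format_if_necessary /dev/xvdd and a mount:

resource "aws_instance" "kube_worker" {
  count             = "5"
  ami               = "ami-something"
  instance_type     = "t2.micro"
  availability_zone = "us-west-2a"
  subnet_id         = "sn-something"

  # Hypothetical script containing the format_if_necessary function above,
  # a call to it, and a mount; cloud-init runs it on first boot.
  user_data = "${file("format-volume.sh")}"
}

Because of the block-device check, the function simply returns if the volume is not yet attached when the script runs.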

@LeslieK

LeslieK commented Aug 20, 2017

@PaulCapestany Your solution of using ignore_changes = ["volume", "instance"] does not work when I destroy an instance. Terraform has the correct plan: destroy the instance and destroy its volume_attachment. Running terraform apply fails: "aws_volume_attachment.attach.2: Error waiting for Volume (vol-xyz) to detach from Instance"

Did you try to destroy an instance and see the expected result: instance and attachment destroyed, volume detached?

@hsergei

hsergei commented Aug 20, 2017

No, I did not test destroy. I guess you can always remove ignore_changes and let TF use its default behavior.

@LeslieK

LeslieK commented Aug 20, 2017

@hsergei @PaulCapestany: I succeeded only by stopping the instance I want to destroy via the console.

terraform plan => destroy instance[3] and attach[3] (good plan!)
terraform apply => error: attach[3] waiting for volume to detach
result: instance is unchanged; volume is in the "busy" state; attach[3] ??
via console: stop the instance
terraform plan => destroy instance[3] (looks like attach[3] was destroyed)
terraform apply => success: instance[3] destroyed
volume[3] is in the "available" state

It looks like Terraform does not stop the instance; this causes the volume to move from "in-use" to "busy". The attachment no longer appears in the second plan, so it must have been destroyed during the first apply.

Verified: Terraform's order of destroying resources is:

  1. volume_attachment
  2. instance

The error happened after the volume_attachment was destroyed but before the instance was destroyed. It would be better to destroy the instance before destroying the attachment, so that if destroying the instance fails, the attachment still exists.

@LeslieK

LeslieK commented Aug 20, 2017

The bug with aws_volume_attachment is that the destroy does not unmount the volume from a running instance. Apparently, stopping the instance helps Terraform detach the volume, and the instance can then be destroyed.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html

@c4milo
Contributor

c4milo commented Aug 21, 2017

Use destroy provisioners to unmount volumes.
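For what it's worth, a minimal sketch of that approach applied to the attachment from the original example. The connection details (ec2-user, the key path) are assumptions and will differ per setup:

resource "aws_volume_attachment" "docker" {
  count       = "5"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Runs on the instance before the attachment is destroyed, so the volume
  # is no longer in use when Terraform asks AWS to detach it.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /dev/xvdd"]

    connection {
      host        = "${element(aws_instance.kube_worker.*.public_ip, count.index)}"
      user        = "ec2-user"                  # assumption: Amazon Linux default user
      private_key = "${file("~/.ssh/id_rsa")}"  # hypothetical key path
    }
  }
}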

@oranenj

oranenj commented Aug 21, 2017

I'm hitting this too. I would like Terraform to just destroy the instances (causing the volumes to be detached "naturally"), but apparently this is not possible: Terraform never gets to actually destroying the instance, because it wants to destroy the attachment first, and that will never succeed.

Stopping the instance first helps, but I don't see why it should be necessary.

@LeslieK

LeslieK commented Aug 21, 2017

@c4milo: please give an example with a code snippet. From the docs, I can't tell which provisioner you might be talking about. Thanks.

@c4milo
Contributor

c4milo commented Aug 21, 2017

@ghost

ghost commented Apr 7, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 7, 2020