Unable to use instance_template with instance_group_manager anymore #4934

Closed
migibert opened this issue Nov 18, 2019 · 10 comments

@migibert
Contributor

migibert commented Nov 18, 2019

Terraform Version

Terraform v0.12.15

  • provider.google v2.20.0
  • provider.google-beta v2.20.0

Affected Resource(s)

  • google_compute_instance_template

Terraform Configuration Files

A reproducer is available here: https://gist.github.com/migibert/a51ad6521f565f4060aedd93b4337c33#file-reproducer-tf

Debug Output

https://gist.github.com/migibert/a51ad6521f565f4060aedd93b4337c33#file-tf-output-log

Panic Output

No Panic Output

Expected Behavior

I am using instance templates with images and instance group manager to manage an immutable infrastructure pattern.

I have been using this configuration for about a year and it used to run without any problems.

Actual Behavior

When I update the base image for a template, Terraform detects a cycle and raises an error, which prevents the infrastructure from being updated.

Error: Cycle: google_compute_instance_template.igm-primary (destroy deposed 9a0c0b62), google_compute_instance_template.igm-canary (destroy deposed 885f672d), google_compute_instance_group_manager.igm-basic

Steps to Reproduce

  1. terraform apply to create the basic configuration (an IGM with 2 templates using a base image)
  2. Update the base image (let's say, update it to debian-10)
  3. terraform apply to update the configuration (an IGM with 2 templates using a base image)

Important Factoids

I tried with both a user account and a service account, but neither works.

My hypothesis is that this is related to the lifecycle create_before_destroy setting, because of the workaround I found:

  • Update the base image and apply

Error: Cycle: google_compute_instance_template.igm-primary (destroy deposed 9a0c0b62), google_compute_instance_template.igm-canary (destroy deposed 885f672d), google_compute_instance_group_manager.igm-basic

  • Update the base image and apply targeting only the templates

Error: Error deleting instance template: googleapi: Error 400: The instance_template resource 'projects//global/instanceTemplates/reproducer-stable-20191118131559890900000002' is already being used by 'projects//zones/us-central1-c/instanceGroupManagers/reproducer', resourceInUseByAnotherResource

Error: Error deleting instance template: googleapi: Error 400: The instance_template resource 'projects//global/instanceTemplates/reproducer-canary-20191118131559890900000001' is already being used by 'projects//zones/us-central1-c/instanceGroupManagers/reproducer', resourceInUseByAnotherResource

BUT! Despite the errors, it creates the templates (it just fails to delete the old ones).

  • Apply targeting the instance group manager

OK

  • Apply without targeting any resource

google_compute_instance_template.igm-canary: Destroying... [id=reproducer-canary-20191118131559890900000001]
google_compute_instance_template.igm-primary: Destroying... [id=reproducer-stable-20191118131559890900000002]
google_compute_instance_template.igm-canary: Destruction complete after 4s
google_compute_instance_template.igm-primary: Destruction complete after 4s

Apply complete! Resources: 0 added, 0 changed, 2 destroyed.
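
The workaround above can be written out as three terraform apply invocations. The following is only a sketch: the helper name workaround_commands is hypothetical, the resource addresses are taken from the error messages in this issue, and -target is the standard terraform apply option for limiting a run to specific resources. Per the output above, the first command is expected to end with the "Error deleting instance template" errors while still creating the new templates.

```python
# Hypothetical helper: builds the commands for the three-step workaround.
# Resource addresses come from the error messages quoted in this issue.
TEMPLATES = [
    "google_compute_instance_template.igm-primary",
    "google_compute_instance_template.igm-canary",
]
IGM = "google_compute_instance_group_manager.igm-basic"


def workaround_commands():
    """Return the three `terraform apply` invocations, in order."""
    # 1. Apply only the templates (creates new ones, fails to delete old ones).
    apply_templates = ["terraform", "apply"] + [f"-target={t}" for t in TEMPLATES]
    # 2. Apply only the instance group manager so it points at the new templates.
    apply_igm = ["terraform", "apply", f"-target={IGM}"]
    # 3. Apply everything; only the old templates remain to be destroyed.
    apply_all = ["terraform", "apply"]
    return [apply_templates, apply_igm, apply_all]


if __name__ == "__main__":
    for argv in workaround_commands():
        print(" ".join(argv))
```

The sketch only prints the commands; running them is left to the operator, since the first step exits with errors by design.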

References

@ghost ghost added the bug label Nov 18, 2019
@slevenick
Collaborator

Interesting....

I was unable to reproduce in Terraform v0.12.13, but when I upgraded to 0.12.15 I see the errors about cycles as well.

I'll do some digging, but a potential workaround would be to downgrade to 0.12.13 and see if that works for you.

@slevenick
Collaborator

Looks like this is caused by hashicorp/terraform#23374

It was introduced in terraform core version 0.12.14, so downgrading to 0.12.13 should work until the fix is ready.

@japgolly

> downgrading to 0.12.13 should work until the fix is ready.

I've got a similar issue and tried downgrading but unfortunately that doesn't work:

Error: Error loading state: state snapshot was created by Terraform v0.12.15, which is newer than current v0.12.13; upgrade to Terraform v0.12.15 or greater to work with this state

@slevenick
Collaborator

Looks like the upstream PR was merged in, so I would guess this will be fixed in the next release of terraform core.

Unfortunately there isn't a way to work around this within the provider itself

@migibert
Contributor Author

Terraform core v0.12.16 has been released and includes the PR that fixes the upstream issue. Thanks for pointing me to the correct upstream issue!

It looks better but it does not seem to completely fix the issue:

Here is the output with the same scenario (a change in the base image):

Plan: 2 to add, 1 to change, 2 to destroy.

Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: yes

google_compute_instance_template.igm-canary: Creating...
google_compute_instance_template.igm-primary: Creating...
google_compute_instance_template.igm-primary: Creation complete after 5s [id=reproducer-stable-20191119084605404500000001]
google_compute_instance_template.igm-primary: Destroying... [id=reproducer-stable-20191118135449093000000001]
google_compute_instance_template.igm-canary: Creation complete after 5s [id=reproducer-canary-20191119084605404500000002]
google_compute_instance_group_manager.igm-basic: Modifying... [id=/us-central1-c/reproducer]
google_compute_instance_group_manager.igm-basic: Still modifying... [id=/us-central1-c/reproducer, 10s elapsed]
google_compute_instance_group_manager.igm-basic: Still modifying... [id=/us-central1-c/reproducer, 20s elapsed]
google_compute_instance_group_manager.igm-basic: Modifications complete after 28s [id=/us-central1-c/reproducer]
google_compute_instance_template.igm-canary: Destroying... [id=reproducer-canary-20191118135449093000000002]
google_compute_instance_template.igm-canary: Destruction complete after 3s

Error: Error deleting instance template: googleapi: Error 400: The instance_template resource 'projects/***/global/instanceTemplates/reproducer-stable-20191118135449093000000001' is already being used by 'projects/***zones/us-central1-c/instanceGroupManagers/reproducer', resourceInUseByAnotherResource

It then works fine on the second execution (because only the resources to delete remain).
Is it still a terraform-core issue?

The debug log is here: https://gist.github.com/migibert/c88cfac6020761c9d7f903251d047574

@slevenick
Collaborator

slevenick commented Nov 19, 2019

This looks like a provider problem now!

What is happening is that terraform builds a graph of operations that need to occur to get to the intended state, in this case that looks something like:

              create new template
               /           \
delete old template     update instance group manager

Where we create the new template before anything else, but then updating the IGM and deleting the old template happen in parallel. This is an issue because the API requires us to update the IGM before deleting the old template, as the IGM references the old template. I imagine something changed in how terraform core builds the graph between the last couple versions which is why we are seeing this now. I would guess that we were getting lucky in the ordering of operations before, causing the update to the IGM to happen before the delete.
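
To make the ordering problem concrete, here is a toy model of that graph. It is not Terraform's graph code: the function valid_orders and the dictionary layout are made up for illustration, and it lists serial orders only, whereas Terraform runs independent nodes in parallel. With the edges as drawn above, an order that deletes the old template before the IGM update is allowed; adding a dependency of the delete on the IGM update removes it.

```python
from itertools import permutations

CREATE = "create new template"
UPDATE = "update instance group manager"
DELETE = "delete old template"


def valid_orders(deps):
    """List every serial order in which each node runs after its prerequisites.

    `deps` maps a node to the set of nodes that must finish before it starts.
    """
    nodes = list(deps)
    orders = []
    for order in permutations(nodes):
        position = {node: index for index, node in enumerate(order)}
        if all(position[pre] < position[node] for node in nodes for pre in deps[node]):
            orders.append(order)
    return orders


# Graph as described above: both operations depend only on the create.
as_planned = {CREATE: set(), UPDATE: {CREATE}, DELETE: {CREATE}}

# Same graph with the edge the API effectively requires: delete after update.
with_edge = {CREATE: set(), UPDATE: {CREATE}, DELETE: {CREATE, UPDATE}}

if __name__ == "__main__":
    for name, graph in (("as planned", as_planned), ("with extra edge", with_edge)):
        print(name)
        for order in valid_orders(graph):
            print("  " + " -> ".join(order))
```

In the first graph, whether the delete or the update lands first depends on scheduling, which matches the "getting lucky" guess above.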

I believe I can fix this by adding a retry to the delete of the instance template, so that it will wait long enough for the IGM to be updated to not reference it anymore
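
A rough sketch of what such a retry could look like, written in Python for brevity rather than as provider code. Everything here is an assumption for illustration: ResourceInUseError stands in for the API's resourceInUseByAnotherResource error, and delete_with_retry, its parameters, and the attempt limits are hypothetical.

```python
import time


class ResourceInUseError(Exception):
    """Stand-in for the API's resourceInUseByAnotherResource error."""


def delete_with_retry(delete, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `delete` until it succeeds or `attempts` tries have been used.

    Only the "resource in use" error is retried; any other exception
    propagates immediately. The last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return delete()
        except ResourceInUseError:
            if attempt == attempts:
                raise
            # Give the instance group manager time to drop its reference.
            sleep(delay_seconds)


if __name__ == "__main__":
    calls = {"count": 0}

    def fake_delete():
        # Simulated API: the template is still referenced for the first two calls.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ResourceInUseError("template still referenced by the IGM")
        return "deleted"

    print(delete_with_retry(fake_delete, sleep=lambda seconds: None))
```

A retry like this masks the ordering problem rather than fixing the graph, so the total wait has to exceed the time the IGM update takes (28s in the log above).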

@slevenick
Collaborator

Not entirely sure this is a provider issue anymore. I've filed an issue upstream about the change in behavior between 0.12.13 and 0.12.16.

@migibert
Contributor Author

Thanks for investigating, I will closely monitor the upstream issue!

@slevenick
Collaborator

Going to close this out as it should be fixed in the next version of terraform core via that upstream issue and fix

@ghost

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020