Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deletion_protection argument to google_container_cluster #10168

Closed
joe-a-t opened this issue Sep 27, 2021 · 18 comments
Closed

Add deletion_protection argument to google_container_cluster #10168

joe-a-t opened this issue Sep 27, 2021 · 18 comments

Comments

@joe-a-t
Copy link

joe-a-t commented Sep 27, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

Add deletion_protection argument to google_container_cluster, the same as how it currently exists for some other longer-lived resources like sql_database_instance. This would be extremely useful since, similar to losing a database, losing a GKE cluster is typically a difficult process for people to replace.

New or Affected Resource(s)

  • google_container_cluster

Potential Terraform Configuration

resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster"
  location = "us-central1"

  remove_default_node_pool = true
  initial_node_count       = 1

  # New addition
  deletion_protection = true
}

References

b/299312662

@rileykarson
Copy link
Collaborator

Capturing my thoughts from triage earlier today:

  • We have a precedent to do this for resources that contain customer databases
  • Conversely, I think we've rejected the same on google_compute_instance and google_compute_disk, which are compute products and much more reasonable to delete as part of a normal workflow (even if they do hold data)
  • If we decide we can make this change we should do it, hence near-term goals

I'm undecided if we should make the change at all or not, as GKE is a mix between a compute product / having a database (the apiserver). I'm leaning yes, given that deleting a GKE cluster is pretty clearly not part of normal operations.

I wish Terraform was less delete-prone from the start, so that this wasn't decided on a case-by-case basis by our team!

@joe-a-t
Copy link
Author

joe-a-t commented Sep 28, 2021

Definitely, hashicorp/terraform#3116 would be highly preferable and allow you to rip out all of the google provider specific deletion_protection stuff. I highly doubt that it will happen though since it's been open without progress for 6+ years so it would be great if we could add the Google provider protection for GKE clusters.

@joe-a-t
Copy link
Author

joe-a-t commented Apr 4, 2023

@rileykarson want to huge bump this in priority, just happened again with #14203 where this would have been extremely helpful

@joe-a-t
Copy link
Author

joe-a-t commented Apr 19, 2023

@rileykarson now that we are over 2 weeks from the incident that caused us to re-bump this, can you please update on the current status? If this is not getting executed on in the near term, my company's leadership is pushing for us to do some very hacky work arounds that I really don't want to do, especially since it sounds like we are aligned on the value and desirability of adding this feature.

@rileykarson
Copy link
Collaborator

I'm following up with some relevant stakeholders internally to Google and will get back to this when I've collected enough feedback. This is likely to take O(weeks) to turn around a final decision on.

The initial folks I talked to expressed concern about adding a deletion_protection field that defaults to true in a minor version- this is a kind of change we've been a little hesitant about adding even for database products and given the precedent initially set w/ Bigtable. GKE falls somewhere between a compute product, a database product, and a configuration management product depending on the exact users so it's not quite the same. On the other hand I think that introducing a false default would introduce a lot of opportunity for inconsistency (there's never just one exception w/ these kind of things...)- I want all of these to be consistent.

my company's leadership is pushing for us to do some very hacky work arounds that I really don't want to do, especially since it sounds like we are aligned on the value and desirability of adding this feature.

Depending on your internal platform, I've heard that customers have used policy tools like Sentinel or gcloud terraform vet to guard against deletions in these kind of cases.

Additionally I'll raise https://github.com/hashicorp/terraform-provider-google/wiki/Customer-Contact#raising-gcp-internal-issues-with-the-provider-development-team as an extra feedback mechanism, in addition to this issue.


Thanks for kicking up the discussion on those hashicorp/terraform issues! I'm not really satisfied with the central response at the moment. I'll discuss with them through our channels. I'm leaning towards adding some provider-wide improvements to this stuff to the 5.0.0 thunderdome for consideration. Out of the top 25 issues, 6 have to do with lifecycle.prevent_destroy and that's after the issue with the largest scope (hashicorp/terraform#3116) got locked way back:

@joe-a-t
Copy link
Author

joe-a-t commented Apr 21, 2023

Regarding the default conversation, what about introducing the variable now with a default to false (and a warning that it will change) then in 5.0.0, change the default to true? By taking that path, those of us who would benefit from this get it immediately but you get to wait until a major release before having it impact people who don't care about the addition of this feature.

Additionally I'll raise https://github.com/hashicorp/terraform-provider-google/wiki/Customer-Contact#raising-gcp-internal-issues-with-the-provider-development-team as an extra feedback mechanism, in addition to this issue.

Yes, obviously I can't see the document behind those links, but I believe we have been following it via our TAM Fabio who I believe has chatted with you about this. We've also had support case 44319400 which spawned https://issuetracker.google.com/issues/277234815 as part of our full court press on this issue. We are very heavy users of the provider so we'd also love to sit down if you're ever open to doing customer interviews.

@joe-a-t
Copy link
Author

joe-a-t commented May 11, 2023

@rileykarson would ^ be a reasonable approach to move forwards with in the immediate term?

@rileykarson
Copy link
Collaborator

Sorry, been a bit swamped! quick updated: Based on the mentioned internal discussions we're definitely going to avoid the default-to-true option in a minor release, and hold that for a major.

Yes, obviously I can't see the document behind those links, but...

I'll follow up through those channels!

I've reemphasized the chain of issues above with HashiCorp, also, so hopefully we see some additional activity as a result. Gonna continue doing so intermittently.

Regarding the default conversation, what about introducing the variable now with a default to false (and a warning that it will change) then in 5.0.0, change the default to true?

I'm really hesitant to break the rule that deletion_protection defaults to being true. While it's true that you'd need to know it is / isn't supported today, adding a second invariant to check complicates things more for folks. For example, if someone cmd/ctrl-f'ed for the field name but didn't read the description in detail.

@joe-a-t
Copy link
Author

joe-a-t commented May 12, 2023

I'm really hesitant to break the rule that deletion_protection defaults to being true. While it's true that you'd need to know it is / isn't supported today, adding a second invariant to check complicates things more for folks. For example, if someone cmd/ctrl-f'ed for the field name but didn't read the description in detail.

I think the part that I am struggling with as someone who has dealt with multiple, very impactful major incidents related to this is the balancing of the real, demonstrated multiple times harm of NOT having this option vs the potential for temporary confusion until the next major release. Like to me the benefits of adding it now and accepting the temporary risk of confusion for people who don't read the docs closely before the next major release and don't explicitly set a value are super clear because I've lived through multiple MIs related to this. Also, it is always possible for someone who does not closely read the docs to shoot themselves in the foot so I don't put a lot of weight in that argument as a reason why we should not have an option that prevents major incidents for people who have read the docs and do understand what they are doing, especially since there is a clear timeline (the next major release) for resolving those concerns about users being confused by the default behavior.

@joe-a-t
Copy link
Author

joe-a-t commented May 23, 2023

@rileykarson does that make sense and would implementing this in the next release be possible? This is a really big deal for my company and we really don't want to wait until the fall.

Additionally, I think there is a pretty good argument that introducing the argument before the behavior of the resource changes would be advantageous for users since they could explicitly set the argument for their desired behavior before there is a change to the default behavior. If the argument is only introduced when the default behavior is changed, then people have to make sure they are timing the introduction of the new argument and the provider version upgrade if they want to maintain existing behavior of deletion_protection = false

@rileykarson
Copy link
Collaborator

rileykarson commented May 23, 2023

I'll discuss w/ some folks on the team! I don't think that the next release would be possible (that cutoff is EOD today), but within the next few weeks would be if we decide to move forward w/ it.

Edit: Removing milestone to make sure this comes up in triage.

@rileykarson rileykarson removed this from the Near-Term Goals milestone May 23, 2023
@rileykarson rileykarson added this to the 5.0.0 milestone Jun 5, 2023
@joe-a-t
Copy link
Author

joe-a-t commented Jun 6, 2023

@rileykarson wanted to see what the results of triage were since I saw you added the 5.0.0 milestone. Is the idea of adding the flag now with a default of false on the table/happening soon or has that idea been rejected in favor of adding it with default of true in 5.0.0 and having all of the associated upgrade timing concerns as a result?

@rileykarson
Copy link
Collaborator

Sorry, meant to follow up yesterday after clicking the button! I followed up w/ a few other maintainers and the general consensus was that we should wait for 5.0.0 for this, avoiding introducing a default that's different from other deletion_protection fields.

@sanmaym
Copy link

sanmaym commented Sep 18, 2023

do we have at ETA on 5.0.0 release ? @rileykarson

@rileykarson
Copy link
Collaborator

Oct 2, see #15582

@sanmaym
Copy link

sanmaym commented Sep 19, 2023 via email

@c2thorn
Copy link
Collaborator

c2thorn commented Sep 26, 2023

closed with GoogleCloudPlatform/magic-modules#9013

Targeted for the 5.0.0 release which is scheduled for next week.

@c2thorn c2thorn closed this as completed Sep 26, 2023
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants