access_context_manager_service_perimeter egress/ingress rule resource exists in statefile but terraform wants to create again #19203
Comments
Hi, we also switched from "inline" ingress/egress rules to the new separate resources, and we were confused when we found several rules present multiple times in a perimeter. After further investigation, it turned out we faced the same issue you describe above: our pipeline that executes Terraform "thought" multiple times that a rule was missing (although it was in the state) and added it again. Since the ACM API allows duplicates, this ends up in quite a mess. We have now switched to provider version 6.12, as it contains several "permadiff" bugfixes for these resources. So far it looks much better, and we will continue to keep an eye on whether the issue is really gone.
Thank you so much for sharing! I really appreciate the extra sanity check.

The company I work for has enterprise support with Google. I raised a critical ticket (shortly after creating the initial report here) because this was really hurting us. It took forever, and much chasing, to finally get it pushed through to the right people at Google. After that I spent weeks trying to reproduce the bug (after reconfiguring our pipeline to always send full debug logs for every run), and I sent a metric ton of log data over to them. It took a long time, but once it reached the right engineer they listened, spent a lot of time with me in working sessions, and dug through logs with me. I had written several heavily customized modules to handle our VPC Service Controls perimeters and rules (because the base resources are terribly unwieldy for large, complex deployments; I was assured they are doing some things to help on that front as well), and I was worried they would write me off because we were using custom modules, but thankfully they didn't!

They finally released 6.11 with a fix for the first issue, but in the process of fixing it they ironically introduced another, similar, and worse permadiff, and also removed the Terraform-side protection against creating duplicate policies. So when we went to 6.11, things actually got worse (perhaps that is what you experienced, because before that Terraform would simply error out mid-deployment when trying to create a duplicate rule). I took that back to the lead engineer at Google with more debug logs, and things moved much faster this time since we partly knew what to look for; I was also able to reproduce the issue more quickly. They just released 6.12 this past week, and it's in my sprint to move our deployments to it and test.

So glad to hear you are having a better experience with it! It's encouraging to know the months spent collecting logs and working with support benefited more than just our organization 🙃!
Anyhow, I will be testing this next sprint, but I just wanted to say thank you for responding and sharing your experience. When this first started happening, I spent two weeks digging through all my module code trying to determine whether it was self-caused or a bug. I was really thankful when I got to the right engineering team and they took it seriously, and every additional sanity check from outside parties is a real encouragement. Cheers
Hi, I wanted to share some bad news here. The issue is still happening for us, even on provider version 6.12.0. I just came across the following issue, which seems to perfectly explain the behavior and also includes steps to reproduce. When the perimeter is deployed and we then deploy ingress/egress rules as separate resources, everything seems fine. We will also open a Google Support case for this!
That is sad news :(. Our environment has been stable, but I am not sure whether we have made any project additions to perimeters lately. I just skimmed the other issue you linked, and one thing I will note is that our perimeter resource and our rule resources are all in the same deployment; it sounds like in that other issue they deploy the rules in a separate deployment. I will read through it some more, though, and will see if I can reproduce it on our side.
I want to apologize for the myriad of issues around these resources. The design of these resources does not match the API behavior in some cases, and that is leading to these problems. We are addressing them as soon as they pop up, and we are working to develop detection for such issues so that we can address them before they affect you. I believe the two issues being discussed here are related but different. I am hoping the same fix can be applied to the other issue, thereby mitigating both, but I am in the process of testing to determine whether that is the case. I assure you we are working as fast as possible to address these.
Created GoogleCloudPlatform/magic-modules#12572 to address this |
Community Note
Terraform Version & Provider Version(s)
Terraform v1.3.10
running on Alpine Linux, with a GCP Storage Bucket as the state file backend
Affected Resource(s)
google_access_context_manager_service_perimeter_egress_policy
google_access_context_manager_service_perimeter_ingress_policy
I would guess this also affects the new dry-run versions of these resources, but we haven't started using those yet.
We have several fairly large GCP VPC Service Control Perimeter deployments. We use Terraform to manage the perimeters and all of the rules.
A major issue that has cropped up recently is that terraform wants to (re)create ingress/egress rules that we are creating via our deployment.
Example:
Deployment Day 1 - Perimeters created, rules added
Day 2 - Rules added, some rules updated (multiple applies throughout the day)
Day 3 - X - everything continues working just fine
Then - randomly - we will run a plan action and Terraform will want to create rules as if they don't already exist. We have analyzed the state file, and the rules (policies) are in there. There doesn't seem to be much rhyme or reason to when it happens, but when it does, it may impact one rule or several.
I have tried to recreate this in our lower environments, but thus far I haven't found a silver bullet as to the cause.
I have checked the state file directly, and I can also see in the plan output that Terraform is clearly refreshing the object from state (i.e. the named resource is present in the state file). Yet Terraform wants to create the exact same named resource again with an identical configuration.
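For reference, this is roughly how we verify the resource is in state. The resource address and the map key `"example_rule"` below are placeholders, not our real names:

```
# List state entries for the affected resource type (address is illustrative)
terraform state list | grep service_perimeter_egress_policy

# Inspect one entry to confirm its recorded attributes
terraform state show 'google_access_context_manager_service_perimeter_egress_policy.egress["example_rule"]'
```

Both commands show the resource present in state, which is what makes the subsequent "will be created" plan output so confusing.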
If we go ahead and do an apply, this will result in an API error from GCP stating that the rule already exists.
We typically have to "fix" the issue by going into the console, manually making a minor change to the existing rules (the ones Terraform already created), and then running the apply, which then goes through because GCP allows the rules to be created since they now differ. After that, we can go in and delete the old rules created by the earlier deployment. This isn't tenable, however, as we have a lot of rules, and it can be hard to step through everything and safely modify and remove rules manually (hence the reason for using Terraform in the first place).
We were using NESTED policies inside the google_access_context_manager_service_perimeter resource before we switched to the separate linked policy resources. This issue never occurred when using nested policies; however, we really want to use independent policy resources (for a litany of reasons).
Anyhow, that is why I think it is a bug or issue related to the two mentioned affected resources above.
Terraform Configuration
Here is an example of an egress_policy block. We are using for_each with a map, and the map keys are used to name the resources. The keys are static, so the resource addresses do not change in state. We are doing something similar with ingress policies.
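A minimal sketch of that pattern, assuming a map-typed variable (the variable name `egress_rules`, the perimeter reference, and the attribute values are illustrative assumptions, not the actual configuration from our modules):

```hcl
# Hypothetical input map; each static key becomes part of the resource address,
# e.g. ...egress_policy.egress["allow_storage_export"]
variable "egress_rules" {
  type = map(object({
    identities = list(string)
    resources  = list(string)
  }))
}

resource "google_access_context_manager_service_perimeter_egress_policy" "egress" {
  for_each = var.egress_rules

  # Full perimeter name: accessPolicies/{policy_id}/servicePerimeters/{name}
  perimeter = google_access_context_manager_service_perimeter.perimeter.name

  egress_from {
    identities = each.value.identities
  }

  egress_to {
    resources = each.value.resources

    operations {
      service_name = "storage.googleapis.com"

      method_selectors {
        method = "*"
      }
    }
  }
}
```

Because the map keys are static, a plan should only ever show updates to existing addresses, never fresh creates, which is what makes the observed behavior a bug.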
Debug Output
Unfortunately I am unable to recreate the issue (despite much effort), and our Terraform runs via a DevOps pipeline that has discarded the builds that had the output. If the issue crops up again, I will do my best to capture debug logs from the plan action.
Expected Behavior
Terraform Plan should see the resources in the state file and NOT want to recreate them.
Actual Behavior
Terraform plan wants to create resources that already exist in state and in our environment, which causes an apply failure because GCP will not allow duplicates of VPC Service Control perimeter policies (nor would we want them).
I should clarify: this is a brand-new resource creation action, not a replace or an update-in-place. It is as if Terraform has no knowledge of the existing resource, even though it is in state.
Steps to reproduce
I am not sure what triggers this behavior. It only happens every so often (once every 12-20 applies, perhaps), and there don't seem to be any consistent changes that precede it.
In short: set up a VPC Service Controls perimeter with Terraform and have it protect twenty-some projects. Add 30-40 ingress and egress rules via the separate linked resources in the Terraform config, then keep changing the rules, planning, and applying until the issue crops up.
Important Factoids
No response
References
This description on the AWS Provider thread seems almost identical to what we are experiencing; however, the issue it was said to be a duplicate of is not the same.
hashicorp/terraform#3498
That issue was marked as a duplicate of another issue. The other issue is not the same, because in that case the created resources were never making it INTO the state file. In the issue above, and in our case, the resources ARE in the state file.
b/362264399