access_context_manager_service_perimeter egress/ingress rule resource exists in statefile but terraform wants to create again #19203
Comments
Hi, we also switched from "inline" ingress/egress rules to the new separate resources, and we were confused when we found several rules present multiple times in a perimeter. After further investigation, it turned out we faced the same issue you describe above: our pipeline that executes Terraform "thought" multiple times that a rule was missing (although it was in the state) and added it again. Since the ACM API allows duplicates, this ends up in quite a mess. We have now switched to provider version 6.12, as it contains several "permadiff" bugfixes for these resources. So far it looks much better, and we will continue to keep an eye on whether the issue is really gone.
Thank you so much for sharing! I really appreciate the extra sanity check.

The company I work for has enterprise support with Google. I raised a critical ticket (shortly after creating the initial report here) because this was really hurting us. It took forever, and much chasing, to finally get it pushed through to the right people at Google. After that I spent weeks trying to reproduce the bug (after reconfiguring our pipeline to always send full debug logs for every run), and I sent a metric ton of log data over to them. It took a long time, but once it reached the right engineer they listened, spent a lot of time with me in working sessions, and dug through logs with me. I had written several heavily customized modules to handle our VPC Service Controls perimeters and rules (because the base resources are terribly unwieldy for large, complex deployments; I was assured they are doing some things to help on that front as well), and I was worried they would write me off because we were using custom modules, but thankfully they didn't!

They finally released 6.11 with a fix for the first issue, but in the process of fixing it they ironically introduced another, similar, and worse permadiff, and also removed the Terraform-side protection against creating duplicate policies. So when we went to 6.11, things actually got worse (perhaps that is what you experienced, because before that Terraform would simply error out mid-deployment when trying to create a duplicate rule). I took that back to the lead engineer at Google with more debug logs, and things moved much faster this time since we partly knew what to look for; I was also able to reproduce the issue more quickly. They just released 6.12 this past week, and it's in my sprint to move our deployments to it and test.

So glad to hear you are having a better experience with it! It's encouraging to know the months spent collecting logs and working with support benefited more than just our organization 🙃!
Anyhow, I will be testing this next sprint, but I just wanted to say thank you for responding and sharing your experience. When this first started happening, I spent two weeks digging through all my module code trying to determine whether it was self-caused or a bug. I was really thankful when I got to the right engineering team and they took it seriously, and every additional sanity check from outside parties is a real encouragement. Cheers
Hi, I wanted to share some bad news here. The issue is still happening for us, even on provider version 6.12.0. I just came across the following issue, which seems to perfectly explain the behavior and also includes steps to reproduce. When the perimeter is deployed and we then deploy ingress/egress rules as separate resources, everything seems fine. We will also open a Google Support case for this!
That is sad news :(. Our environment has been stable, but I am not sure whether we have made any project additions to perimeters lately. I just skimmed the other issue you linked, and one thing I will note is that our perimeter resource and our rule resources are all in the same deployment; it sounds like in that other issue they deploy the rules in a separate deployment. I will read through it some more, though, and will see if I can reproduce it on our side.
I want to apologize for the myriad of issues around these resources. The design of these resources does not match the API behavior in some cases, and that is leading to these problems. We are addressing them as soon as they pop up, and we are working to develop detection for such issues so that we can address them before they affect you. I believe the two issues being discussed here are related but different. I am hoping the same fix can be applied to the other issue, thereby mitigating both, but I am in the process of testing to determine whether that is the case. I assure you we are working as fast as possible to address these.
Created GoogleCloudPlatform/magic-modules#12572 to address this |
Community Note
Terraform Version & Provider Version(s)
Terraform v1.3.10
running on Alpine Linux, with a GCP Storage Bucket as the state file backend
Affected Resource(s)
google_access_context_manager_service_perimeter_egress_policy
google_access_context_manager_service_perimeter_ingress_policy
I would guess this also affects the new dry-run versions of these resources, but we haven't started using those yet.
We have several fairly large GCP VPC Service Control Perimeter deployments. We use Terraform to manage the perimeters and all of the rules.
A major issue that has cropped up recently is that terraform wants to (re)create ingress/egress rules that we are creating via our deployment.
Example:
Deployment Day 1 - Perimeters created, rules added
Day 2 - Rules added, some rules updated (multiple applies throughout the day)
Day 3 - X - everything continues working just fine
Then - randomly - we will run a plan action and Terraform will want to create rules as if they don't already exist. We have analyzed the state file, and the rules (policies) are in there. There doesn't seem to be much rhyme or reason to when it happens, but when it does, it may impact one rule or several.
I have tried to recreate this in our lower environments, but thus far I haven't found a silver bullet as to the cause.
I have checked the state file directly, and I can also see in the plan output that Terraform is clearly refreshing the object from state (i.e. the named resource is present in the state file). Yet Terraform wants to create the exact same named resource again with an identical configuration.
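For reference, this is roughly how we verify the resource is in state. The resource address and the map key `"example_rule"` below are placeholders, not our real names:

```
# List state entries for the affected resource type (address is illustrative)
terraform state list | grep service_perimeter_egress_policy

# Inspect one entry to confirm its recorded attributes
terraform state show 'google_access_context_manager_service_perimeter_egress_policy.egress["example_rule"]'
```

Both commands show the resource present in state, which is what makes the subsequent "will be created" plan output so confusing.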
If we go ahead and do an apply, this will result in an API error from GCP stating that the rule already exists.
We typically have to "fix" the issue by going into the console, manually making a minor change to the existing rules (the ones Terraform already created), and then running the apply, which then goes through because GCP allows the rules to be created since they now differ. After that, we can go in and delete the old rules created by the earlier deployment. This isn't tenable, however, as we have a lot of rules, and it can be hard to step through everything and safely modify and remove rules manually (hence the reason for using Terraform in the first place).
We were using NESTED policies inside the google_access_context_manager_service_perimeter resource before we switched to the separate linked policy resources. This issue never occurred when using nested policies; however, we really want to use independent policy resources (for a litany of reasons).
Anyhow, that is why I think it is a bug or issue related to the two mentioned affected resources above.
Terraform Configuration
Here is an example of an egress_policy block. We are using for_each with a map, and the map keys are used to name the resources. The keys are static, so the resource addresses do not change in state. We are doing something similar with ingress policies.
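A minimal sketch of that pattern, assuming a map-typed variable (the variable name `egress_rules`, the perimeter reference, and the attribute values are illustrative assumptions, not the actual configuration from our modules):

```hcl
# Hypothetical input map; each static key becomes part of the resource address,
# e.g. ...egress_policy.egress["allow_storage_export"]
variable "egress_rules" {
  type = map(object({
    identities = list(string)
    resources  = list(string)
  }))
}

resource "google_access_context_manager_service_perimeter_egress_policy" "egress" {
  for_each = var.egress_rules

  # Full perimeter name: accessPolicies/{policy_id}/servicePerimeters/{name}
  perimeter = google_access_context_manager_service_perimeter.perimeter.name

  egress_from {
    identities = each.value.identities
  }

  egress_to {
    resources = each.value.resources

    operations {
      service_name = "storage.googleapis.com"

      method_selectors {
        method = "*"
      }
    }
  }
}
```

Because the map keys are static, a plan should only ever show updates to existing addresses, never fresh creates, which is what makes the observed behavior a bug.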
Debug Output
Unfortunately I am unable to recreate the issue (despite much effort), and our Terraform runs via a DevOps pipeline that has discarded the builds that had the output. If the issue crops up again, I will do my best to capture debug logs from the plan action.
Expected Behavior
Terraform Plan should see the resources in the state file and NOT want to recreate them.
Actual Behavior
Terraform plan wants to create resources that already exist in state and in our environment, which causes an apply failure because GCP will not allow duplicates of VPC Service Control perimeter policies (nor would we want them).
I should clarify: this is a brand-new resource creation action, not a replace or an update-in-place. It is as if Terraform has no knowledge of the existing resource, even though it is in state.
Steps to reproduce
I am not sure what triggers this behavior. It only happens every so often (once every 12-20 applies, perhaps), and there don't seem to be any consistent changes that precede it.
In short: set up a VPC Service Controls perimeter with Terraform and have it protect twenty-some projects. Add 30-40 ingress and egress rules via the separate linked resources in the Terraform config, then keep changing the rules, planning, and applying until the issue crops up.
Important Factoids
No response
References
This description on the AWS Provider thread seems almost identical to what we are experiencing; however, the issue it was said to be a duplicate of is not the same.
hashicorp/terraform#3498
That issue was marked as a duplicate of another issue. The other issue is not the same, because in that case the created resources were never making it INTO the state file. In the issue above, and in our case, the resources ARE in the state file.
b/362264399