
Fix Service Account creation by ignoring 403 errors on read polling #11811

Merged


@sbocinec (Contributor) commented Sep 25, 2024

Fix issues with google_service_account resource creation due to eventual consistency of GCP IAM API. Fixes hashicorp/terraform-provider-google#19624 and potentially other bugs.

iam: addressed `google_service_account` creation issues caused by the eventual consistency of the GCP IAM API by ignoring 403 errors returned while polling for the service account after creation.

@modular-magician modular-magician added the awaiting-approval Pull requests that need reviewer's approval to run presubmit tests label Sep 25, 2024
@sbocinec sbocinec marked this pull request as ready for review September 25, 2024 15:59
@github-actions github-actions bot requested a review from c2thorn September 25, 2024 15:59

Hello! I am a robot. Tests will require approval from a repository maintainer to run.

@c2thorn, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.


@c2thorn This PR has been waiting for review for 3 weekdays. Please take a look! Use the label disable-review-reminders to disable these notifications.

@@ -152,7 +152,8 @@ func resourceGoogleServiceAccountCreate(d *schema.ResourceData, meta interface{}

 // We poll until the resource is found due to eventual consistency issue
 // on part of the api https://cloud.google.com/iam/docs/overview#consistency
-err = transport_tpg.PollingWaitTime(resourceServiceAccountPollRead(d, meta), transport_tpg.PollCheckForExistence, "Creating Service Account", d.Timeout(schema.TimeoutCreate), 1)
+// IAM API returns 403 when the queried SA is not found, so we must ignore both 404 & 403 errors
+err = transport_tpg.PollingWaitTime(resourceServiceAccountPollRead(d, meta), transport_tpg.PollCheckForExistenceWith403, "Creating Service Account", d.Timeout(schema.TimeoutCreate), 1)
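A minimal, self-contained sketch of what swapping the poll check changes. This is not the provider's `transport_tpg` code: `apiError`, `pollResult`, `checkForExistence`, and `checkForExistenceWith403` are hypothetical names that only model the behavior described in this PR, where the old check keeps polling on 404 and treats every other error as fatal, while the new check also keeps polling on 403.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical stand-in for an HTTP error from the IAM API.
type apiError struct{ Code int }

func (e *apiError) Error() string { return fmt.Sprintf("HTTP %d", e.Code) }

// pollResult is a hypothetical outcome of a single poll attempt.
type pollResult int

const (
	pollSuccess pollResult = iota // resource found: stop polling successfully
	pollPending                   // not visible yet: poll again
	pollFail                      // unexpected error: stop polling and fail
)

// statusCode returns the HTTP status carried by err, or 0 if there is none.
func statusCode(err error) int {
	var ae *apiError
	if errors.As(err, &ae) {
		return ae.Code
	}
	return 0
}

// checkForExistence keeps polling only on 404; any other error is fatal.
func checkForExistence(err error) pollResult {
	switch {
	case err == nil:
		return pollSuccess
	case statusCode(err) == http.StatusNotFound:
		return pollPending
	default:
		return pollFail
	}
}

// checkForExistenceWith403 additionally treats 403 as "not visible yet",
// since the IAM API reports a missing service account as permission denied.
func checkForExistenceWith403(err error) pollResult {
	if statusCode(err) == http.StatusForbidden {
		return pollPending
	}
	return checkForExistence(err)
}

func main() {
	denied := &apiError{Code: http.StatusForbidden}
	fmt.Println(checkForExistence(denied) == pollFail)           // true
	fmt.Println(checkForExistenceWith403(denied) == pollPending) // true
}
```

Delegating to the 404-only check after handling 403 keeps every other error (for example a 500) fatal, so the only behavioral change is that a 403 no longer ends the poll.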
Member


We already have eventual consistency checks for 403s in https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR145

Wouldn't this just add even more polling for 403s?

In your testing, is the resource failing at this specific line? If so, what if we just moved the existing sleep already present in https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR165 to before this line?

Contributor Author


> We already have eventual consistency checks for 403s

Yes, I actually added that check myself. As I understand the problem, it only handles the first Read after the account is initially created. Once the initial Create + Read succeeds, the resource is added to the TF state.

The poller then polls to confirm that the account actually exists, i.e. that the SA has eventually been created. But here is the problem: because the IAM API is eventually consistent, it returns 403 when the SA is not found. (This is not a mistake: the API returns 403, not 404. It's a bit misleading, as it masks "not found" behind IAM_PERMISSION_DENIED for security reasons.) Because the poller exits immediately on any error other than 404, all the other defensive logic that attempts to deal with the eventual consistency is ineffective, and the provider fails immediately.

Also, the 10s sleep that was meant to deal with this is never actually reached: https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR162-R165

You can see this in the log output in the issue I've submitted, hashicorp/terraform-provider-google#19624: notice that it fails within 1s, because the poll read returns 403 and the whole provider fails.

Personally, I think the 10s sleep could even be removed if the poller handled 403 as I'm proposing.

Member


Okay, thank you for providing the logs. I think this makes sense, and it looks like the previous sleep did not actually address this issue.

Considering we are just swapping out the PollCheck used, I think this should be good to merge.


@c2thorn c2thorn self-requested a review October 2, 2024 19:40
c2thorn (Member) commented Oct 2, 2024

running build first to confirm no issues

@modular-magician modular-magician added service/iam-serviceaccount and removed awaiting-approval Pull requests that need reviewer's approval to run presubmit tests labels Oct 2, 2024
@modular-magician (Collaborator)

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 1 file changed, 2 insertions(+), 1 deletion(-))
google-beta provider: Diff ( 1 file changed, 2 insertions(+), 1 deletion(-))

@modular-magician (Collaborator)

Tests analytics

Total tests: 147
Passed tests: 117
Skipped tests: 30
Affected tests: 0

Affected service packages:
  • resourcemanager

🟢 All tests passed!


@c2thorn c2thorn merged commit e6af55b into GoogleCloudPlatform:main Oct 2, 2024
13 of 16 checks passed

Successfully merging this pull request may close these issues.

google_service_account read after creation issue - apply fails with 403 right after creation
3 participants