Fix Service Account creation by ignoring 403 errors on read polling #11811
Hello! I am a robot. Tests will require approval from a repository maintainer to run. @c2thorn, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look. You can help make sure that review is quick by doing a self-review and by running impacted tests locally.
@c2thorn This PR has been waiting for review for 3 weekdays. Please take a look! Use the label
@@ -152,7 +152,8 @@ func resourceGoogleServiceAccountCreate(d *schema.ResourceData, meta interface{}

  // We poll until the resource is found due to eventual consistency issue
  // on part of the api https://cloud.google.com/iam/docs/overview#consistency
- err = transport_tpg.PollingWaitTime(resourceServiceAccountPollRead(d, meta), transport_tpg.PollCheckForExistence, "Creating Service Account", d.Timeout(schema.TimeoutCreate), 1)
+ // IAM API returns 403 when the queried SA is not found, so we must ignore both 404 & 403 errors
+ err = transport_tpg.PollingWaitTime(resourceServiceAccountPollRead(d, meta), transport_tpg.PollCheckForExistenceWith403, "Creating Service Account", d.Timeout(schema.TimeoutCreate), 1)
We already have eventual consistency checks for 403's in https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR145
Wouldn't this just add even more polling for 403's?
In your testing, is the resource failing at this specific line? If so, what if we just moved the existing sleep already present in https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR165 to before this line?
> We already have eventual consistency checks for 403's

Yes, I was actually the one who introduced that check. The problem, as I understand it, is that it only handles the first Read after the initial creation of the account. When the initial Create + Read succeeds, the resource is added to the TF state.
The poller then polls to confirm that the SA has eventually been created. But here is the problem: because the IAM API is eventually consistent, it returns 403 when the SA is not found (this is not a mistake; 403 is returned here rather than 404, which is a bit misleading because it masks "not found" behind IAM_PERMISSION_DENIED for security reasons). Because the poller exits immediately on any error other than 404, all the other defensive logic that attempts to deal with eventual consistency is ineffective, and the provider fails immediately.
Also, the 10s sleep that was meant to deal with this is never actually reached: https://github.com/GoogleCloudPlatform/magic-modules/pull/11811/files#diff-0541a83ba5cbaa8bd8a0cd9128218cc5766b18aeaa97015ccaa06b301511e08cR162-R165
You can see this in the log output in the issue I submitted, hashicorp/terraform-provider-google#19624. Notice that it fails within 1s because the poll read returns 403 and the whole provider fails.
Personally, I think the 10s sleep could even be removed if the poller handled 403 as I'm proposing.
Okay, thank you for providing the logs. I think this makes sense, and it looks like the previous sleep did not actually address this issue.
Considering we are just swapping out the PollCheck used, I think this should be good to merge.
running build first to confirm no issues
Hi there, I'm the Modular magician. I've detected the following information about your changes:
Diff report
Your PR generated some diffs in downstreams - here they are.
Tests analytics
Total tests: 147
🟢 All tests passed!
Fix issues with google_service_account resource creation due to eventual consistency of GCP IAM API. Fixes hashicorp/terraform-provider-google#19624 and potentially other bugs.