Retry Okta API requests on connection errors #246

Closed
khitrenovich opened this issue Dec 17, 2020 · 2 comments · Fixed by #254
Labels
enhancement Asking for new behavior or feature

Comments

@khitrenovich

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

We are running Terraform from a Docker container. On top of that, in our CI we use GitHub Actions (which adds at least one more layer of virtualization) and Okta Preview tenants (which are less stable and less capable). As a result, we are hitting a relatively high number of network errors.

Recently we added more Okta resources to be managed by Terraform, and network errors while talking to Okta have become a real pain: on average, every other build fails on a connection reset. The ability to configure API call retries on network errors would help us a lot to increase CI stability.
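For illustration, here is a minimal Go sketch of the kind of configurable retry being requested: a call is repeated a bounded number of times with exponentially growing pauses, and only errors that a caller-supplied predicate deems transient are retried. The names `retry`, `errTransient` and `errPermanent` are hypothetical and are not part of the Okta provider or SDK.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for a dropped connection and a genuine API error.
var (
	errTransient = errors.New("simulated connection reset")
	errPermanent = errors.New("simulated 404 Not Found")
)

// retry calls op up to maxAttempts times. After each retryable failure it
// sleeps baseDelay, 2*baseDelay, 4*baseDelay, ... (never more than maxDelay).
// It returns immediately on success or on an error that shouldRetry rejects.
func retry(maxAttempts int, baseDelay, maxDelay time.Duration,
	shouldRetry func(error) bool, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := baseDelay
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil || !shouldRetry(err) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
			if delay > maxDelay {
				delay = maxDelay
			}
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	isTransient := func(err error) bool { return errors.Is(err, errTransient) }

	// Fails twice with a transient error, then succeeds.
	calls := 0
	err := retry(5, 10*time.Millisecond, time.Second, isTransient, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>

	// A permanent error is returned on the first attempt, without retrying.
	calls = 0
	err = retry(5, 10*time.Millisecond, time.Second, isTransient, func() error {
		calls++
		return errPermanent
	})
	fmt.Println(calls, err) // 1 simulated 404 Not Found
}
```

Keeping the "is this retryable?" decision in a separate predicate matters here: a 404 or a validation error from Okta should fail the build immediately, while a reset connection is worth another try.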

Sample errors:

Error: failed to get user custom schema: failed to get user type: Get "https://...../api/v1/meta/types/user/default": read tcp 172.17.0.2:47366->35.172.155.67:443: read: connection timed out
Error: failed to find factor: Get "https://...../api/v1/org/factors": read tcp 172.17.0.2:47364->35.172.155.67:443: read: connection timed out
Error: failed to get inline hook: Get "https://...../api/v1/inlineHooks/calrrxzogs7IKpn3b0h7": read tcp 172.17.0.2:55408->35.172.155.69:443: read: connection timed out
Error: failed to update auth server policy rule: Put "https://...../api/v1/authorizationServers/ausvizvhm6vlenFDt0h7/policies/00pvizsv1vaWuiQER0h7/rules/0prvizv0097tqfRS00h7": read tcp 172.17.0.2:59782->35.172.155.67:443: read: connection timed out
Error: failed to get auth server: Get "https://...../api/v1/authorizationServers/ausrbrpud5xXlHUCt0h7/claims/oclrbs0fmx0radqoz0h7": read tcp 172.17.0.2:44416->35.172.155.67:443: read: connection timed out
Error: failed to sync groups and users for OAuth application: failed to list application users: Get "https://...../api/v1/apps/0oarbrtmj7TqQUL9W0h7/users?limit=200": read tcp 172.17.0.2:40116->35.172.155.67:443: read: connection timed out
Error: failed to get authorization server: Get "https://...../api/v1/authorizationServers/ausrbrpud5xXlHUCt0h7": read tcp 172.17.0.2:52004->35.172.155.67:443: read: connection reset by peer
Error: failed to get auth server policy: Get "https://...../api/v1/authorizationServers/ausvizvhm6vlenFDt0h7/policies/00pvizsv1vaWuiQER0h7": read tcp 172.17.0.2:33930->35.172.155.69:443: read: connection timed out
Error: failed to get OAuth application: Get "https://...../api/v1/apps/0oavixatdcrZn5DFm0h7": read tcp 172.17.0.2:34922->35.172.155.67:443: read: connection timed out
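All of these errors have the same shape. Go's `net/http` client wraps a `*net.OpError` in a `*url.Error`, and the innermost cause is an operating-system error: `ETIMEDOUT` for "connection timed out" and `ECONNRESET` for "connection reset by peer". A retry policy can therefore recognise them by type instead of by message text. The sketch below is a hypothetical classifier. The `isTransientNetErr` name, the exact set of errors it accepts, and the `example.invalid` URL are assumptions, not the provider's code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/url"
	"os"
	"syscall"
)

// isTransientNetErr reports whether err looks like a transport-level failure
// that is usually worth retrying, as opposed to an HTTP error from the API.
func isTransientNetErr(err error) bool {
	if err == nil {
		return false
	}
	// "connection reset by peer", "connection refused", "broken pipe".
	for _, errno := range []syscall.Errno{syscall.ECONNRESET, syscall.ECONNREFUSED, syscall.EPIPE} {
		if errors.Is(err, errno) {
			return true
		}
	}
	// The server closed the connection before or during the response.
	if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
		return true
	}
	// Timeouts: ETIMEDOUT ("connection timed out"), expired deadlines and
	// net/http's TLS handshake timeout all report Timeout() == true.
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// sampleReadError builds an error with the same nesting as the log lines:
// *url.Error -> *net.OpError -> *os.SyscallError -> syscall.Errno.
// The URL is a hypothetical placeholder.
func sampleReadError(errno syscall.Errno) error {
	return &url.Error{
		Op:  "Get",
		URL: "https://example.invalid/api/v1/org/factors",
		Err: &net.OpError{Op: "read", Net: "tcp", Err: os.NewSyscallError("read", errno)},
	}
}

var (
	errTimedOut = sampleReadError(syscall.ETIMEDOUT)
	errReset    = sampleReadError(syscall.ECONNRESET)
	errAPI      = errors.New("the API returned 404 Not Found")
)

func main() {
	for _, err := range []error{errTimedOut, errReset, errAPI} {
		fmt.Printf("retry=%-5t %v\n", isTransientNetErr(err), err)
	}
}
```

One caveat: a `read` that times out means the request may already have reached Okta. Retrying a `GET` or `PUT` (such as the policy-rule update above) is generally safe because those methods are idempotent, while blindly retrying a `POST` that creates an object could create a duplicate.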

Note:
In our setup we manage a significant number of AWS resources and several Okta resources. The fact that we rarely, if ever, fail on connection issues with AWS leads me to believe that the AWS provider has solved this problem somehow. Maybe its implementation can be used as a reference...
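Whatever the AWS provider does internally, one detail commonly paired with retries is jitter. Terraform applies up to 10 resource operations concurrently by default, so if a network blip hits them all at once, plain exponential backoff would make them retry in lockstep. AWS's own guidance on backoff and jitter popularised "full jitter": the wait before retry *n* is drawn uniformly from [0, min(cap, base·2ⁿ)). The sketch below shows only that calculation. The `backoff` name and its parameters are hypothetical, and this thread does not establish exactly what the AWS provider implements.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns how long to wait before retry number attempt (0-based),
// using capped exponential backoff with "full jitter": a uniformly random
// duration in [0, min(maxDelay, baseDelay*2^attempt)).
func backoff(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
	ceiling := baseDelay
	for i := 0; i < attempt && ceiling < maxDelay; i++ {
		if ceiling > maxDelay/2 {
			ceiling = maxDelay // doubling would pass the cap (and could overflow)
		} else {
			ceiling *= 2
		}
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	if ceiling <= 0 {
		return 0 // rand.Int63n panics on non-positive arguments
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		wait := backoff(attempt, 100*time.Millisecond, 2*time.Second)
		fmt.Printf("retry %d: wait %v\n", attempt+1, wait)
	}
}
```

The randomness spreads concurrent retries over the whole window instead of clustering them at the same instant, which is gentler on an already struggling endpoint such as a Preview tenant.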

@khitrenovich khitrenovich added the enhancement Asking for new behavior or feature label Dec 17, 2020
@khitrenovich
Author

Another related error I just spotted:

Error: failed to get user base schema: failed to get user type: Get "https://...../api/v1/meta/types/user/default": net/http: TLS handshake timeout
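This error comes from `net/http` itself rather than from the kernel. The client gives up when `http.Transport.TLSHandshakeTimeout` elapses (10 seconds in `http.DefaultTransport`). The resulting error still implements `net.Error` with `Timeout()` returning true, so a timeout-based retry check also covers it. The sketch below reproduces the error offline against a local listener that accepts TCP connections but never answers the TLS handshake. The helper names and the 100 ms timeout are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// isTimeout reports whether err, or any error it wraps, is a net.Error timeout.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// reproduceHandshakeTimeout starts a local TCP server that reads whatever the
// client sends but never replies, then makes an HTTPS request to it with the
// given TLSHandshakeTimeout. It returns the error from the client.
func reproduceHandshakeTimeout(timeout time.Duration) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			// Swallow the ClientHello and stay silent until the client hangs up.
			go func() {
				defer conn.Close()
				io.Copy(io.Discard, conn)
			}()
		}
	}()

	transport := &http.Transport{TLSHandshakeTimeout: timeout}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}
	resp, err := client.Get("https://" + ln.Addr().String() + "/")
	if resp != nil {
		resp.Body.Close()
	}
	return err
}

func main() {
	err := reproduceHandshakeTimeout(100 * time.Millisecond)
	fmt.Println(err)
	fmt.Println("timeout:", isTimeout(err))
}
```

Raising `TLSHandshakeTimeout` on a slow CI network is a possible mitigation on its own, but it only lengthens the wait; retrying is what lets a build survive a handshake that never completes.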

@bogdanprodan-okta
Contributor

Hi @khitrenovich! Thanks for submitting this issue. We might add retry logic to the HTTP client for such errors. I'll investigate how the AWS provider handles this and see what we can reuse from it.
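One way to add retries at the HTTP client level, without touching every resource, is to wrap the client's `http.RoundTripper`. The sketch below is hypothetical: the `retryTransport` type, its fields and the idempotent-method rule are assumptions, and it is not necessarily what #254 implemented. It replays idempotent requests after connection-level errors with exponential backoff, recreates request bodies through `GetBody`, and stops early if the request context is cancelled. An `httptest` server that drops its first connections stands in for the flaky network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"syscall"
	"time"
)

// retryTransport (hypothetical) re-sends replayable requests that failed with
// a connection-level error, waiting baseDelay, 2*baseDelay, ... between tries.
type retryTransport struct {
	next       http.RoundTripper
	maxRetries int
	baseDelay  time.Duration
}

// isConnErr is a deliberately small check for dropped or timed-out connections.
func isConnErr(err error) bool {
	var netErr net.Error
	return errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
		errors.Is(err, syscall.ECONNRESET) ||
		(errors.As(err, &netErr) && netErr.Timeout())
}

// replayable reports whether req is idempotent and its body (if any) can be
// recreated with GetBody. An empty Method means GET for client requests.
func replayable(req *http.Request) bool {
	switch req.Method {
	case "", http.MethodGet, http.MethodHead, http.MethodOptions, http.MethodPut, http.MethodDelete:
	default:
		return false
	}
	return req.Body == nil || req.Body == http.NoBody || req.GetBody != nil
}

func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	delay := t.baseDelay
	for attempt := 0; ; attempt++ {
		r := req
		if attempt > 0 && req.Body != nil && req.Body != http.NoBody {
			body, err := req.GetBody() // fresh body; the previous one was consumed
			if err != nil {
				return nil, err
			}
			r = req.Clone(req.Context())
			r.Body = body
		}
		resp, err := t.next.RoundTrip(r)
		if err == nil || attempt >= t.maxRetries || !replayable(req) || !isConnErr(err) {
			return resp, err
		}
		select {
		case <-req.Context().Done(): // caller gave up; stop retrying
			return nil, req.Context().Err()
		case <-time.After(delay):
		}
		delay *= 2
	}
}

// runDemo starts a test server that drops the first `failures` connections
// without replying, sends one GET through retryTransport, and returns how many
// requests reached the server plus the client's final error.
func runDemo(failures, maxRetries int) (int32, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&hits, 1) <= int32(failures) {
			if conn, _, err := w.(http.Hijacker).Hijack(); err == nil {
				conn.Close() // simulate a connection dropped mid-request
			}
			return
		}
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	base := &http.Transport{}
	defer base.CloseIdleConnections()
	client := &http.Client{Transport: &retryTransport{next: base, maxRetries: maxRetries, baseDelay: time.Millisecond}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return atomic.LoadInt32(&hits), err
	}
	defer resp.Body.Close()
	_, err = io.ReadAll(resp.Body)
	return atomic.LoadInt32(&hits), err
}

func main() {
	hits, err := runDemo(2, 3)
	fmt.Println("requests seen by server:", hits, "final error:", err)
}
```

Because the wrapper sits below `http.Client`, every API call made through that client gets the same policy. Existing libraries such as HashiCorp's `go-retryablehttp` package the same idea with configurable `RetryMax`, `RetryWaitMin`, `RetryWaitMax` and a `CheckRetry` hook.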
