
Failing test(s): Error code 9, message: encountered some transient internal error, please try again later #12902

Closed
melinath opened this issue Oct 28, 2022 · 6 comments

melinath (Collaborator) commented Oct 28, 2022

Failure rate: 100% since 2022-10-08

Impacted tests:

  • TestAccSqlDatabaseInstance_Timezone
  • TestAccSqlDatabaseInstance_SqlServerAuditConfig
  • TestAccSqlDatabaseInstance_ActiveDirectory
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRangeClone
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRangeReplica
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRange
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withoutAllocatedIpRange
  • TestAccServiceNetworkingConnection_create
  • TestAccRedisInstance_redisInstancePrivateServiceExample
  • TestAccMemcacheInstance_update
  • TestAccMemcacheInstance_memcacheInstanceBasicExample

Nightly builds:

Message:

Error: Error waiting for Create Service Networking Connection: Error code 9, message: encountered some transient internal error, please try again later

b/256181958
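
The failing call is the creation of a Service Networking connection for private services access, which these tests provision before the SQL/Redis/Memcache resources themselves. As a minimal sketch of that configuration shape (the network and range names below are placeholders, not taken from the failing tests):

    resource "google_compute_network" "default" {
      name = "example-network"
    }

    # Reserve an internal range for private services access.
    resource "google_compute_global_address" "private_ip_range" {
      name          = "example-range"
      purpose       = "VPC_PEERING"
      address_type  = "INTERNAL"
      prefix_length = 16
      network       = google_compute_network.default.id
    }

    # Creating this connection is the step that returns the "Error code 9" message above.
    resource "google_service_networking_connection" "default" {
      network                 = google_compute_network.default.id
      service                 = "servicenetworking.googleapis.com"
      reserved_peering_ranges = [google_compute_global_address.private_ip_range.name]
    }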

melinath changed the title from "Failing test(s): TestAccSqlDatabaseInstance_Timezone" to "Failing test(s): TestAccSqlDatabaseInstance_Timezone: Error code 9, message: encountered some transient internal error, please try again later" on Oct 28, 2022
melinath changed the title from "Failing test(s): TestAccSqlDatabaseInstance_Timezone: Error code 9, message: encountered some transient internal error, please try again later" to "Failing test(s): Error code 9, message: encountered some transient internal error, please try again later" on Oct 28, 2022
melinath added this to the Near-Term Goals milestone on Oct 31, 2022
AarshDhokai (Contributor) commented:

Cumulatively, all of these tests are failing 100% of the time in both the Google Cloud and Google Cloud Beta builds.

AarshDhokai (Contributor) commented:

b/261701420

roaks3 (Collaborator) commented Dec 14, 2022

I came across this one in a recent PR, so I did a little investigating.

Some historical context I was able to dig up:

=== RUN   TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
=== PAUSE TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
=== CONT  TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
    provider_test.go:307: Step 1/2 error: Error running apply: exit status 1
        
        Error: Error waiting to create Instance: Error waiting for Creating Instance: Error code 9, message: couldn't find a free IP space of /22 to launch an instance. Verify the peering ranges are available as per https://cloud.google.com/apigee/docs/api-platform/get-started/install-cli#service-networking and try again
        
          with google_apigee_instance.apigee_instance,
          on terraform_plugin_test.tf line 57, in resource "google_apigee_instance" "apigee_instance":
          57: resource "google_apigee_instance" "apigee_instance" {
        
--- FAIL: TestAccApigeeInstance_apigeeInstanceIpRangeTestExample (612.07s)

roaks3 (Collaborator) commented Dec 14, 2022

My current theory is that our tests reserve too much IP space; as tests run in parallel, we eventually exhaust the available space and subsequent tests begin to fail. I believe we should try some combination of reserving /24 blocks in existing tests (sketched below) and using bootstrapped network connections, so that we don't need to reserve as many blocks.
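
As a hedged illustration of the first idea (example values only, not an actual change from the failing tests), reserving a /24 only requires narrowing the prefix length on the reserved peering range:

    # Hypothetical example: a /24 range consumes far less of the shared address
    # space than a /16 or /20, leaving room for other tests running in parallel.
    resource "google_compute_global_address" "private_ip_range" {
      name          = "example-range"
      purpose       = "VPC_PEERING"
      address_type  = "INTERNAL"
      prefix_length = 24
      network       = google_compute_network.default.id
    }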

roaks3 (Collaborator) commented Jan 21, 2023

I checked in on this today, and the "Error code 9, message: encountered some transient internal error, please try again later" error has not occurred on any of these tests in the last 2 weeks (starting with the Jan 6 build).

Some of these tests are failing for other reasons now, but I think it's safe to close this ticket.

As for the cause/resolution, it isn't clear to me exactly what resolved this. We had been in the process of capturing more resources with our sweepers, and the Jan 6 build was the first build in multiple months that the sweeper was able to run all the way through (due to disabling the CloudIdsEndpoints sweeper and OSPolicyAssignment sweeper). My theory is that we had accumulated networking resources in the test project, perhaps due to some of the heavier resources like Composer/SQL/Dataproc not being cleaned up, and that was in some way preventing new SQL instances from being provisioned. But it is interesting that this seemed to start with GoogleCloudPlatform/magic-modules#6617 and end with GoogleCloudPlatform/magic-modules#7071 🤷 .

roaks3 closed this as completed on Jan 21, 2023
github-actions (bot) commented:

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Feb 20, 2023