
Failing test(s): Error code 9, message: encountered some transient internal error, please try again later #12902

Closed
melinath opened this issue Oct 28, 2022 · 6 comments

melinath (Collaborator) commented Oct 28, 2022

Failure rate: 100% since 2022-10-08

Impacted tests:

  • TestAccSqlDatabaseInstance_Timezone
  • TestAccSqlDatabaseInstance_SqlServerAuditConfig
  • TestAccSqlDatabaseInstance_ActiveDirectory
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRangeClone
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRangeReplica
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withAllocatedIpRange
  • TestAccSqlDatabaseInstance_withPrivateNetwork_withoutAllocatedIpRange
  • TestAccServiceNetworkingConnection_create
  • TestAccRedisInstance_redisInstancePrivateServiceExample
  • TestAccMemcacheInstance_update
  • TestAccMemcacheInstance_memcacheInstanceBasicExample

Nightly builds:

Message:

Error: Error waiting for Create Service Networking Connection: Error code 9, message: encountered some transient internal error, please try again later

b/256181958
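
The failing call is the creation of a Service Networking connection for private services access, which these tests provision before the SQL/Redis/Memcache resources themselves. As a minimal sketch of that configuration shape (the network and range names below are placeholders, not taken from the failing tests):

    resource "google_compute_network" "default" {
      name = "example-network"
    }

    # Reserve an internal range for private services access.
    resource "google_compute_global_address" "private_ip_range" {
      name          = "example-range"
      purpose       = "VPC_PEERING"
      address_type  = "INTERNAL"
      prefix_length = 16
      network       = google_compute_network.default.id
    }

    # Creating this connection is the step that returns the "Error code 9" message above.
    resource "google_service_networking_connection" "default" {
      network                 = google_compute_network.default.id
      service                 = "servicenetworking.googleapis.com"
      reserved_peering_ranges = [google_compute_global_address.private_ip_range.name]
    }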

melinath changed the title from "Failing test(s): TestAccSqlDatabaseInstance_Timezone" to "Failing test(s): TestAccSqlDatabaseInstance_Timezone: Error code 9, message: encountered some transient internal error, please try again later" on Oct 28, 2022
melinath changed the title from "Failing test(s): TestAccSqlDatabaseInstance_Timezone: Error code 9, message: encountered some transient internal error, please try again later" to "Failing test(s): Error code 9, message: encountered some transient internal error, please try again later" on Oct 28, 2022
melinath added this to the Near-Term Goals milestone on Oct 31, 2022
AarshDhokai (Contributor) commented:

Cumulatively, all of these tests are failing 100% of the time in both the Google Cloud and Google Cloud Beta builds.

AarshDhokai (Contributor) commented:

b/261701420

roaks3 (Collaborator) commented Dec 14, 2022

I came across this one in a recent PR, so I did a little investigating.

Some historical context I was able to dig up:

=== RUN   TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
=== PAUSE TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
=== CONT  TestAccApigeeInstance_apigeeInstanceIpRangeTestExample
    provider_test.go:307: Step 1/2 error: Error running apply: exit status 1
        
        Error: Error waiting to create Instance: Error waiting for Creating Instance: Error code 9, message: couldn't find a free IP space of /22 to launch an instance. Verify the peering ranges are available as per https://cloud.google.com/apigee/docs/api-platform/get-started/install-cli#service-networking and try again
        
          with google_apigee_instance.apigee_instance,
          on terraform_plugin_test.tf line 57, in resource "google_apigee_instance" "apigee_instance":
          57: resource "google_apigee_instance" "apigee_instance" {
        
--- FAIL: TestAccApigeeInstance_apigeeInstanceIpRangeTestExample (612.07s)

roaks3 (Collaborator) commented Dec 14, 2022

My current theory is that our tests reserve too much IP space; as tests run in parallel, we eventually exhaust the available space and subsequent tests begin to fail. I believe we should try some combination of reserving /24 blocks in existing tests (sketched below) and using bootstrapped network connections, so that we don't need to reserve as many blocks.
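
As a hedged illustration of the first idea (example values only, not an actual change from the failing tests), reserving a /24 only requires narrowing the prefix length on the reserved peering range:

    # Hypothetical example: a /24 range consumes far less of the shared address
    # space than a /16 or /20, leaving room for other tests running in parallel.
    resource "google_compute_global_address" "private_ip_range" {
      name          = "example-range"
      purpose       = "VPC_PEERING"
      address_type  = "INTERNAL"
      prefix_length = 24
      network       = google_compute_network.default.id
    }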

roaks3 (Collaborator) commented Jan 21, 2023

I checked in on this today, and the "Error code 9, message: encountered some transient internal error, please try again later" error has not occurred on any of these tests in the last 2 weeks (starting with the Jan 6 build).

Some of these tests are failing for other reasons now, but I think it's safe to close this ticket.

As for the cause/resolution, it isn't clear to me exactly what resolved this. We had been in the process of capturing more resources with our sweepers, and the Jan 6 build was the first build in multiple months that the sweeper was able to run all the way through (due to disabling the CloudIdsEndpoints sweeper and OSPolicyAssignment sweeper). My theory is that we had accumulated networking resources in the test project, perhaps due to some of the heavier resources like Composer/SQL/Dataproc not being cleaned up, and that was in some way preventing new SQL instances from being provisioned. But it is interesting that this seemed to start with GoogleCloudPlatform/magic-modules#6617 and end with GoogleCloudPlatform/magic-modules#7071 🤷 .

roaks3 closed this as completed on Jan 21, 2023
github-actions (bot) commented:

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Feb 20, 2023