Troubleshooting

Terminology

See GLOSSARY.md.


Problems


Common issues


Project quota exceeded

Error message:

Error code 8, message: The project cannot be created because you have exceeded your allotted project quota

Cause:

This message means you have reached your project creation quota.

Solution:

Use the Request Project Quota Increase form to request a quota increase.

In the support form, in the field Email addresses that will be used to create projects, enter the email address of projects_step_terraform_service_account_email, which is created by the Terraform Example Foundation 0-bootstrap step.

Notes:

Default branch setting

Error message:

error: src refspec master does not match any

Cause:

This could be due to init.defaultBranch being set to something other than main.

Solution:

  1. Determine your default branch:

    git config init.defaultBranch

    This outputs main if your default branch is set to main.

  2. If your default branch is not set to main, set it:

    git config --global init.defaultBranch main
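
The two steps above can be combined into one small helper. This is a minimal sketch; the function name `ensure_main_default_branch` is hypothetical and not part of the foundation code.

```shell
# Set init.defaultBranch to "main" only when it is not already "main".
ensure_main_default_branch() {
  # `git config <key>` prints the value, or exits non-zero when the key is unset.
  current="$(git config init.defaultBranch || true)"
  if [ "$current" != "main" ]; then
    git config --global init.defaultBranch main
  fi
}
```

Note that init.defaultBranch only affects repositories initialized after the setting is changed.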

Terraform State Snapshot lock

Error message:

When running the build for the production branch in step 3-networks in your Foundation CI/CD Pipeline, the build fails with:

state snapshot was created by Terraform v1.x.x, which is newer than current v1.5.7; upgrade to Terraform v1.x.x or greater to work with this state

Cause:

The manual deploy step for the shared environment in 3-networks was executed with a Terraform version newer than the v1.5.7 used in the Foundation CI/CD Pipeline.

Solution:

You have two options:

Downgrade your local Terraform version

You will need to re-run the deploy of the 3-networks shared environment with Terraform v1.5.7.

Steps:

  • Go to folder gcp-networks/envs/shared/.
  • Update backend.tf with your bucket name from the 0-bootstrap step.
  • Run terraform destroy in the folder using the Terraform v1.x.x version.
  • Delete the Terraform state file in gs://YOUR-TF-STATE-BUCKET/terraform/networks/envs/shared/default.tfstate. This bucket is in your Seed Project.
  • Install Terraform v1.5.7.
  • Re-run the manual deploy of 3-networks shared environment using Terraform v1.5.7.

Upgrade your 0-bootstrap runner image Terraform version

Replace 1.x.x with your local Terraform version in the following instructions:

  • Go to folder 0-bootstrap.
  • Edit the local terraform_version in the Terraform cb.tf file:
    • Upgrade local terraform_version from "1.5.7" to "1.x.x"
  • Run terraform init.
  • Run terraform plan and review the output.
  • Run terraform apply.
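
Before a manual deploy, it can help to check whether the local Terraform is newer than the pipeline's. The following is a sketch under stated assumptions: the helper names `version_gt` and `local_terraform_version` are hypothetical, and `sort -V` requires GNU coreutils.

```shell
# Version used by the Foundation CI/CD Pipeline, per this guide.
PIPELINE_TF_VERSION="1.5.7"

# version_gt A B: succeeds when version A is strictly newer than version B.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Print the local Terraform version, read from `terraform version -json`.
local_terraform_version() {
  terraform version -json |
    sed -n 's/.*"terraform_version": *"\([^"]*\)".*/\1/p'
}

# Example usage:
# if version_gt "$(local_terraform_version)" "$PIPELINE_TF_VERSION"; then
#   echo "local Terraform is newer than the pipeline version" >&2
# fi
```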

Application authenticated using end user credentials

Error message:

When running gcloud commands in Cloud Shell like

gcloud scc notifications describe <scc_notification_name> --organization YOUR_ORGANIZATION_ID

or

gcloud access-context-manager policies list --organization YOUR_ORGANIZATION_ID --format="value(name)"

you receive the error:

Error 403: Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the X.googleapis.com.
We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting.
For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.

Cause:

When using application default credentials in Cloud Shell, a billing project is not available for APIs like securitycenter.googleapis.com or accesscontextmanager.googleapis.com.

Solution:

You can re-run the command using impersonation or by providing a billing project:

  • Impersonate the Terraform Service Account:
    --impersonate-service-account=terraform-org-sa@<SEED_PROJECT_ID>.iam.gserviceaccount.com
  • Provide a billing project:
    --billing-project=<A-VALID-PROJECT-ID>

If you provide a billing project, you must have the serviceusage.services.use permission on that project.
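
As an illustration, one of the failing commands above could be wrapped with the impersonation flag as follows. This is a sketch only; the function names are hypothetical, and the organization ID and seed project ID are values you supply.

```shell
# terraform_sa_email SEED_PROJECT_ID: print the Terraform service account email.
terraform_sa_email() {
  printf 'terraform-org-sa@%s.iam.gserviceaccount.com\n' "$1"
}

# list_access_policies ORGANIZATION_ID SEED_PROJECT_ID:
# re-run the policies list command while impersonating the service account.
list_access_policies() {
  gcloud access-context-manager policies list \
    --organization "$1" \
    --format="value(name)" \
    --impersonate-service-account="$(terraform_sa_email "$2")"
}

# Example usage:
# list_access_policies YOUR_ORGANIZATION_ID YOUR_SEED_PROJECT_ID
```

To use a billing project instead, the last flag would be swapped for --billing-project=<A-VALID-PROJECT-ID>.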

Cannot assign requested address error in Cloud Shell

Error message:

When using Google Cloud Shell to deploy the code in this repository, you may face an error like

dial tcp [2607:f8b0:400c:c15::5f]:443: connect: cannot assign requested address

when Terraform calls the Google APIs.

Cause:

This is a known Terraform issue regarding IPv6.

Solution:

At this time the alternatives are:

  1. Use a workaround that forces Google API calls in Cloud Shell to use an IP from the private.googleapis.com range (199.36.153.8/30), or
  2. Deploy the foundation code from a local machine that supports IPv6.

If you use the workaround, the API list should include the ones that are allowed in the terraform-example-foundation policy library.
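
This guide does not spell out the workaround itself. One possible approach, assumed here, is to pin API host names to an address in the private.googleapis.com range through /etc/hosts. The sketch below only generates the entries; the helper name `hosts_entries` and the API names in the usage line are illustrative.

```shell
# hosts_entries IP API...: print one /etc/hosts line per API host name.
# IP is expected to be an address in 199.36.153.8/30 (199.36.153.8 to 199.36.153.11).
hosts_entries() {
  ip="$1"
  shift
  for api in "$@"; do
    printf '%s %s.googleapis.com\n' "$ip" "$api"
  done
}

# Example usage (review the output before appending it to /etc/hosts):
# hosts_entries 199.36.153.8 cloudresourcemanager iam
```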

Error: Unsupported attribute

Error message:

Error: Unsupported attribute

  on main.tf line 22, in locals:
  22:   org_id               = data.terraform_remote_state.bootstrap.outputs.common_config.org_id
    ├────────────────
    │ data.terraform_remote_state.bootstrap.outputs is object with no attributes

This object does not have an attribute named "common_config".

Cause:

The stages after 0-bootstrap use the terraform_remote_state data source to read common configuration, like the organization ID, from the output of the 0-bootstrap stage. The error means that the Terraform state of the 0-bootstrap stage was not copied to the Terraform state bucket created in stage 0-bootstrap.

Solution:

Follow the instructions at the end of the Deploying with Cloud Build section in the 0-bootstrap README to copy the Terraform state to the Cloud Storage bucket created in stage 0-bootstrap and retry planning/applying the stage you are deploying.

Error: Error adding network peering

Error message:

Error: Error adding network peering: googleapi: Error 403: Rate Limit Exceeded
Details:
[
  {
    "@type": "type.googleapis.com/google.rpc.ErrorInfo",
    "domain": "compute.googleapis.com",
    "metadatas": {
      "containerId": "76352966089",
      "containerType": "PROJECT",
      "location": "global"
    },
    "reason": "CONCURRENT_OPERATIONS_QUOTA_EXCEEDED"
  }
]
, rateLimitExceeded

Cause:

In a deployment that uses the Hub and Spoke network mode, adding the network peering between the restricted Hub network and the restricted Spoke network, or between the base Hub network and the base Spoke network, can fail because too many peering operations run concurrently.

Solution:

This is a transient error and the deploy can be retried. Wait for at least a minute and retry the deploy.
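
When deploying manually, the wait-and-retry advice can be automated with a small wrapper. This is a generic sketch, not part of the foundation code; the function name `retry` and its argument order are assumptions made for this example.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: re-run COMMAND until it succeeds
# or ATTEMPTS runs have failed, sleeping DELAY_SECONDS between runs.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example usage: up to 3 runs, waiting 60 seconds between them.
# retry 3 60 terraform apply
```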

Error: Unknown project id on 4-projects step context

Error message:

Error 400: Unknown project id: 'prj-<business-unit>-<environment>-sample-base-<random-suffix>', invalid

Cause:

If you run the 4-projects step without first requesting additional project quota for the project service account created in the 0-bootstrap step, you may see the error above. The error can persist even after the project quota issue is resolved, because the Terraform state is left inconsistent.

Solution:

You need to mark some Terraform resources as tainted to trigger the recreation of the missing projects and fix the inconsistency in the Terraform state.

  1. In a terminal, navigate to the path where the error is being reported.

    For example, if the unknown project ID is prj-bu1-p-sample-base-shared, you should go to ./gcp-projects/business_unit_1/production (business_unit_1 due to bu1 and production due to p, see naming conventions for more information on the projects naming guideline).

    cd ./gcp-projects/<business_unit>/<environment>
  2. Run the terraform init command so you can pull the remote state.

    terraform init
  3. Run the terraform state list command, filtering by random_project_id_suffix. This command lists all the projects that use a random suffix and are expected to exist for this BU and environment.

    terraform state list | grep random_project_id_suffix
  4. Identify the folder that is the parent of the environment's projects. If the Terraform Example Foundation is deployed directly under the organization, use --organization; if it is deployed under a folder, use --folder. ORGANIZATION_ID and PARENT_FOLDER are the input values provided for the 0-bootstrap step.

    gcloud resource-manager folders list [ --organization=ORGANIZATION_ID ][ --folder=PARENT_FOLDER ]
  5. The result of the gcloud command will look like the following output. Using the production environment for this example, the folder ID for the environment would be 333333333333.

    DISPLAY_NAME        PARENT_NAME            ID
    fldr-bootstrap      folders/PARENT_FOLDER  111111111111
    fldr-common         folders/PARENT_FOLDER  222222222222
    fldr-production     folders/PARENT_FOLDER  333333333333
    fldr-nonproduction  folders/PARENT_FOLDER  444444444444
    fldr-development    folders/PARENT_FOLDER  555555555555
    
  6. Run the gcloud projects list command, replacing id_of_the_environment_folder with the ID of the folder retrieved in the previous step. This command lists all the projects that were actually created.

    gcloud projects list --filter="parent=<id_of_the_environment_folder>"
  7. For each resource listed by terraform state list whose project is not returned by gcloud projects list, mark that resource as tainted to force it to be recreated and fix the inconsistency in the Terraform state.

    terraform taint <resource>[index]

    For example, in the following command we are marking as tainted the env secrets project. You may need to run the terraform taint command multiple times, depending on how many missing projects you have.

    terraform taint module.env.module.env_secrets_project.module.project.module.project-factory.random_string.random_project_id_suffix[0]
  8. After running terraform taint for all the non-matching items, go to Cloud Build and retry the failed job. It should complete successfully. If you encounter a similar error for another BU/environment, follow this guide again, changing the paths according to the BU/environment reported in the error log.
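
The mapping from a project ID to its directory, described in step 1, can be sketched as a helper. The function name `projects_path_for` is hypothetical. The p code for production comes from the example in step 1; the n and d codes are assumed from the naming conventions and should be checked against your deployment.

```shell
# projects_path_for PROJECT_ID: print the gcp-projects directory for a project ID
# shaped like prj-<business unit code>-<environment code>-...
projects_path_for() {
  project_id="$1"
  bu_code="$(printf '%s' "$project_id" | cut -d- -f2)"   # for example: bu1
  env_code="$(printf '%s' "$project_id" | cut -d- -f3)"  # for example: p
  bu_number="${bu_code#bu}"
  case "$env_code" in
    p) env_name="production" ;;
    n) env_name="nonproduction" ;;  # assumed environment code
    d) env_name="development" ;;    # assumed environment code
    *) echo "unknown environment code: $env_code" >&2; return 1 ;;
  esac
  printf './gcp-projects/business_unit_%s/%s\n' "$bu_number" "$env_name"
}

# Example usage:
# cd "$(projects_path_for prj-bu1-p-sample-base-shared)"
```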

Notes:

  • Make sure you run the taint command just for the resources that contain the [number] at the end of the line returned by terraform state list step. You don't need to run for the groups (the resources that don't have the [] at the end).
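
To keep only the indexed resources described in the note above, the output of terraform state list can be filtered. This is a sketch with a hypothetical helper name; comparing the result against the gcloud projects list output remains a manual step.

```shell
# indexed_suffix_resources: read `terraform state list` output on stdin and
# print only the random_project_id_suffix addresses that end in [number].
indexed_suffix_resources() {
  grep 'random_project_id_suffix' | grep -E '\[[0-9]+\]$'
}

# Example usage:
# terraform state list | indexed_suffix_resources
#
# Quote each address when tainting it, so the shell does not treat [0] as a glob:
# terraform taint 'module.env.module.env_secrets_project.module.project.module.project-factory.random_string.random_project_id_suffix[0]'
```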

Error: Error getting operation for committing purpose for TagValue

Error message:

Error: Error waiting to create TagValue: Error waiting for Creating TagValue: Error code 13, message: Error getting operation for committing purpose for TagValue: tagValues/{tag_value_id}

Cause:

Sometimes, when deploying a google_tags_tag_value resource, this error occurs and Terraform is not able to finish the execution.

Solution:

  1. This is a transient error and the deploy can be retried.
  2. A retry policy was added to prevent this error during the integration test.

Caller does not have permission in the Organization

Error message:

Error: Error when reading or editing Organization Not Found : <organization-id>: googleapi: Error 403: The caller does not have permission, forbidden

Cause:

The user running Terraform is missing the predefined Organization Administrator role at the organization level.

Solution:

  • If the user does not have the Organization Administrator role, ask your organization administration team to grant it to your user.

  • If the user does have the Organization Administrator role, refresh the credentials and confirm the active account:
    gcloud auth application-default login
    gcloud auth list # <- confirm that correct account has a star next to it

Then re-run Terraform.
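
The visual check for the star in gcloud auth list can also be scripted. This is a sketch; `confirm_active_account` is a hypothetical helper, and the expected account is a value you supply.

```shell
# confirm_active_account EXPECTED_EMAIL: succeed only when the active gcloud
# account matches EXPECTED_EMAIL.
confirm_active_account() {
  expected="$1"
  active="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ "$active" != "$expected" ]; then
    echo "active account is '$active', expected '$expected'" >&2
    return 1
  fi
}

# Example usage:
# confirm_active_account you@your-domain && terraform plan
```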

Billing quota exceeded

Error message:

Error: Error setting billing account "XXXXXX-XXXXXX-XXXXXX" for project "projects/some-project": googleapi: Error 400: Precondition check failed., failedPrecondition

Cause:

Most likely this is related to a billing quota issue.

Solution:

To confirm, try linking the billing account manually:

gcloud alpha billing projects link projects/some-project --billing-account XXXXXX-XXXXXX-XXXXXX

If the output states Cloud billing quota exceeded, you can use the Request Billing Quota Increase form to request a billing quota increase.

Terraform Error acquiring the state lock

Error message:

Error: Error acquiring the state lock

Cause:

This message means that you are trying to apply a Terraform configuration with a remote backend that is in a locked state.

If the Terraform process was unable to finish because of an unexpected event, for example a build timeout or a killed Terraform process, the Terraform State remains locked.

Solution:

The following commands are an example of how to unlock the development environment from step 2-environments that is one part of the Foundation Example. It can also be applied in the same way to the other parts.

  1. Clone the repository where you got the Terraform State lock. The following example assumes development environment from step 2-environments:

    gcloud source repos clone gcp-environments --project=YOUR_CLOUD_BUILD_PROJECT_ID
  2. Navigate into the repo and change to the development branch:

    cd gcp-environments
    git checkout development
  3. If your project does not have a remote backend, you can skip the next two steps and go straight to the terraform init command.

  4. If your project has a remote backend you will have to update backend.tf with the remote state backend bucket. You can get this information from step 0-bootstrap by running the following command:

    terraform output gcs_bucket_tfstate
  5. Update backend.tf with the remote state backend bucket you got in the previous step, replacing <YOUR-REMOTE-STATE-BACKEND-BUCKET>:

    for i in $(find . -name 'backend.tf'); do sed -i'' -e 's/UPDATE_ME/<YOUR-REMOTE-STATE-BACKEND-BUCKET>/' "$i"; done
  6. Navigate into envs/development, where your Terraform configuration files are, and run terraform init:

    cd envs/development
    terraform init
  7. At this point, you will be able to get Terraform State lock information and unlock your state.

  8. After running terraform apply, you should get an error message like the following:

    terraform apply
    Acquiring state lock. This may take a few moments...
    ╷
    │ Error: Error acquiring the state lock
    │
    │ Error message: writing "gs://<YOUR-REMOTE-STATE-BACKEND-BUCKET>/<PATH-TO-TERRAFORM-STATE>/<tf state file name>.tflock" failed: googleapi: Error 412: At least one
    │ of the pre-conditions you specified did not hold., conditionNotMet
    │ Lock Info:
    │   ID:        1664568683005669
    │   Path:      gs://<YOUR-REMOTE-STATE-BACKEND-BUCKET>/<PATH-TO-TERRAFORM-STATE>/<tf state file name>.tflock
    │   Operation: OperationTypeApply
    │   Who:       user@domain
    │   Version:   1.0.0
    │   Created:   2022-09-30 20:11:22.90644727 +0000 UTC
    │   Info:
    │
    │
    │ Terraform acquires a state lock to protect the state from being written
    │ by multiple users at the same time. Please resolve the issue above and try
    │ again. For most commands, you can disable locking with the "-lock=false"
    │ flag, but this is not recommended.
    
  9. With the lock ID, you can remove the Terraform State lock using the terraform force-unlock command. We strongly recommend reviewing the official documentation for terraform force-unlock before executing it.

  10. After unlocking the Terraform State you will be able to execute a terraform plan for review of the state. The following links can help you to recover the Terraform State for your configuration and move on:

    1. Manipulating Terraform State
    2. Moving Resources
    3. Importing Infrastructure

Possible causes of the Terraform State lock:

  • If the Terraform State lock was caused by a build timeout, increase the build timeout in the build configuration.
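
The lock ID needed for terraform force-unlock can be pulled out of the error output shown in step 8. This is a minimal sketch; the helper name `lock_id_from_error` and the file name in the usage lines are hypothetical.

```shell
# lock_id_from_error: read Terraform's lock error output on stdin and print
# the value that follows the first "ID:" field.
lock_id_from_error() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Example usage, after saving the error output to a file:
# lock_id="$(lock_id_from_error < apply-error.log)"
# terraform force-unlock "$lock_id"
```

terraform force-unlock asks for confirmation before it removes the lock, so the last step stays interactive.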

Terraform deploy fails due to GitLab repositories not found

Error message:

Error: POST https://gitlab.com/api/v4/projects/<GITLAB-ACCOUNT>/<GITLAB-REPOSITORY>/variables: 404 {message: 404 Project Not Found}

Cause:

This message means that you are using the wrong Access Token, or that Access Tokens have been created in both the GitLab Account/Group and the GitLab Repository.

Only a Personal Access Token under the GitLab Account/Group should exist.

Solution:

Remove any Access Token from the GitLab repositories used by Google Secure Foundation Blueprint.

GitLab pipelines access denied

Error message:

From the logs of your Pipeline job:

Error response from daemon: pull access denied for registry.gitlab.com/<YOUR-GITLAB-ACCOUNT>/<YOUR-GITLAB-CICD-REPO>/terraform-gcloud, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Cause:

The cause of this message is that the CI/CD repository has "Limit access to this project" enabled in the Token Access settings.

Solution:

Add all the projects/repositories to be used in the Terraform Example Foundation to the allow list available in CI/CD Repo -> Settings -> CI/CD -> Token Access -> Allow CI job tokens from the following projects to access this project.

The user does not have permission to access Project or it may not exist

Error message:

Error when reading or editing GCS service account not found: googleapi: Error 400: Unknown project id: <PROJECT-ID>, invalid.
The user does not have permission to access Project <PROJECT-ID> or it may not exist.

Cause:

Terraform is trying to fetch or manipulate resources associated with the given project PROJECT-ID but the project was not created in the first execution.

The first execution only generated the project ID that would be used to create the project. The project ID is composed of a fixed prefix and a random suffix.

Possible causes of the project creation failure in the first execution are:

  • The user does not have Billing Account User role in the billing account
  • The user does not have Project Creator role in the Google Cloud organization
  • The user has reached the project creation quota
  • Terraform apply failed midway due to a timeout or an interruption, leaving the project ID generated in the state but not creating the project itself

Solution:

If the cause is the project creation quota, follow the instructions in the Terraform Example Foundation troubleshooting guide.

After applying these fixes, you need to force the recreation of the random suffix used in the project ID. To force the recreation, run:

terraform taint <RESOURCE-ID>

For example

terraform taint module.seed_bootstrap.module.seed_project.module.project-factory.random_id.random_project_id_suffix

Then retry the deployment.