Aurora Global cluster timeout and update-in-place all the time #10150

Closed

jamengual opened this issue Sep 18, 2019 · 4 comments
Labels

  • bug — Addresses a defect in current functionality.
  • service/rds — Issues and PRs that pertain to the rds service.

Comments


jamengual commented Sep 18, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.12.7

Affected Resource(s)

  • aws_rds_cluster
  • aws_rds_global_cluster

Terraform Configuration Files

provider "aws" {
  alias  = "primary"
  region = "us-east-2"
  # Make it faster by skipping some checks
  skip_get_ec2_platforms      = true
  skip_metadata_api_check     = true
  skip_region_validation      = true
  skip_credentials_validation = true
  skip_requesting_account_id  = true
}

provider "aws" {
  alias  = "secondary"
  region = "us-west-2"

  # Make it faster by skipping some checks
  skip_get_ec2_platforms      = true
  skip_metadata_api_check     = true
  skip_region_validation      = true
  skip_credentials_validation = true
  skip_requesting_account_id  = true
}

resource "aws_rds_global_cluster" "main" {
  engine_version = "5.6.10a"
  global_cluster_identifier = "main-global-cluster"
  storage_encrypted = true
  provider = aws.primary
}

module "main_primary_cluster" {
  #source          = "git::https://github.com/cloudposse/terraform-aws-rds-cluster.git?ref=0.16.0"
  source = "../terraform-aws-rds-cluster"
  engine          = "aurora"
  engine_version = "5.6.10a"
  cluster_family  = "aurora5.6"
  cluster_size    = var.cluster_size
  namespace       = var.namespace
  stage           = var.stage
  name            = var.main_name
  admin_user      = var.db_user
  admin_password  = random_string.db_password.result
  db_name         = var.main_db_name
  instance_type   = "db.r5.xlarge"
  vpc_id          = local.vpc_id
  security_groups = [aws_security_group.main_sg.id]
  subnets         = local.private_subnet_ids
  engine_mode               = "global"
  global_cluster_identifier = "${aws_rds_global_cluster.main.id}"
  iam_database_authentication_enabled = true
  storage_encrypted = true

  # enable monitoring every 15 seconds
  rds_monitoring_interval = 15

  # reference iam role created above
  rds_monitoring_role_arn = aws_iam_role.main_enhanced_monitoring.arn
  performance_insights_enabled = false
  # performance_insights_kms_key_id = module.kms_key.key_arn

  cluster_parameters = [
    {
      name         = "binlog_format"
      value        = "row"
      apply_method = "pending-reboot"
    },
    {
      name         = "max_allowed_packet"
      value        = "16777216"
      apply_method = "immediate"
    }
  ]
  providers = {
    aws = aws.primary
  }
}

module "main_secondary_cluster" {
  #source          = "git::https://github.com/cloudposse/terraform-aws-rds-cluster.git?ref=0.16.0"
  source = "../terraform-aws-rds-cluster"
  engine          = "aurora"
  engine_version = "5.6.10a"
  cluster_family  = "aurora5.6"
  cluster_size    = var.cluster_size
  namespace       = var.namespace
  stage           = var.stage
  name            = "${var.main_name}_secondary"
  admin_user      = ""
  admin_password  = ""
  db_name         = ""
  instance_type   = "db.r5.large"
  vpc_id          = local.secondary_vpc_id
  security_groups = [aws_security_group.secondary_main_sg.id]
  subnets         = local.secondary_private_subnet_ids
  engine_mode               = "global"
  global_cluster_identifier = "${aws_rds_global_cluster.main.id}"
  iam_database_authentication_enabled = true
  kms_key_arn = data.aws_kms_key.kms_key.arn
  source_region = "us-east-2"
  storage_encrypted = true

  # enable monitoring every 30 seconds
  rds_monitoring_interval = 30

  # reference iam role created above
  rds_monitoring_role_arn = aws_iam_role.main_enhanced_monitoring.arn
  performance_insights_enabled = false
  #performance_insights_kms_key_id = module.kms_key.key_arn

  cluster_parameters = [
    {
      name         = "binlog_format"
      value        = "row"
      apply_method = "pending-reboot"
    },
    {
      name         = "max_allowed_packet"
      value        = "16777216"
      apply_method = "immediate"
    }
  ]
  providers = {
    aws = aws.secondary
  }
}

...

Debug Output

https://gist.github.com/jamengual/3b44ec91777090dea73c3957b87aae9f

Expected Behavior

The Aurora clusters that joined the global cluster should not require any modifications on subsequent applies.

Actual Behavior

Every time apply is run, the Aurora cluster members of the global cluster plan an in-place update to replication_source_identifier, so this attribute should be ignored.

Steps to Reproduce

  1. terraform apply -target module.main_primary_cluster -var cluster_size=0
  2. terraform apply -target module.main_primary_cluster -var cluster_size=2
  3. terraform apply -target module.main_secondary_cluster -var cluster_size=0
  4. terraform apply -target module.main_secondary_cluster -var cluster_size=2
  5. terraform apply

Important Factoids

Timeouts

I tested in different regions and with different instance sizes to no avail; the timeouts happen roughly 90% of the time when cluster_size=2. I also tried different internet connections and the like, thinking the problem was in my setup.
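
One possible mitigation for the create timeouts, assuming you can edit the aws_rds_cluster resource definition inside the module, is to raise the resource's operation timeouts via its timeouts block; a minimal sketch with illustrative values:

resource "aws_rds_cluster" "default" {
  # ... existing cluster arguments ...

  timeouts {
    # Illustrative values, not taken from this setup: allow the global
    # cluster members more time than the provider defaults.
    create = "180m"
    update = "180m"
    delete = "180m"
  }
}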

The replication_source_identifier update-in-place happens 100% of the time.

I created my own fork of the cloudposse module to add global support; it is basically a two-line change (sketched below), so there is no hidden magic or looping going on here.
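
For reference, the change amounts to passing two extra arguments through the module to the underlying aws_rds_cluster resource; a minimal sketch, with variable names that are illustrative rather than the fork's exact code:

resource "aws_rds_cluster" "default" {
  # ... existing module-managed arguments ...

  # The two pass-through arguments that add global support
  # (variable names are illustrative):
  engine_mode               = var.engine_mode
  global_cluster_identifier = var.global_cluster_identifier
}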

@ghost ghost added the service/rds Issues and PRs that pertain to the rds service. label Sep 18, 2019
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Sep 18, 2019
@anGie44 anGie44 added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Jul 31, 2020
@anGie44 anGie44 self-assigned this Jul 31, 2020
@anGie44 anGie44 changed the title from "Autora Global cluster timeout and update-in-place all the time" to "Aurora Global cluster timeout and update-in-place all the time" Aug 6, 2020

anGie44 commented Aug 6, 2020

Hi @jamengual, thank you for submitting this issue, and apologies that you've run into it! From the logs, it looks like you're referring to the non-empty plan that results after creation of the rds_cluster: the replication_source_identifier attribute is returned from the API even though the module's variable appears to be unconfigured in your example. We've seen similar issues reported for the replication_source_identifier and global_cluster_identifier attributes when an rds_cluster resource refers to an rds_global_cluster. As a workaround in the meantime, I would first suggest updating the module source code (if possible) to use the lifecycle configuration block's ignore_changes around this param (in the resource definition; unfortunately, this isn't feasible at the module level yet) to avoid the perpetual updates, e.g.

lifecycle {
  ignore_changes = [replication_source_identifier]
}
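
For placement, a minimal sketch of where that block goes in the module's resource definition (the resource name and surrounding arguments here are illustrative, not the module's exact code):

resource "aws_rds_cluster" "default" {
  # ... existing cluster arguments ...

  lifecycle {
    # Suppress the perpetual in-place update on this API-populated
    # attribute until the provider marks it as Computed.
    ignore_changes = [replication_source_identifier]
  }
}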

On our end, we'll look into marking this attribute as Computed to address the diff you are seeing on each apply.

@anGie44 anGie44 removed their assignment Sep 29, 2021

anGie44 commented Sep 29, 2021

Hi @jamengual, since it's been some time since this issue was opened and a newer Terraform/provider/module version may address it, I'm going to close it for the time being. Please do reach out if there are any new findings with later versions of the provider.

@anGie44 anGie44 closed this as completed Sep 29, 2021

jamengual commented Sep 29, 2021

Thanks @anGie44. I did not reply before, but I did use the workaround at the time; I have not needed it since with the newer provider versions, so it is OK to close the issue.


github-actions bot commented Jun 5, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 5, 2022