
[BUG] AzureRM - Azure Postgres Flexible Server - Virtual Endpoint Attempts to re-create after Failover #27796

leonrob opened this issue Oct 28, 2024 · 12 comments



leonrob commented Oct 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

0.13

AzureRM Provider Version

4.7.0

Affected Resource(s)/Data Source(s)

azurerm_postgresql_flexible_server_virtual_endpoint

Terraform Configuration Files

resource "azurerm_postgresql_flexible_server_virtual_endpoint" "testendpoint" {
  name              = "testendpoint1"
  source_server_id  = data.azurerm_postgresql_flexible_server.centralpg.id
  replica_server_id = data.azurerm_postgresql_flexible_server.eastpgreplica.id
  type              = "ReadWrite"

  depends_on = [
    data.azurerm_postgresql_flexible_server.centralpg,
    data.azurerm_postgresql_flexible_server.eastpgreplica
  ]
  
}
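
(The two data sources referenced above are not included in this report; based on the server IDs in the plan output below, they would look roughly like this. This is a sketch for context, not the exact code used:)

data "azurerm_postgresql_flexible_server" "centralpg" {
  name                = "centralus-test-demo-dev-fpg"
  resource_group_name = "centralus-development-dev-rg"
}

data "azurerm_postgresql_flexible_server" "eastpgreplica" {
  name                = "eastus2-replica-test-demo-dev-fpg"
  resource_group_name = "eastus2-cloudpipelines-dev-rg"
}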

Debug Output/Panic Output

A plan run after a whitespace-only change, before the failover, shows 0 changes.

After manually promoting the replica server to primary in the Azure UI, the same whitespace-only change produces this plan:

Terraform will perform the following actions:

  # azurerm_postgresql_flexible_server_virtual_endpoint.testendpoint will be created
  + resource "azurerm_postgresql_flexible_server_virtual_endpoint" "testendpoint" {
      + id                = (known after apply)
      + name              = "testendpoint1"
      + replica_server_id = "/subscriptions/XX/resourceGroups/eastus2-cloudpipelines-dev-rg/providers/Microsoft.DBforPostgreSQL/flexibleServers/eastus2-replica-test-demo-dev-fpg"
      + source_server_id  = "/subscriptions/XX/resourceGroups/centralus-development-dev-rg/providers/Microsoft.DBforPostgreSQL/flexibleServers/centralus-test-demo-dev-fpg"
      + type              = "ReadWrite"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Expected Behaviour

Terraform should recognize that a functional virtual endpoint is already assigned to both servers and plan no changes.

Actual Behaviour

No response

Steps to Reproduce

Make a whitespace-only change and run a plan after manually promoting the replica server in the Azure UI (see Debug Output above).

Important Factoids

No response

References

No response


leonrob commented Oct 28, 2024

Also, for what it's worth: I've attempted to add a lifecycle block with prevent_destroy and it does not work.
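
The attempt looked roughly like this (a sketch, not the exact code):

resource "azurerm_postgresql_flexible_server_virtual_endpoint" "testendpoint" {
  name              = "testendpoint1"
  source_server_id  = data.azurerm_postgresql_flexible_server.centralpg.id
  replica_server_id = data.azurerm_postgresql_flexible_server.eastpgreplica.id
  type              = "ReadWrite"

  lifecycle {
    # prevent_destroy only blocks destroys; it does not stop the unwanted create planned after failover
    prevent_destroy = true
  }
}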

The only workaround I found was to create a variable:

variable "create_virtual_endpoint" {
type = bool
default = false # Change this based on your workspace context
}

Use the variable as a bool to decide whether to create the endpoint. On initial creation it would need to be set to true, then changed to false in a separate PR afterwards. I'm trying to reduce the number of steps required.

resource "azurerm_postgresql_flexible_server_virtual_endpoint" "testendpoint" {
count = var.create_virtual_endpoint ? 1 : 0
name = "testendpoint1"
source_server_id = data.azurerm_postgresql_flexible_server.centralpg.id
replica_server_id = data.azurerm_postgresql_flexible_server.eastpgreplica.id
type = "ReadWrite"

depends_on = [
data.azurerm_postgresql_flexible_server.centralpg,
data.azurerm_postgresql_flexible_server.eastpgreplica
]

lifecycle {
ignore_changes = ["*"]
}
}


neil-yechenwei commented Oct 29, 2024

Thanks for raising this issue. prevent_destroy needs to be added from the beginning. It seems I can't reproduce this issue. Could you double check whether the reproduction steps below match what you expect?

Reproduction steps:

  1. terraform apply with the tf config below
  2. Exchange the values for zone and standby_availability_zone (see the snippet after the config)
  3. terraform apply again

tf config:

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "test" {
  name     = "acctestRG-postgresql-test01"
  location = "eastus"
}

resource "azurerm_postgresql_flexible_server" "test" {
  name                          = "acctest-fs-test01"
  resource_group_name           = azurerm_resource_group.test.name
  location                      = azurerm_resource_group.test.location
  version                       = "16"
  public_network_access_enabled = false
  administrator_login           = "adminTerraform"
  administrator_password        = "QAZwsx123"
  zone                          = "1"
  storage_mb                    = 32768
  storage_tier                  = "P30"
  sku_name                      = "GP_Standard_D2ads_v5"

  high_availability {
    mode                      = "ZoneRedundant"
    standby_availability_zone = "2"
  }
}

resource "azurerm_postgresql_flexible_server" "test_replica" {
  name                          = "acctest-ve-replica-test01"
  resource_group_name           = azurerm_postgresql_flexible_server.test.resource_group_name
  location                      = azurerm_postgresql_flexible_server.test.location
  create_mode                   = "Replica"
  source_server_id              = azurerm_postgresql_flexible_server.test.id
  version                       = "16"
  public_network_access_enabled = false
  zone                          = "1"
  storage_mb                    = 32768
  storage_tier                  = "P30"
}

resource "azurerm_postgresql_flexible_server_virtual_endpoint" "test" {
  name              = "acctest-ve-test01"
  source_server_id  = azurerm_postgresql_flexible_server.test.id
  replica_server_id = azurerm_postgresql_flexible_server.test_replica.id
  type              = "ReadWrite"
}
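
For step 2, the primary server block with the two values exchanged (everything else stays the same):

resource "azurerm_postgresql_flexible_server" "test" {
  name                          = "acctest-fs-test01"
  resource_group_name           = azurerm_resource_group.test.name
  location                      = azurerm_resource_group.test.location
  version                       = "16"
  public_network_access_enabled = false
  administrator_login           = "adminTerraform"
  administrator_password        = "QAZwsx123"
  zone                          = "2" # was "1"
  storage_mb                    = 32768
  storage_tier                  = "P30"
  sku_name                      = "GP_Standard_D2ads_v5"

  high_availability {
    mode                      = "ZoneRedundant"
    standby_availability_zone = "1" # was "2"
  }
}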


leonrob commented Oct 29, 2024

Apologies, you're using "ZoneRedundant" for the HA mode.

In my setup it's actually Replica.

CorrenSoft commented

According to the plan that you shared, it is just creating a virtual endpoint (there is no destroy step), which may suggest that the virtual endpoint was already destroyed during the failover. Could that be the case?

It is not uncommon in failover scenarios for the Terraform code to become outdated because of the changes made in the process. In those situations, you need to decide between restoring the original configuration once the situation that triggered the failover is no longer relevant, or updating the code to properly describe the new state.


leonrob commented Nov 5, 2024


Hey CorrenSoft, thanks for the reply. It actually does NOT destroy the endpoint. I have done some extremely extensive testing on this and can replicate it very easily.

If possible, would you be willing to hop on a call with me? No pressure or anything; that way I can show you this. My company is a Fortune 500 but we aren't a Terraform Enterprise customer. (Although we spend a large amount with Hashi :-D )

Thanks in advance

CorrenSoft commented

Not sure if it would be appropriate since I don't work for HashiCorp :p
Besides, I am not familiar enough (yet) with this resource; I just gave my input based on my experience with failover on other resources.

Just to add some context: did you say that the failover did not destroy the endpoint? If so, does the apply step actually create a new one?


leonrob commented Nov 5, 2024


Oh, I apologize, I thought you did! lol.

Yes, the failover did NOT destroy the endpoint, which is expected.

The database servers should be able to fail over between each other without anything being destroyed.

My concern is that Terraform doesn't see the virtual endpoint when it refreshes state, even though it already exists.

It's 100% a bug on HashiCorp's end. There was another bug related to this that I was able to get someone to fix, but that person no longer works at HashiCorp.


leonrob commented Nov 14, 2024

Has anyone from HashiCorp taken a peek at this yet?


zahi101 commented Dec 4, 2024

I ran into this problem too. After promoting the replica server, Terraform doesn't "know" about the endpoint and tries to create a new one (which fails with an error because the name already exists). After promoting back to the original server, it worked again.


leonrob commented Dec 4, 2024

@jackofallops could you take a look?


leonrob commented Dec 12, 2024

@stephybun could you take a look?


leonrob commented Dec 23, 2024

Unsure if anyone is planning on trying to fix this, so I gave it a shot here:
#28374
