Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when destroying azuread_application_password #661

Open
dimbleby opened this issue Nov 13, 2021 · 17 comments
Open

Error when destroying azuread_application_password #661

dimbleby opened this issue Nov 13, 2021 · 17 comments

Comments

@dimbleby
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureAD Provider) Version

terraform 1.0.7, azuread provider 2.9.0

Affected Resource(s)

  • azuread_application_password

Actually the output isn't quite explicit that this is the affected resource; but I suppose it must be.

Terraform Configuration Files

Can't share our full configuration, but the basic set up is that we have a azuread_application, a azuread_service_principal and a azuread_application_password.

Can try to shrink to a shareable reproduction if needed.

Debug Output

Coming soon, I hope. Though experience does suggest that bugs will often refuse to reproduce when you turn on the debug logs...

Panic Output

Expected Behavior

Resources are successfully destroyed at terraform destroy

Actual Behavior

│ Error: Removing password credential "5e8d9103-f96c-4e85-ad14-8b2d6f3382f7" from application with object ID "126c83a4-aa27-46c3-a382-e2408c2efac5"
│ 
│ ApplicationsClient.BaseClient.Post(): unexpected status 404 with OData
│ error: Directory_ObjectNotFound: Unable to read the company information
│ from the directory.

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

References

@manicminer
Copy link
Contributor

Hi @dimbleby, thanks for reporting this issue. This looks like an interesting error case, if you are able to reproduce and could post debug output that would be exceedingly helpful. Thanks!

@dimbleby
Copy link
Author

I have a repro, just need to redact credentials from the log... will update shortly.

@dimbleby
Copy link
Author

https://gist.github.com/dimbleby/88190ed4b89e0d50542c1e2f86aa109b#file-debug-log

(I've removed the client secret and all access tokens that it obtains. All applications etc mentioned in this log are now destroyed, so it doesn't matter that their details remain visible).

The 404 appears at line 13689.

I have started seeing this when taking the upgrade from azuread 2.7.0 to 2.9.0. I don't know whether this is causal, or only coincidence.

@dimbleby
Copy link
Author

Just updating to note that after a few retries both up- and down-level: I am pretty convinced that we hit this at 2.9.0 but not at 2.7.0

@dimbleby
Copy link
Author

Noting, in case it helps, that it seems that we do not hit this at 2.8.0 (but do continue to hit it at 2.9.0)

@manicminer
Copy link
Contributor

Thanks @dimbleby, all of this is extremely helpful! We did add some changes in 2.9 that might be related. I'll dig into this sometime tomorrow 👍

@manicminer manicminer self-assigned this Nov 16, 2021
@manicminer
Copy link
Contributor

Hi @dimbleby, from what I can tell, the provider is doing the right things during this sequence. I simulated the API error with an intercepting proxy and observed the same on my end. The error itself seems quite likely to be an API bug - at 14:13:08 the application is retrieved and the password credential is included in the response, yet shortly after a series of requests to delete that password fail sequentially with the error you pasted (the last request being at 14:13:23). The provider does seem to be retrying here, given the time delay and that when reproducing locally I can see the retries.

At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks. Sorry I don't have a better resolution for you here, I did try adding some mitigation to the code but since the preceding GET requests are successful, they don't help at all.

@dimbleby
Copy link
Author

Thanks. Please let me know if there's anything I can do to chip in on encouraging upstream to address the probable API bug.

Curious that we only start hitting this with 2.9.0, did anything change about the way that the provider accesses the API? Maybe just the timings?

(Quite the shame too - I was hoping to pick up the mitigation for #611.)

One oddity that I spotted is that after a DELETE we do a GET to verify that the thing is really deleted: and it looks as though the (expected) 404 causes retries eg at line 13514. Perhaps this is exactly happening because of the #611-mitigation, but I wonder whether it was (i) intended (ii) possibly somehow provoking this behaviour?

@manicminer
Copy link
Contributor

manicminer commented Nov 18, 2021

@dimbleby I'm not certain that any changes in 2.9.0 are contributing to your seeing this more with that version, could it perhaps be coincidence due to the intermittent nature of this API bug? We haven't changed anything with regards to deletion of app or SP passwords, and assuming that your Terraform config defines a relationship between the app and it's secrets, such that the secrets are destroyed first, there should be no material difference in behavior specifically for the password resource.

For the repeated GET requests after deletion (for apps and SPs), we do this intentionally as we're checking for 5 successive 404s to try and guarantee deletion :)

@dimbleby
Copy link
Author

dimbleby commented Nov 18, 2021

If it is only coincidence, it's becoming a very strong one - we have never hit this prior to 2.9.0 (and continue not to hit it in our regular pipelines) whereas we hit it more often than not at 2.9.0.

(Each of our pipelines sets up and destroys several applications and passwords; just one of these passwords erroring out is enough for the pipeline to fail. So I don't necessarily mean that we hit it on more than half of our resources).

So yes, I'm fairly convinced that something has changed at 2.9.0.

@manicminer
Copy link
Contributor

@dimbleby I can think of a potential cause if Terraform has not inferred a direct dependency between your application resource and your application_password resource. In that event, the recent changes in 2.9.0 could potentially be causing this error to surface.

Could you perhaps share a small configuration piece showing these two resources? It would be useful in trying to reproduce this.

@dimbleby
Copy link
Author

dimbleby commented Nov 19, 2021

We follow a pattern something like this:

locals {
  microsoft_graph_app_id = "00000003-0000-0000-c000-000000000000"
  microsoft_graph_user_read = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
}

data "azuread_client_config" "current" {}

resource "azuread_application" "foo" {
  count        = var.aad_foo_name == "" ? 0 : 1
  display_name = var.aad_foo_name
  owners       = [data.azuread_client_config.current.object_id]
  web {
    redirect_uris = [whatever]
  }

  required_resource_access {
    resource_app_id = local.microsoft_graph_app_id
    resource_access {
      id   = local.microsoft_graph_user_read
      type = "Scope"
    }
  }

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_service_principal" "foo" {
  count                        = var.aad_foo_name == "" ? 0 : 1
  application_id               = azuread_application.foo[0].application_id
  app_role_assignment_required = true
  owners                       = [data.azuread_client_config.current.object_id]

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_application_password" "foo" {
  count                 = var.aad_foo_name == "" ? 0 : 1
  application_object_id = azuread_application.foo.0.id
  display_name          = "Foo secret"
  end_date              = var.aad_password_end_date
}

@manicminer
Copy link
Contributor

Thanks, I'll try and use that for further testing to track this down

@hbuckle
Copy link

hbuckle commented Jan 4, 2022

I just started getting this as well after updating from 2.3.0 -> 2.13.0
Our config is pretty similar

resource "azuread_application" "azp" {
  display_name     = var.pool_name
  sign_in_audience = "AzureADMyOrg"
  owners = [
    data.azurerm_client_config.current.object_id,
  ]

  web {
    implicit_grant {
      access_token_issuance_enabled = true
      id_token_issuance_enabled     = true
    }
  }
}

resource "azuread_service_principal" "azp" {
  application_id               = azuread_application.azp.application_id
  app_role_assignment_required = false
  owners = [
    data.azurerm_client_config.current.object_id,
  ]
}

resource "time_rotating" "azp" {
  rotation_days = 365
}

resource "azuread_application_password" "azp" {
  application_object_id = azuread_application.azp.object_id
  end_date              = time_rotating.azp.rotation_rfc3339
}

@dimbleby
Copy link
Author

dimbleby commented Jan 7, 2022

At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks.

@manicminer did you manage to do this? Is there an upstream bug somewhere that I can go vote on? thanks!

@dimbleby
Copy link
Author

Just noting that this still happens at 2.33.0 - and we continue to use azuread 2.8.0 to avoid it.

@stuartin
Copy link

We just ran into this (and possibly #1578) , using similar config:

terraform version: 1.9.1
azuread verion: 3.0.2

data "azurerm_client_config" "current" {}

locals {
  name   = "terraform-test-lVg0yZIE-spn"
  owners = toset([
    data.azurerm_client_config.current.object_id
  ])
}

resource "azuread_application" "this" {
  display_name = local.name
  owners       = local.owners
}

resource "azuread_application_password" "this" {
  application_id = azuread_application.this.id
}

resource "azuread_service_principal" "this" {
  client_id = azuread_application.this.client_id
  owners    = local.owners
}

The error appears to happen when there is a parallel delete happening on the azuread_service_principal and azuread_application_password.

You can re-create it using Graph API:

$api = "https://graph.microsoft.com/v1.0"

# Create application
$appRes = az rest `
 --method post `
 --url "$api/applications" `
 --body "{'displayName': 'terraform-test-lVg0yZIE-spn'}" | ConvertFrom-Json

# Create application password
$appPassRes = az rest `
 --method post `
 --url "$api/applications/$($appRes.id)/addPassword" `
 --body "{'passwordCredentials': {}}" | ConvertFrom-Json

# Create spn
$spnRes = az rest `
 --method post `
 --url "$api/servicePrincipals" `
 --body "{'appId': '$($appRes.appId)'}" | ConvertFrom-Json

# Destroy things in parallel
 @(
    @{
        method = "post"
        url = "$api/applications/$($appRes.id)/removePassword"
        body = "{'keyId': '$($appPassRes.keyId)'}"
    },
    @{
        method = "delete"
        url = "$api/servicePrincipals/$($spnRes.id)"
        body = "{}"
    }
 ) | ForEach-Object -Parallel {
    az rest `
        --method $_.method `
        --url $_.url `
        --body $_.body
 }

Error:

ERROR: Not Found({"error":{"code":"Directory_ObjectNotFound","message":"Unable to read the company information from the directory.","innerError":{"date":"2024-12-31T20:17:19","request-id":"759dok757-0049-47ec-126b-8e918f9ea938","client-request-id":"75jb78d57-0049-47ec-856b-8e918f91j738"}}})

Workaround:
We need to manually add a dependency on the spn:

resource "azuread_service_principal" "this" {
  client_id = azuread_application.this.client_id
  owners    = local.owners

+ depends_on = [azuread_application_password.this]
}
# Dont delete in parallel (removes -Parallel)
 @(
    @{
        method = "post"
        url = "$api/applications/$($appRes.id)/removePassword"
        body = "{'keyId': '$($appPassRes.keyId)'}"
    },
    @{
        method = "delete"
        url = "$api/servicePrincipals/$($spnRes.id)"
        body = "{}"
    }
 ) | ForEach-Object {
    az rest `
        --method $_.method `
        --url $_.url `
        --body $_.body
 }

Hope that helps others!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants