-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error when destroying azuread_application_password #661
Comments
Hi @dimbleby, thanks for reporting this issue. This looks like an interesting error case, if you are able to reproduce and could post debug output that would be exceedingly helpful. Thanks! |
I have a repro, just need to redact credentials from the log... will update shortly. |
https://gist.github.com/dimbleby/88190ed4b89e0d50542c1e2f86aa109b#file-debug-log (I've removed the client secret and all access tokens that it obtains. All applications etc mentioned in this log are now destroyed, so it doesn't matter that their details remain visible). The 404 appears at line 13689. I have started seeing this when taking the upgrade from azuread 2.7.0 to 2.9.0. I don't know whether this is causal, or only coincidence. |
Just updating to note that after a few retries both up- and down-level: I am pretty convinced that we hit this at 2.9.0 but not at 2.7.0 |
Noting, in case it helps, that it seems that we do not hit this at 2.8.0 (but do continue to hit it at 2.9.0) |
Thanks @dimbleby, all of this is extremely helpful! We did add some changes in 2.9 that might be related. I'll dig into this sometime tomorrow 👍 |
Hi @dimbleby, from what I can tell, the provider is doing the right things during this sequence. I simulated the API error with an intercepting proxy and observed the same on my end. The error itself seems quite likely to be an API bug - at 14:13:08 the application is retrieved and the password credential is included in the response, yet shortly after a series of requests to delete that password fail sequentially with the error you pasted (the last request being at 14:13:23). The provider does seem to be retrying here, given the time delay and that when reproducing locally I can see the retries. At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks. Sorry I don't have a better resolution for you here, I did try adding some mitigation to the code but since the preceding GET requests are successful, they don't help at all. |
Thanks. Please let me know if there's anything I can do to chip in on encouraging upstream to address the probable API bug. Curious that we only start hitting this with 2.9.0, did anything change about the way that the provider accesses the API? Maybe just the timings? (Quite the shame too - I was hoping to pick up the mitigation for #611.) One oddity that I spotted is that after a DELETE we do a GET to verify that the thing is really deleted: and it looks as though the (expected) 404 causes retries eg at line 13514. Perhaps this is exactly happening because of the #611-mitigation, but I wonder whether it was (i) intended (ii) possibly somehow provoking this behaviour? |
@dimbleby I'm not certain that any changes in 2.9.0 are contributing to your seeing this more with that version, could it perhaps be coincidence due to the intermittent nature of this API bug? We haven't changed anything with regards to deletion of app or SP passwords, and assuming that your Terraform config defines a relationship between the app and it's secrets, such that the secrets are destroyed first, there should be no material difference in behavior specifically for the password resource. For the repeated GET requests after deletion (for apps and SPs), we do this intentionally as we're checking for 5 successive 404s to try and guarantee deletion :) |
If it is only coincidence, it's becoming a very strong one - we have never hit this prior to 2.9.0 (and continue not to hit it in our regular pipelines) whereas we hit it more often than not at 2.9.0. (Each of our pipelines sets up and destroys several applications and passwords; just one of these passwords erroring out is enough for the pipeline to fail. So I don't necessarily mean that we hit it on more than half of our resources). So yes, I'm fairly convinced that something has changed at 2.9.0. |
@dimbleby I can think of a potential cause if Terraform has not inferred a direct dependency between your application resource and your application_password resource. In that event, the recent changes in 2.9.0 could potentially be causing this error to surface. Could you perhaps share a small configuration piece showing these two resources? It would be useful in trying to reproduce this. |
We follow a pattern something like this: locals {
microsoft_graph_app_id = "00000003-0000-0000-c000-000000000000"
microsoft_graph_user_read = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
}
data "azuread_client_config" "current" {}
resource "azuread_application" "foo" {
count = var.aad_foo_name == "" ? 0 : 1
display_name = var.aad_foo_name
owners = [data.azuread_client_config.current.object_id]
web {
redirect_uris = [whatever]
}
required_resource_access {
resource_app_id = local.microsoft_graph_app_id
resource_access {
id = local.microsoft_graph_user_read
type = "Scope"
}
}
lifecycle {
ignore_changes = [
owners
]
}
}
resource "azuread_service_principal" "foo" {
count = var.aad_foo_name == "" ? 0 : 1
application_id = azuread_application.foo[0].application_id
app_role_assignment_required = true
owners = [data.azuread_client_config.current.object_id]
lifecycle {
ignore_changes = [
owners
]
}
}
resource "azuread_application_password" "foo" {
count = var.aad_foo_name == "" ? 0 : 1
application_object_id = azuread_application.foo.0.id
display_name = "Foo secret"
end_date = var.aad_password_end_date
} |
Thanks, I'll try and use that for further testing to track this down |
I just started getting this as well after updating from 2.3.0 -> 2.13.0 resource "azuread_application" "azp" {
display_name = var.pool_name
sign_in_audience = "AzureADMyOrg"
owners = [
data.azurerm_client_config.current.object_id,
]
web {
implicit_grant {
access_token_issuance_enabled = true
id_token_issuance_enabled = true
}
}
}
resource "azuread_service_principal" "azp" {
application_id = azuread_application.azp.application_id
app_role_assignment_required = false
owners = [
data.azurerm_client_config.current.object_id,
]
}
resource "time_rotating" "azp" {
rotation_days = 365
}
resource "azuread_application_password" "azp" {
application_object_id = azuread_application.azp.object_id
end_date = time_rotating.azp.rotation_rfc3339
} |
@manicminer did you manage to do this? Is there an upstream bug somewhere that I can go vote on? thanks! |
Just noting that this still happens at 2.33.0 - and we continue to use azuread 2.8.0 to avoid it. |
We just ran into this (and possibly #1578) , using similar config: terraform version: 1.9.1 data "azurerm_client_config" "current" {}
locals {
name = "terraform-test-lVg0yZIE-spn"
owners = toset([
data.azurerm_client_config.current.object_id
])
}
resource "azuread_application" "this" {
display_name = local.name
owners = local.owners
}
resource "azuread_application_password" "this" {
application_id = azuread_application.this.id
}
resource "azuread_service_principal" "this" {
client_id = azuread_application.this.client_id
owners = local.owners
} The error appears to happen when there is a parallel delete happening on the You can re-create it using Graph API: $api = "https://graph.microsoft.com/v1.0"
# Create application
$appRes = az rest `
--method post `
--url "$api/applications" `
--body "{'displayName': 'terraform-test-lVg0yZIE-spn'}" | ConvertFrom-Json
# Create application password
$appPassRes = az rest `
--method post `
--url "$api/applications/$($appRes.id)/addPassword" `
--body "{'passwordCredentials': {}}" | ConvertFrom-Json
# Create spn
$spnRes = az rest `
--method post `
--url "$api/servicePrincipals" `
--body "{'appId': '$($appRes.appId)'}" | ConvertFrom-Json
# Destroy things in parallel
@(
@{
method = "post"
url = "$api/applications/$($appRes.id)/removePassword"
body = "{'keyId': '$($appPassRes.keyId)'}"
},
@{
method = "delete"
url = "$api/servicePrincipals/$($spnRes.id)"
body = "{}"
}
) | ForEach-Object -Parallel {
az rest `
--method $_.method `
--url $_.url `
--body $_.body
} Error:
Workaround: resource "azuread_service_principal" "this" {
client_id = azuread_application.this.client_id
owners = local.owners
+ depends_on = [azuread_application_password.this]
} # Dont delete in parallel (removes -Parallel)
@(
@{
method = "post"
url = "$api/applications/$($appRes.id)/removePassword"
body = "{'keyId': '$($appPassRes.keyId)'}"
},
@{
method = "delete"
url = "$api/servicePrincipals/$($spnRes.id)"
body = "{}"
}
) | ForEach-Object {
az rest `
--method $_.method `
--url $_.url `
--body $_.body
} Hope that helps others! |
Community Note
Terraform (and AzureAD Provider) Version
terraform 1.0.7, azuread provider 2.9.0
Affected Resource(s)
azuread_application_password
Actually the output isn't quite explicit that this is the affected resource; but I suppose it must be.
Terraform Configuration Files
Can't share our full configuration, but the basic set up is that we have a
azuread_application
, aazuread_service_principal
and aazuread_application_password
.Can try to shrink to a shareable reproduction if needed.
Debug Output
Coming soon, I hope. Though experience does suggest that bugs will often refuse to reproduce when you turn on the debug logs...
Panic Output
Expected Behavior
Resources are successfully destroyed at
terraform destroy
Actual Behavior
Steps to Reproduce
terraform apply
terraform destroy
Important Factoids
References
The text was updated successfully, but these errors were encountered: