ECS service creation intermittent failures with "Error: ECS service not created" #24565
Labels: bug, eventual-consistency, service/ecs
Terraform CLI and Terraform AWS Provider Version
Terraform v1.0.10
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.12.1
Affected Resource(s)
aws_ecs_service
Terraform Configuration Files
Debug Output
Here is the anonymized debug output for the relevant resource.
https://gist.github.com/1rjt/bf4b303c9cab11e265775b41c0dffc15
Panic Output
Expected Behavior
ECS service created.
Actual Behavior
terraform apply exits with the error:
Error: ECS service not created: [ECS_SERVICE_ARN]
(snippet from debug output)
2022-05-05T02:15:01.282Z [DEBUG] provider.terraform-provider-aws_v4.12.1_x5: [aws-sdk-go] {"failures":[{"arn":"arn:aws:ecs:SOME-AWS-REGION:SOME-AWS-ACCOUNT-ID:service/some_ecs_service","reason":"MISSING"}],"services":[]}: timestamp=2022-05-05T02:15:01.282Z
2022-05-05T02:15:01.283Z [TRACE] maybeTainted: module.SOME-MODULE.module.some_file.aws_ecs_service.some_ecs_service[0] encountered an error during creation, so it is now marked as tainted
Steps to Reproduce
The failure occurs only intermittently, when running 'terraform apply' against an ECS cluster with 14 services.
terraform apply
Important Factoids
The problem seems to happen when creating an ECS cluster with 14 services and 10 capacity providers. We create and destroy such clusters many times per day, and the error occurs in approximately 1 in 10 "terraform apply" runs.
AWS CloudTrail shows that the ECS CreateService call was received and returned a correct response. The CloudTrail event response includes the cluster details plus "status": "ACTIVE", so the cluster looks OK.
Within the same second, CloudTrail shows a DescribeServices call with the ARN of the failing service as a request parameter. That call returns no services (the debug output lists the ARN under "failures" with reason "MISSING"), and this appears to cause the error.
Could this be an eventual-consistency race condition? Is a DescribeServices call made immediately after CreateService not guaranteed to return the new service?
References