-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitHub action should fail if deployment circuit breaker / rollback is triggered #191
Comments
How is the deployment circuit breaker enabled? I do not see the option being submitted to ecs.updateService. https://github.com/aws-actions/amazon-ecs-deploy-task-definition/blob/master/index.js#L26 |
@CalebC-RW the circuit breaker is an ECS service feature. You can see it referenced in the link above or in the AWS CLI ECS update-service documentation for example. |
@spyoungtech, thanks for replying! I see that the circuit breaker feature can be enabled on the service (using the old ECS experience on the console or when creating the service). We have tried to enable it on the service and then run this action, but it seems to not be taking effect (failing deployments are being retried repeatedly and not being rolled back). Does it need to be enabled with the "updateService" call (during deployment) as well as be enabled for the service? If it does, this action does not seem to enable the flag when it calls updateService. Or, am I completely misunderstanding this feature. |
I believe I have addressed my questions. The rollback process does seem to work (without specifying it on the update-service instruction), though it does take time. I have not determined how many attempts that the deployment makes before the circuit breaker kicks in and performs the rollback. My services are such that if the deployment fails, they will almost certainly fail after the first deployment attempt and consistently thereafter. After many tests, we have determined that setting the wait-time to a little longer than a successful deployment but shorter than a few failures lets the action fail when the deployment fails and succeed when it succeeds. I wish that this action/deployment process let us more explicitly specify the number of deployment attempts before rollback and to let the action fail when a rollback occurs (as @spyoungtech originally requested). |
We built into our pipeline a post-deployment check that verifies that the new registered revision is the current task definition in ECS. |
Like @erikahansen pointed out, we also ended up adding an extra step doing a post deployment check to verify this. Here's how we did it:
However, it would be good to have this action checking that already... I will try to find the time to work on it. |
+1 for this, please make this workaround part of the core functionality of this Action. |
+1 we need this feature too, currently failures in prod are hidden without this workaround |
+1 for this. I need this workaround to be a part of the core functionality. |
If ECS circuit breaker is enabled for a service the ECS deploy step can give a false successful result, as the triggered rollback is detected as "healthy". The task definition ARN can be checked against the expectation as a workaround. See: aws-actions/amazon-ecs-deploy-task-definition#191 TIS21-4819 TIS21-5327
+1 critically important feature |
Very important to fix |
+1 really should be part of core functionality now |
+1 for this feature |
Hi, |
+1 for this feature |
+1 |
1 similar comment
+1 |
Problem statement
Right now, this action only waits for service stabilization (without regard to how that happens or what deployment actually becomes stable)
With the new deployment circuit breaker feature, it's possible that a deployment will fail, but the service will eventually become stable when the rollback deployment occurs.
This means that workflows may continue on to subsequent steps, even though the service deployment actually failed
What is the bug/problem behavior?
The GitHub action reports success in the case that the service stabilizes after a rollback is triggered by the circuit breaker
This is present in
aws-actions/amazon-ecs-deploy-task-definition@v1
What is the desired/correct behavior
The GitHub action should be considered a failure if the deployment circuit breaker is triggered
Steps to reproduce
Notes
Possible implementation details
Perhaps the action could, upon stabilization, describe the service and see that the expected task revision is contained in the active deployment.
Workarounds
A possible workaround is to add an additional step in your GitHub actions workflow to check the
describe-services
API response to see if the circuit breaker was triggered on the deployment and the expected taskDefinition revision (using the task definition ARN output) is present in the active deployment.The text was updated successfully, but these errors were encountered: