Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub action should fail if deployment circuit breaker / rollback is triggered #191

Open
spyoungtech opened this issue Apr 5, 2021 · 17 comments
Labels
feature-request A feature should be added or improved.

Comments

@spyoungtech
Copy link

spyoungtech commented Apr 5, 2021

Problem statement

Right now, this action only waits for service stabilization (without regard to how that happens or what deployment actually becomes stable)

With the new deployment circuit breaker feature, it's possible that a deployment will fail, but the service will eventually become stable when the rollback deployment occurs.

This means that workflows may continue on to subsequent steps, even though the service deployment actually failed

What is the bug/problem behavior?

The GitHub action reports success in the case that the service stabilizes after a rollback is triggered by the circuit breaker

This is present in aws-actions/amazon-ecs-deploy-task-definition@v1

What is the desired/correct behavior

The GitHub action should be considered a failure if the deployment circuit breaker is triggered

Steps to reproduce

  1. Create an ECS service with deployment circuit breaker with rollbacks enabled
  2. Deploy a change with this action that's expected to fail and trigger the circuit breaker
  3. Observe that the action job is successful, despite the deployment failing.

Notes

Possible implementation details

Perhaps the action could, upon stabilization, describe the service and see that the expected task revision is contained in the active deployment.

Workarounds

A possible workaround is to add an additional step in your GitHub actions workflow to check the describe-services API response to see if the circuit breaker was triggered on the deployment and the expected taskDefinition revision (using the task definition ARN output) is present in the active deployment.

@allisaurus allisaurus added the feature-request A feature should be added or improved. label Apr 8, 2021
@CalebC-RW
Copy link

How is the deployment circuit breaker enabled? I do not see the option being submitted to ecs.updateService. https://github.com/aws-actions/amazon-ecs-deploy-task-definition/blob/master/index.js#L26

@spyoungtech
Copy link
Author

@CalebC-RW the circuit breaker is an ECS service feature. You can see it referenced in the link above or in the AWS CLI ECS update-service documentation for example.

@CalebC-RW
Copy link

@spyoungtech, thanks for replying! I see that the circuit breaker feature can be enabled on the service (using the old ECS experience on the console or when creating the service). We have tried to enable it on the service and then run this action, but it seems to not be taking effect (failing deployments are being retried repeatedly and not being rolled back). Does it need to be enabled with the "updateService" call (during deployment) as well as be enabled for the service? If it does, this action does not seem to enable the flag when it calls updateService.

Or, am I completely misunderstanding this feature.

@CalebC-RW
Copy link

CalebC-RW commented Feb 17, 2022

I believe I have addressed my questions. The rollback process does seem to work (without specifying it on the update-service instruction), though it does take time. I have not determined how many attempts that the deployment makes before the circuit breaker kicks in and performs the rollback. My services are such that if the deployment fails, they will almost certainly fail after the first deployment attempt and consistently thereafter. After many tests, we have determined that setting the wait-time to a little longer than a successful deployment but shorter than a few failures lets the action fail when the deployment fails and succeed when it succeeds.

I wish that this action/deployment process let us more explicitly specify the number of deployment attempts before rollback and to let the action fail when a rollback occurs (as @spyoungtech originally requested).

@erikahansen
Copy link

We built into our pipeline a post-deployment check that verifies that the new registered revision is the current task definition in ECS.

@bmbferreira
Copy link

Like @erikahansen pointed out, we also ended up adding an extra step doing a post deployment check to verify this. Here's how we did it:

...
- name: Deploy to ECS
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  id: ecs-deploy
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ matrix.service }}
    cluster: ${{ env.CLUSTER_NAME }}
- name: Check if deployment was successful
  id: check-deployment
  run: |
     CURRENT_TASK_DEF_ARN=$(aws ecs describe-services --cluster ${{ env.CLUSTER_NAME }} --services ${{ matrix.service }} --query services[0].deployments[0].taskDefinition | jq -r ".")
     NEW_TASK_DEF_ARN=${{ steps.ecs-deploy.outputs.task-definition-arn }}
     echo "Current task arn: $CURRENT_TASK_DEF_ARN"
     echo "New task arn: $NEW_TASK_DEF_ARN"
     if [ "$CURRENT_TASK_DEF_ARN" != "$NEW_TASK_DEF_ARN" ]; then
       echo "Deployment failed."
       exit 1
     fi
...

However, it would be good to have this action checking that already... I will try to find the time to work on it.

@chrispetsos
Copy link

+1 for this, please make this workaround part of the core functionality of this Action.

@eagleeyec
Copy link

+1 we need this feature too, currently failures in prod are hidden without this workaround

@lexabug
Copy link

lexabug commented Aug 14, 2023

+1 for this. I need this workaround to be a part of the core functionality.

Judge40 added a commit to Health-Education-England/.github that referenced this issue Nov 8, 2023
If ECS circuit breaker is enabled for a service the ECS deploy step can
give a false successful result, as the triggered rollback is detected as
"healthy".

The task definition ARN can be checked against the expectation as a
workaround.
See:
aws-actions/amazon-ecs-deploy-task-definition#191

TIS21-4819
TIS21-5327
@jdjkelly
Copy link

+1 critically important feature

@healthbjk
Copy link

Very important to fix

@larry-flexpa
Copy link

+1 really should be part of core functionality now

@raf-stx
Copy link

raf-stx commented Jun 10, 2024

+1 for this feature

@vitalykarasik
Copy link

Hi,
Is there any update on this?

@ArthurMor4is
Copy link

+1 for this feature

msjang4 added a commit to Team-BomBomBom/Server that referenced this issue Oct 14, 2024
@cwoodhayes
Copy link

+1

1 similar comment
@Orrison
Copy link

Orrison commented Jan 8, 2025

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved.
Projects
None yet
Development

No branches or pull requests

17 participants