-
When cancelling a job, there is a chance that the running software holds resources that need to be released when the job is cancelled. To make this robust, GitHub Actions should send signals to the running process so that the application can tear down properly. A gentle terminator would first send SIGINT, wait a few seconds, send SIGTERM if the app still hasn't exited, and finally send SIGKILL to force-terminate it. GitHub probably already manages this somehow, but at least I didn't find any documentation about the subject. It would be good to document properly how it behaves at the moment, so it's easier to propose changes if needed. Here is a nice document for Jenkins about the same issue: https://gist.github.com/datagrok/dfe9604cb907523f4a2f
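For reference, a minimal sender-side sketch of the escalation described above (the timings here are arbitrary placeholders, not GitHub's actual values):

```bash
#!/usr/bin/env bash
# Gentle-terminator sketch: escalate SIGINT -> SIGTERM -> SIGKILL,
# giving the target process a grace period at each stage.
pid="$1"                                  # PID of the process to stop
kill -INT "$pid" 2>/dev/null || exit 0    # ask politely first
sleep 5
kill -0 "$pid" 2>/dev/null && { kill -TERM "$pid"; sleep 5; }
kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"   # last resort; cannot be trapped
```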
-
Thanks for your feedback.
-
@jupe,
According to the introduction from the engineering team, after the user clicks "Cancel workflow":

1. The server re-evaluates the job-level if condition on all running jobs.
2. If a job's condition is always(), it will not get canceled.
3. For the rest of the jobs that need cancellation, the server sends a cancellation message to all the runners.
4. Each runner has 5 minutes to finish the cancellation process before the server force-terminates the job.
5. The runner re-evaluates the if condition on the currently running step.
6. If the step's condition is always(), it will not get canceled.
7. Otherwise, the runner sends Ctrl-C to the action entry process (node for a JavaScript action, docker for c…

Hope this can help you understand better.
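In other words, a cleanup step guarded with `always()` still runs after cancellation. A minimal sketch (the cleanup script is a placeholder, not a real action):

```yaml
steps:
  - name: Terraform apply
    run: terraform apply -no-color -auto-approve
  - name: Release locks
    if: always()              # re-evaluated on cancellation, so this still runs
    run: ./release-locks.sh   # placeholder for your real cleanup command
```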
-
Hi @brightran, I have a terraform/terragrunt action that, when it is canceled, leaves the state file locked in AWS DynamoDB. Then I need to force-unlock it, and most of the resources are not in the state file, so my state is corrupted.
And when I cancel the job, all I see in the action console is: …
So it does not look like the job allowed terraform to do all the necessary steps to save state, release the lock, etc.
-
Hi, this is happening for us too: terraform does not get a chance to release the state locks. If it is working as documented, it would be good to be able to extend the 7500 ms shutdown time.
-
I can also confirm that using … I've tested locally sending …
-
If you're using terraform, I'd suggest using Atlantis (Terraform Pull Request Automation). Yes, it means you're running a small VM somewhere, but it also means you don't have to worry about it being killed.
-
I confirmed it; the GitHub runner is using SIGKILL for cancel-in-progress.
-
Can we make job termination operate more gently?
-
This seems like a common issue for everybody running tasks like terraform that need to exit gracefully. You can and should disable … Is there any solution for this?
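Assuming the elided word above refers to the terraform wrapper (later replies in this thread point at it), disabling it in hashicorp/setup-terraform looks like this:

```yaml
- uses: hashicorp/setup-terraform@v3
  with:
    # The wrapper script sits between the shell and terraform and does not
    # forward signals, so terraform never sees the runner's SIGINT.
    terraform_wrapper: false
```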
-
I wrote a demo to illustrate and confirm the behaviour described above, which really should be in the docs. At least on a Linux runner, "CTRL-C" means SIGINT and "CTRL-Break" (also CTRL-…) means SIGTERM.
In this simple case the result observed was consistent with the above docs: SIGINT, about 7.5 s, then SIGTERM, about 2.5 s, and presumably SIGKILL plus cleanup. But note that I've … If I don't …
It looks like the GitHub Actions runner probably waits for the session-leader process to exit, then hard-kills anything under it when it exits. It doesn't appear to deliver signals to the process tree by signalling the process group; AFAICS it only signals the leader process. So the leader must install a signal handler that explicitly propagates signals to child processes and then waits for them to exit.
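A minimal sketch of that leader-side propagation, with `sleep` standing in for the real workload:

```bash
#!/usr/bin/env bash
# Run the workload as a child, forward trapped signals to it, and wait
# for it to exit so its cleanup finishes before the leader exits.
sleep 300 &                          # stand-in for the real long-running task
child=$!

trap 'kill -INT  "$child" 2>/dev/null' INT
trap 'kill -TERM "$child" 2>/dev/null' TERM

wait "$child"   # returns early when a trapped signal arrives
wait "$child"   # wait again for the child to actually finish its teardown
```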
-
I wrote this up better in a demo at https://github.com/ringerc/github-actions-signal-handling-demo since I wasn't satisfied with the answers from @BrightRan above, nor with my earlier quick tests. It's a right mess. It looks like you really need to rely on …
-
Has something changed in the way GitHub processes job termination?
-
Why is this marked as answered when it isn't really answered? How to avoid e.g. …?
-
Could we maybe adjust the wrapper so that it invokes terraform via …?
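If the truncated word here is `exec` (my assumption), the idea would be a wrapper along these lines, so the wrapper process is replaced by terraform and signals from the runner land on terraform directly:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: 'exec' replaces this shell with terraform itself,
# so there is no intermediate process left to swallow SIGINT/SIGTERM.
exec terraform "$@"
```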
-
This is the workaround I'm using now …
-
Here is my workaround. Like @breathe, I don't use the terraform wrapper. Aside from that, I instead use tini to make sure all signals get propagated:
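The snippet itself didn't survive the copy; a sketch of that kind of setup, assuming tini is installed on the runner (e.g. via `apt-get install tini`) and the wrapper is disabled:

```yaml
- uses: hashicorp/setup-terraform@v3
  with:
    terraform_wrapper: false
- name: Terraform apply
  # tini -g forwards signals to the whole process group under it
  shell: tini -g -- bash --noprofile --norc -eo pipefail {0}
  run: terraform apply -no-color -auto-approve
```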
Which results in something like the following when the job gets cancelled: …
-
I built a simple tool which speeds up the signal propagation, based on the material above: … It can be just put into the step's shell option:

```yaml
jobs:
  my-job:
    runs-on: ubuntu-latest
    steps:
      - name: Long-running step
        shell: signal-fanout {0}
        run: |
          for i in $(seq 1 30); do echo "$(date): $i"; sleep 1; done
```
-
FYI: We are using the following snippet as a workaround.

```yaml
- name: Terraform apply
  id: apply
  run: terraform apply -no-color -auto-approve
- name: Release lock if exists
  if: ${{ steps.apply.outcome == 'cancelled' && always() }}
  run: |
    lock_id=$(terraform plan -no-color -refresh=false 2>&1 | grep ' ID: ' | cut -d: -f2 | tr -d ' ' || true)
    if [[ -n "${lock_id}" ]]; then
      terraform force-unlock -force ${lock_id}
    fi
```