ci build errors due to cancelled jobs #2908

Closed
reubenmiller opened this issue May 30, 2024 · 4 comments
Labels
bug Something isn't working ci/cd Repository management and pipeline topics

Comments

@reubenmiller
Contributor

reubenmiller commented May 30, 2024

Describe the bug

Recent runs on the merge queue showed unexpected behaviour when a build job was cancelled: the workflow continued even though the build for one architecture had failed, so only half of the packages were published.

Runs with unexpected behaviour

To Reproduce

Expected behavior

Screenshots

Environment (please complete the following information):

  • OS [incl. version]
  • Hardware [incl. revision]
  • System-Architecture [e.g. result of "uname -a"]
  • thin-edge.io version [e.g. 0.1.0]

Additional context

@reubenmiller reubenmiller added the bug Something isn't working label May 30, 2024
@reubenmiller reubenmiller self-assigned this May 30, 2024
reubenmiller added a commit to reubenmiller/thin-edge.io that referenced this issue May 30, 2024
Prevent progression of the workflow if a build job is cancelled or skipped.

Resolves thin-edge#2908

Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>
@reubenmiller
Contributor Author

A PR adding an extra dependency check on the build job has been merged, and the subsequent behaviour will be monitored.

@reubenmiller
Contributor Author

There is an active discussion about similar symptoms:

The hosted runner encountered an error while running your job. (Error Type: Failure).

The above was taken from this run:


@reubenmiller
Contributor Author

Root cause

There is an active GitHub issue where other projects have also reported similar problems with GitHub workflows.

There seems to be a problem with the GitHub runner that causes jobs to be sporadically marked as skipped, for unknown reasons. The error can manifest in slightly different ways, but the following symptoms were seen in the thin-edge.io project:

  • Job with a matrix build is marked as "Skipped" with the description "This job was skipped"
  • An error annotation is added to the workflow run
    The hosted runner encountered an error while running your job. (Error Type: Failure).
    

Secondary effects

The root cause had an unexpected side effect: the publish job still ran after the check-build job failed. This was due to the use of always() in the publish job's if condition. The always() is required because the upstream test job is conditional; without always(), the publish job would also be skipped whenever the test job was skipped.

To handle the skip case better, an explicit check that the check-build job's result is success was added:

publish:
  name: Publish ${{ matrix.job.target }}
  if: |
    always() &&
    github.event_name != 'pull_request_target' &&
    (needs.check-build.result == 'success') &&
    (needs.test.result == 'success' || needs.test.result == 'skipped')
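To make the reasoning above concrete, here is a minimal Python sketch that models how the publish job's if condition evaluates for different upstream results. It is a simplified model, not GitHub Actions itself, and it rests on stated assumptions: needs.<job>.result is one of success, failure, cancelled or skipped; a job with needs but no status-check function such as always() only runs when every needed job succeeded; and the pre-fix condition is assumed to be the same expression minus the check-build clause. The function names (implicit_success, old_publish_condition, new_publish_condition) are hypothetical.

```python
"""Hypothetical model of the publish job's `if:` condition (not the real workflow)."""
from typing import Mapping

# Assumed set of values for needs.<job>.result
RESULTS = {"success", "failure", "cancelled", "skipped"}


def _check(needs: Mapping[str, str]) -> None:
    """Reject results outside the assumed set."""
    for job, result in needs.items():
        if result not in RESULTS:
            raise ValueError(f"unknown result {result!r} for job {job!r}")


def implicit_success(needs: Mapping[str, str]) -> bool:
    """Default without always(): every needed job must have succeeded."""
    _check(needs)
    return all(result == "success" for result in needs.values())


def old_publish_condition(event_name: str, needs: Mapping[str, str]) -> bool:
    """Assumed pre-fix condition. always() is modeled by not applying
    implicit_success, so the expression is evaluated regardless of needs."""
    _check(needs)
    return (
        event_name != "pull_request_target"
        and needs["test"] in ("success", "skipped")
    )


def new_publish_condition(event_name: str, needs: Mapping[str, str]) -> bool:
    """Fixed condition: check-build must explicitly report success."""
    _check(needs)
    return (
        event_name != "pull_request_target"
        and needs["check-build"] == "success"
        and needs["test"] in ("success", "skipped")
    )


if __name__ == "__main__":
    # Hypothetical scenarios; a skipped or cancelled build is assumed to skip test too.
    scenarios = {
        "normal run": {"check-build": "success", "test": "success"},
        "test skipped": {"check-build": "success", "test": "skipped"},
        "runner skipped build": {"check-build": "skipped", "test": "skipped"},
        "build cancelled": {"check-build": "cancelled", "test": "skipped"},
    }
    for name, needs in scenarios.items():
        print(
            f"{name:22} implicit={implicit_success(needs)!s:5} "
            f"old={old_publish_condition('push', needs)!s:5} "
            f"new={new_publish_condition('push', needs)!s:5}"
        )
```

In this model, the "runner skipped build" and "build cancelled" scenarios let the old condition run publish while the new condition does not. The implicit default would skip publish in the "test skipped" scenario, which is why always() cannot simply be dropped.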

@reubenmiller
Contributor Author

The linked GitHub issue has been closed, and no more job cancellations have been observed.

@reubenmiller reubenmiller added the ci/cd Repository management and pipeline topics label Jun 4, 2024