Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace splices with reusable workflows #942

Merged
merged 5 commits into from
Jun 11, 2024
Merged

Conversation

danielrbradley
Copy link
Member

Part of #936

  • Reduce duplication in rendered workflows.
  • Splices are not composable (can't call one splice from another) so don't form good building blocks.

@danielrbradley danielrbradley self-assigned this May 28, 2024
@danielrbradley danielrbradley requested a review from a team May 28, 2024 21:17
@t0yv0
Copy link
Member

t0yv0 commented May 28, 2024

Splices had a reason to be, avoiding breaking branch protections, I think in #687

We might want to make sure that the protections still work or just follow up here as needed.

Copy link
Contributor

@blampe blampe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lovely!

Splices had a reason to be, avoiding breaking branch protections, I think in #687

AFAICT the job names haven't changed so existing protections should be unaffected. We should do #693 at some point - similar to how pu/pu only checks ci-ok.

@danielrbradley
Copy link
Member Author

Splices had a reason to be, avoiding breaking branch protections, I think in #687

We might want to make sure that the protections still work or just follow up here as needed.

Yup, testing for the branch protections makes sense - will test this via the xyz provider.

Reading through that thread, the discussion of the sentinel was interesting, though I think we should consider dropping the sentinel job altogether and just make our branch protections accurate for the jobs we care about.

The very precise combinations of conditionals and dependencies required to make it effective seem error-prone and non-obvious as to if there's a bug or not. The sentinel was introduced when our branch names were changing every week and so it was hard work keeping them in sync, but the sentinel was ultimately a poor attempt to avoid managing the rules. Our management of branch protections has always been pretty unreliable and gets out of sync easily. I think a bit of a redesign around this could make things simpler and more reliable.

An alternative design for the branch protections would be for the provider-ci CLI tool to have a command such as provider-ci sync-branch-protections or something similar which can be run by the repository itself to ensure it maintains the correct settings.

@danielrbradley
Copy link
Member Author

danielrbradley commented May 29, 2024

The name of the step does change because of the nested worklow:

-build_sdk (dotnet)
+build_sdk / build_sdk (dotnet)

Option 1: Avoid the name change by abstracting using a composite action rather than a reusable workflow. This makes the steps slightly less nice to look at in the UI, but avoids changing the naming.

Option 2: Stick with "sentinel" checks right at the end but make them simpler to comprehend and harder to mess up. Rather than relying on the job within the workflow to be marked as complete, let's just explicitly post to our own custom check to say "all required steps passed" and only require this one check.

  sentinel:
    name: sentinel
    needs:
    - test
    - license_check
    - lint
    runs-on: ubuntu-latest
    steps:
    - uses: guibranco/github-status-action-v2@latest
      with: 
        authToken: ${{secrets.GITHUB_TOKEN}}
        context: 'Sentinel'
        description: 'All required checks passed'
        state: 'success'
        sha: ${{github.event.pull_request.head.sha || github.sha}}

This avoids the need for expressions like if: (github.event_name == 'repository_dispatch' || github.event.pull_request.head.repo.full_name == github.repository) && ! cancelled() and contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') || contains(needs.*.result, 'skipped'). This step can also be refactored without affecting the required checks breaking - as long as the value of context doesn't change.

@t0yv0
Copy link
Member

t0yv0 commented May 29, 2024

I think all prior art was done in a tremendous rush. If got time to think it through and especially test on xyz provider I think it's completely reasonable to rewrite from the ground up. The key is non-disruptive rollout that does not affect provider ops.

@danielrbradley
Copy link
Member Author

Ok, going to simplify the sentinel, roll it out, then come back to roll this out. Thanks for the ideating!

@mjeffryes
Copy link
Member

I would recommend syncing with @iwahbe and @guineveresaenger about the structure of the current sentinel checks before replacing them. IIRC, they had to do some surgery to make sure skipped and cancelled jobs reported the right status, (which is why those conditionals are so complex). Eg see the code comment in #687 (comment).

@iwahbe
Copy link
Member

iwahbe commented May 29, 2024

I'd like to see any changes to the sentinel before they roll out. Sentinels are required because GH actions doesn't have very intuitive behavior for how it conducts required checks, and it allows race conditions with auto-merge.

@danielrbradley
Copy link
Member Author

Here's the proposal PR for a better sentinel & migration plan: #944

danielrbradley added a commit that referenced this pull request May 30, 2024
Write an explicit status check called "Sentinel" which will only pass if
this code really runs. Once rolled out, we can make this the only
required check for PRs, then remove the old conditionals on this job and
remove the previous step.

This is a follow-up to the conversation
#942 (comment) where
we debated the merits of a single required check if we could simplify
the current sentinel logic to make it more reliable.

Here's the test run in the xyz provider:
https://github.com/pulumi/pulumi-xyz/actions/runs/9292939832?pr=145

Here's the old and new sentinel checks shown on the [test
PR](pulumi/pulumi-xyz#145):

![image](https://github.com/pulumi/ci-mgmt/assets/331676/2a9ac907-729e-415b-aac6-87ff29eb9d7b)
@guineveresaenger
Copy link
Contributor

I am in favor of this. I also appreciate the improved naming where each reusable workflow has its own identifiable name.

nit: Could we... structure the files to have reusable workflows in their own folder?

@danielrbradley
Copy link
Member Author

nit: Could we... structure the files to have reusable workflows in their own folder?

Do they not have to be in the .github/workflows folder for GitHub to recognise them?

pulumi-bot added a commit to pulumi/pulumi-aws that referenced this pull request Jun 7, 2024
This PR was automatically generated by the
update-workflows-ecosystem-providers workflow in the pulumi/ci-mgmt
repo, from commit 18d44a52b6ca8881f3d9c20d4ca562cdec02effc.

To fix the hard-coded steps the versions have been hard coded too for
now. This will be possible to resolve once
pulumi/ci-mgmt#942 is rolled out.

---------

Co-authored-by: Daniel Bradley <daniel@pulumi.com>
- Reduce duplication in rendered workflows.
- Splices are not composable (can't call one splice from another) so don't form good building blocks.
This has since moved out of the env.
@danielrbradley
Copy link
Member Author

I've rebased and updated this since completing the roll-out of the simplified sentinel check.

Copy link
Member

@iwahbe iwahbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very hard change to review. I'm comfortable merging if we have a test provider that has merged to master with these workflows.

@danielrbradley
Copy link
Member Author

This is a very hard change to review. I'm comfortable merging if we have a test provider that has merged to master with these workflows.

@iwahbe Yeah, diffs are a bit noisy, though hoped the separate commits would help.

My plan is to wait until tomorrow as there's the sentinel changes going out tonight, then I'll roll this to a single provider first, get a green master build, then merge.

Co-authored-by: Ian Wahbe <ian@wahbe.com>
@danielrbradley
Copy link
Member Author

Test release running on the xyz provider: https://github.com/pulumi/pulumi-xyz/actions/runs/9464972790

@danielrbradley danielrbradley merged commit 64140e4 into master Jun 11, 2024
5 checks passed
@danielrbradley danielrbradley deleted the remove-splices branch June 11, 2024 13:19
danielrbradley added a commit that referenced this pull request Jun 12, 2024
Stacked on top of #942 

- Provide dedicated option instead of relying on providers to use custom
steps which are run after tool installs.
- Run as first step so we don't delete things that have just been
installed.
- Default to off to maintain existing behaviour. This can take around 3
minutes to run, so we should only run it if we're having issues.

```yaml
# Set to true to clear disk space before running prerequisites workflow.
freeDiskSpaceBeforeBuild: false
# Set to true to clear disk space before running test jobs.
freeDiskSpaceBeforeTest: false
```

This fixes oddities in Azure and AWS where we've added steps in the
`preBuild` and `preTest` to clear disk space but then have to install
tools (e.g. dotnet) a second time as they get deleted.
danielrbradley added a commit that referenced this pull request Jun 12, 2024
Stacked on top of #942

Primary benefit is to reduce duplication within templates. This was not
possible when using splices because splices cannot reference splices.

Additionally, we want to make installing our standard versions of our
tools available for use from custom test jobs as used in AWS and other
providers so they don't have to hard code their own install actions.

- Go was not previously explicitly installed during the build_sdk job
but is used when running `make build_go`, so this adds it to ensure
we're using a consistent version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants