Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate build to GitHub actions #1151

Merged
merged 47 commits into from
Aug 24, 2023

Conversation

DavidLawes
Copy link
Contributor

@DavidLawes DavidLawes commented Aug 21, 2023

What does this change?

This PR adds GitHub workflows for the build of each of the apps inside in the n10n repo. Previously the builds were managed by TeamCity, but after this change is merged the corresponding TeamCity config will be deleted.

The current TeamCity configuration builds all projects, on every commit to every branch. The workflows in this PR change things slightly:

  • Each project has its own workflow file
  • Each project will be built on every commit, but only if the code changed in the commit matches one of the paths specified in the corresponding workflow
  • Instead of using sbt-riffraff-artifact, we now upload assets to riff-raff using the Guardian's action instead.

Each workflow follows one of these patterns:

  • Lambda assembly: event-consumer, fake-breaking-news, football, report-extractor, schedule-lambda, slo-monitor. This workflow typically just compiles and tests the project, then uses the sbt assembly command to generate a .jar.
  • Debian package: notification, registration, report. This workflow typically compiles and tests the project, then uses the debian:packageBin command (using plugin sbt-native-packager) to generate a .deb.
  • Docker image: notification-worker-lambdas. This workflow has to do something slightly more fiddly, it needs to upload images to ECR, which requires a valid authorisation token. There is some extra setup and steps in this type of workflow.

NB: not all projects have been converted to CDK, so the CDK synth step is not always required.

Docker lambdas

The notificationworkerlambda project deploys to docker lambdas, the historical reason for this is because the size of the .zip exceeded AWS's maximum size whereas docker lambdas don't impose such a limit.

To use docker lambdas we need to upload images to a remote ECR repository. In order to push images to the repository we need to login to ECR. Previously we managed the docker auth/deploy using script, in the new workflow I've changed things slightly:

  • I use an aws action to manage the login to ECR
  • I pulled out the docker commands into the workflow step explicitly (I think this helps the readability of the workflow file)

For a login to be successful we need to supply a custom role. I followed a couple of guides, which I'll link inline below, but my starting point was here. The custom role has been configured as following:

  • Create a new identity provider in the mobile aws account for the GitHub token provider service
  • Define a new role in the account that has a trust relationship with the GitHub identity provider, restricting access only to the n10n repo, following this guide.
  • Give the new role permissions to get an ECR token and put an ECR image (a bit of trial and error to work out the correct permissions)
  • Provide the role as a repository secret and use the role in the workflow.

I manually created the new role in the aws account, it's name is mobile-n10n-deploy-role. I believe I've applied the minimum permissions necessary for the workflow to succeed (including limiting the scope of any permissions applied to just the n10n ECR repository):

  • To get the token: ecr:GetAuthorizationToken
  • To publish an image: ecr:CompleteLayerUpload, ecr:UploadLayerPart, ecr:InitiateLayerUpload, ecr:BatchCheckLayerAvailability, ecr:PutImage

Given login was successful, the workflow compiles and tests the code and then uses the docker publishLocal command to generate a local image.

A small change I had to make was to explicitly tag the local image using the expected naming convention ($repository-url:$build-number). I'm not quite sure how this was working in the TeamCity build, but this current solution does work. If we migrated to using a docker build command, I think we can explicitly set a tag value, but I couldn't see anything obvious/immediate in the packager docs.

Given a local docker image exists with the expected tag we can then push the image to the remote ECR repository.

Testing

Service Testing
Event consumer Successful deploy to CODE, lambda continued to be triggered every 5 mins, no errors in logs, athena queries were successfully executed.
Fake breaking news lambda Successful deploy to CODE, lambda executed successfully after being triggered manually via the aws console, expected logs could be found in kibana.
Football Successful deploy to CODE.
Notification Successful deploy to CODE, service successfully received a request to send a notification, no errors in logs
Notification worker lambda Successful deploy to CODE, after sending a test notification it was observed that the harvester successfully retrieved tokens from the db and then ios and android lambdas were invoked to deliver the messages (received on real device). For these lambdas the expected logs could be found in kibana. Other services tested: registration-cleaning-worker (continued to process/delete dodgy tokens from a queue), topic-counter (continued to be triggered on expected cron, continued to retrieve/count/store topic counts), expired-registration-cleaner (successfully triggered manually via the aws console)
Registration Successful deploy to CODE, request to register a new device was successful, corresponding token (with expected topic) could be retrieved from the database.
Report Successful deploy to CODE, api responded successfully to /healthcheck, /notifications and /notifications/:id requests.
Report extractor Successful deploy to CODE, lambda executed successfully after triggering it manually via the aws console, expected logs could be found in kibana (no errors).
Schedule Successful deploy to CODE, lambda continued to be triggered every minute, logs appeared normal (no errors).
SLO monitor Successful deploy to CODE, after sending a breaking news notification in CODE I could then see the lambda invoked successfully, along with expected logs (no unexpected errors reported).

Copy link
Contributor

@Mark-McCracken Mark-McCracken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! (assuming your builds all run successful, which is a big assumption).
All looks fine to me, I've left a few tiny optional comments, but you've definitely got this right.
Lovely code, I could read that all day

.github/workflows/event-consumer.yml Outdated Show resolved Hide resolved
Comment on lines 41 to 51
- name: Compile, test and assembly
run: sbt "project eventconsumer" "compile" "test" "assembly"

- name: Upload to riff-raff
uses: guardian/actions-riff-raff@v2
with:
projectName: mobile-n10n:eventconsumer
configPath: eventconsumer/riff-raff.yaml
contentDirectories: |
eventconsumer:
- eventconsumer/target/scala-2.13/eventconsumer.jar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes I've had differences between what ran locally and what came out in github actions, for the version of scala, ie. 2.12 locally vs. 2.13 in GHA, and those could change in theory.
Sometimes to fix this, I've added something like this:

      - name: copy jar to root dir
        run: cp eventconsumer/target/scala-*/eventconsumer.jar .

then in the riffraff action, provided the path at the root directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea - I've added this extra step :)

.github/workflows/fake-breaking-news.yml Outdated Show resolved Hide resolved
.github/workflows/football.yml Outdated Show resolved Hide resolved
cd cdk
npm install
npm run lint
npm test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're writing tests, might be nice to add a test reporter step after this.
This is the one I've used:
https://github.com/dorny/test-reporter/tree/main

and here's an example:
https://github.com/guardian/janus/blob/e41b96a9221c0f2cf5d558990d81b3d8dd981bbc/.github/workflows/build.yml#L55-L64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at the test reports - for the scala tests we use specs2, which output test data in the .stats format. I don't think this is supported by the action. I tested anyway and got an expected error.

For the cdk tests, I can't see the output of the tests on my local machine, so not entirely sure what path to specify for the reporter. Will do a little more digging.

.github/workflows/notification-worker-lambdas.yml Outdated Show resolved Hide resolved
@DavidLawes DavidLawes marked this pull request as ready for review August 24, 2023 09:53
@DavidLawes DavidLawes requested a review from a team August 24, 2023 09:53
Copy link
Contributor

@tkgnm tkgnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely stuff! Thanks for doing this - this looks ginormous. Do we have a ticket to convert the non-cdk stuff in this project to cdk?

@DavidLawes
Copy link
Contributor Author

Lovely stuff! Thanks for doing this - this looks ginormous. Do we have a ticket to convert the non-cdk stuff in this project to cdk?

Yep, it's a bit big, sorry! About the cdk migration there's a ticket in the backlog https://theguardian.atlassian.net/browse/LIVE-4170

@DavidLawes DavidLawes merged commit d85cc38 into main Aug 24, 2023
2 checks passed
@DavidLawes DavidLawes deleted the LIVE-5304-migrate-build-to-github-actions branch August 24, 2023 10:33
@DavidLawes
Copy link
Contributor Author

After deploying this change:

  • All checks passed
Screenshot 2023-08-24 at 11 39 59
  • Continuous deployments to prod were successful

Screenshot 2023-08-24 at 11 44 34

@DavidLawes
Copy link
Contributor Author

NB: these workflows make use of some repository secrets, I've made copies of the values of these in SSM:

  • AWS_ROLE_ARN: param name /mobile-n10n/github-build-action/aws-role-arn
  • NOTIFICATION_LAMBDA_REPOSITORY_ID: param name /mobile-n10n/github-build-action/repository-id
  • NOTIFICATION_LAMBDA_REPOSITORY_URL: param name /mobile-n10n/github-build-action/repository-url

davidfurey added a commit to guardian/story-packages that referenced this pull request Oct 5, 2023
Also switches from deprecated sbt-riffraff-artifact plugin to devx recommendation guardian/actions-riff-raff.

Based on guardian/mobile-n10n#1151 and guardian/s3-upload#53
davidfurey added a commit to guardian/story-packages that referenced this pull request Oct 6, 2023
Migrate from Teamcity to Github actions

Also switches from deprecated sbt-riffraff-artifact plugin to devx recommendation guardian/actions-riff-raff.

Based on guardian/mobile-n10n#1151 and guardian/s3-upload#53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants