Integrity checks for R2 migration #3469

Open · 4 of 8 tasks
UlisesGascon opened this issue Aug 22, 2023 · 5 comments

@UlisesGascon (Member) commented Aug 22, 2023

TL;DR

We will change the way we serve the binaries, so we want to ensure that the binaries are properly migrated. Additionally, we can take this opportunity to build some scripts (potentially GitHub Actions) that we can use to check that the binaries are intact and the releases are correct.

Historical Context

We have been suffering from cache problems for a while:

It seems like the long-term solution will be to relocate the binaries to R2:

Implementation

I started building a simple GitHub Action that collects all the releases and generates the URLs for all the available binaries. It then performs a basic HTTP request using curl to check the response headers. After that, it generates some metrics based on this and presents a simple report in markdown format.
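For illustration, here is a minimal sketch of that kind of check (not the actual Action), assuming Node.js 18+ with the built-in fetch and the public release index at https://nodejs.org/dist/index.json. It probes only a couple of well-known asset URLs per release and inspects the response headers without downloading the bodies:

```ts
// check-availability.ts: minimal sketch, not the actual GitHub Action.
// Assumes Node.js 18+ (built-in fetch); probes only a couple of assets per release.

interface ReleaseEntry {
  version: string; // e.g. "v20.5.1"
}

const DIST = 'https://nodejs.org/dist';

async function headCheck(url: string): Promise<void> {
  // HEAD request: inspect status and headers only, never download the body.
  const res = await fetch(url, { method: 'HEAD' });
  const length = res.headers.get('content-length') ?? 'unknown';
  const cache = res.headers.get('cf-cache-status') ?? res.headers.get('x-cache') ?? 'n/a';
  console.log(`${res.status} ${url} length=${length} cache=${cache}`);
}

async function main(): Promise<void> {
  const index = (await (await fetch(`${DIST}/index.json`)).json()) as ReleaseEntry[];

  for (const release of index.slice(0, 5)) { // limited to 5 releases for the sketch
    const urls = [
      `${DIST}/${release.version}/SHASUMS256.txt`,
      `${DIST}/${release.version}/node-${release.version}-linux-x64.tar.gz`,
    ];
    for (const url of urls) {
      await headCheck(url);
      // Crude rate limiting so hundreds of requests do not hammer the server.
      await new Promise((resolve) => setTimeout(resolve, 500));
    }
  }
}

main().catch((err) => { console.error(err); process.exit(1); });
```

In practice the Action would iterate over every platform asset per release and use stronger throttling; the fixed 500 ms delay above is just a placeholder.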

While presenting this proof of concept in Slack, the collaborators provided super useful feedback and suggested features that we can implement.

Current approach

The idea of using a cron job to collect availability metrics may not be very effective for the cache-issues scenario, but there are many features that can still be valuable to us.

Features requested/ideas

  • Add support for iojs.org/dist as NVM depends on it (@ljharb)
  • Verify the R2 cutover (@flakey5 @MattIPv4 @ovflowd)
  • Store the SHAs for the files and validate that they do not change (@MattIPv4)
  • Check that the SHASUMS256 files are correctly signed (@UlisesGascon)
  • Check the binaries (@MattIPv4 @UlisesGascon) (see the verification sketch after this list)
    • Checksums match the release SHASUMS256
    • Binaries described in the SHASUMS256 are available
    • Binaries are not flagged in malware databases (checked via VirusTotal)
    • Binary checksums match the SHASUMS256
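
As an illustration of the signature and checksum checks, here is a minimal sketch (not the actual tooling). It verifies the detached signature on SHASUMS256.txt with gpg, assuming gpg is installed and the Node.js release keys are already imported into the local keyring, and then compares a downloaded file against the checksum listed in SHASUMS256.txt. The file names and paths are placeholders:

```ts
// verify-release.ts: illustrative sketch, not the tooling referenced above.
import { createHash } from 'node:crypto';
import { readFile } from 'node:fs/promises';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Verify the detached signature (SHASUMS256.txt.sig) against SHASUMS256.txt.
// Assumes gpg is installed and the Node.js release keys are already imported.
async function verifySignature(shasumsPath: string, sigPath: string): Promise<void> {
  await execFileAsync('gpg', ['--verify', sigPath, shasumsPath]); // throws on failure
}

// Parse "<sha256>  <filename>" lines into a map of filename -> expected hash.
async function parseShasums(shasumsPath: string): Promise<Map<string, string>> {
  const text = await readFile(shasumsPath, 'utf8');
  const map = new Map<string, string>();
  for (const line of text.trim().split('\n')) {
    const [hash, name] = line.trim().split(/\s+/);
    if (hash && name) map.set(name, hash);
  }
  return map;
}

// Compare a local file's SHA-256 against the value published in SHASUMS256.txt.
async function checkFile(filePath: string, fileName: string, shasums: Map<string, string>): Promise<void> {
  const expected = shasums.get(fileName);
  if (!expected) throw new Error(`${fileName} is not listed in SHASUMS256.txt`);
  const actual = createHash('sha256').update(await readFile(filePath)).digest('hex');
  if (actual !== expected) throw new Error(`checksum mismatch for ${fileName}`);
  console.log(`OK ${fileName}`);
}

async function main(): Promise<void> {
  await verifySignature('SHASUMS256.txt', 'SHASUMS256.txt.sig');
  const shasums = await parseShasums('SHASUMS256.txt');
  // Placeholder file name; a real check would iterate over every listed binary.
  await checkFile('./node-v20.5.1-linux-x64.tar.gz', 'node-v20.5.1-linux-x64.tar.gz', shasums);
}

main().catch((err) => { console.error(err); process.exit(1); });
```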

I will request to transfer the repo to the Node.js org when the code is stable and documented; currently the code is quite hacky.

Next steps

I have started to consolidate the feedback into issues:

Discovery

There are some things that bubbled to the surface while implementing the systematic checks:

@richardlau (Member) commented:

While I appreciate the effort, I have some concerns.

I think you're trying to check two separate issues:

  1. The integrity of the files. e.g. are the SHASUMS properly signed and do the files match the SHAs?
  2. Whether the URL(s)/webserver is responding.

We currently do a very limited version of 1. in validate-downloads which only checks the binaries for the most recent versions of Node.js 16, 18 and 20 using jenkins/download-test.sh. It runs once per day (or on demand if manually run in Jenkins).

Cases where the files do not match the SHAs published in the SHASUMS:

  • Something went wrong in the release process. This only needs to be a one-time check.
  • The files were not uploaded fully to the server (e.g. the disk filled up). Again, this only needs to be a one-time check validating that the file was uploaded correctly.
  • The webserver/cache service is misbehaving.
  • Someone or process with access inadvertently tampers with the files. We mitigate this by gating access -- even releasers do not have permission to change the releases on the server once seven days have passed. (The seven-day window was originally because some platforms, e.g. arm32, were slow and released after the other platforms; we haven't actually had phased platform releases in a long time, and I think we even removed the bits from the release guide that mentioned this.)
  • The infrastructure has been compromised and a malicious actor tampers with the files. In this case they'd likely be able to also modify the SHASUMS files. In mitigation we also fully publish the signed SHASUMs in the release blog posts on the website, so an attacker would also need to compromise the website and the website's GH repository.

For 2., we currently know that we have cache purge issues that affect any number of the download URLs -- the extra monitoring, if we were checking every existing asset URL over HTTP, would contribute negatively to the server load (even if we retrieve just the headers, connections still need to be made to the server).

I started building a simple GitHub Action that collects all the releases and generates the URLs for all the available binaries. It then performs a basic HTTP request using curl to check the response headers.

I hope this has rate limiting implemented -- this will be hundreds of files/HTTP requests.

@UlisesGascon (Member, Author) commented Aug 22, 2023

Thanks a lot for the feedback @richardlau! :)

We currently do a very limited version of 1. in validate-downloads which only checks the binaries for the most recent versions of Node.js 16, 18 and 20 using jenkins/download-test.sh. It runs once per day (or on demand if manually run in Jenkins)

I was not aware of this job, and it basically covers a lot of the things that I was expecting to cover, so there are fewer things on my to-do list. 👍

Cases where the files do not match the SHAs published in the SHASUMS:

Only one case is relevant here: the infrastructure has been compromised and a malicious actor has tampered with the files.

We can check if the shasum files were modified. I already collect and update them when new releases are added. You can find them here. Then I can check if any of the checksums have changed and/or if the signatures are valid (in case of additions, aka new releases).

This way, we ensure that the immutability is still in place and there is no tampering with the new additions. The number of HTTP requests is quite low because the binary checksums are collected from the SHASUMS. The script only downloads the SHASUM files.

This can be a weekly job, executed on the weekends.
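
A minimal sketch of that weekly immutability check is below, assuming the collected SHASUMS256.txt snapshots are stored locally under a hypothetical shasums/<version>/ layout (the actual layout in the linked repository may differ). Changed lines for existing files are flagged as possible tampering, while new lines are treated as additions whose signatures still need to be verified:

```ts
// check-immutability.ts: illustrative sketch of the weekly tamper check.
// Assumes snapshots stored locally as shasums/<version>/SHASUMS256.txt (hypothetical layout).
import { readFile } from 'node:fs/promises';

const DIST = 'https://nodejs.org/dist';

// Parse "<sha256>  <filename>" lines into a map of filename -> hash.
function parse(text: string): Map<string, string> {
  const map = new Map<string, string>();
  for (const line of text.trim().split('\n')) {
    const [hash, name] = line.trim().split(/\s+/);
    if (hash && name) map.set(name, hash);
  }
  return map;
}

async function compareShasums(version: string): Promise<void> {
  const local = parse(await readFile(`shasums/${version}/SHASUMS256.txt`, 'utf8'));
  const remote = parse(await (await fetch(`${DIST}/${version}/SHASUMS256.txt`)).text());

  for (const [name, hash] of remote) {
    const known = local.get(name);
    if (known === undefined) {
      // New entry: expected for a fresh release, but its SHASUMS signature should be verified.
      console.log(`ADDED   ${version} ${name}`);
    } else if (known !== hash) {
      // An existing checksum changed: releases are immutable, so treat this as possible tampering.
      console.error(`CHANGED ${version} ${name}`);
      process.exitCode = 1;
    }
  }
  for (const name of local.keys()) {
    // A file disappearing from the published SHASUMS is also suspicious.
    if (!remote.has(name)) console.error(`REMOVED ${version} ${name}`);
  }
}

compareShasums('v20.5.1').catch((err) => { console.error(err); process.exit(1); });
```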

For 2. we currently know that we have cache purge issues that affect any number of the download URLs -- the extra monitoring if we were checking over HTTP every existing asset URL would be contributing negatively to the server load (even if retrieving just the headers as connection(s) will need to be made to the server).

I hope this has rate limiting implemented -- this will be hundreds of files/HTTP requests.

It ran during the weekends for a while, and I have already removed the cron job. However, it can still be executed manually, either on a local machine or by triggering the workflow in GitHub. I believe we can use this script for the R2 migration to ensure that all the binaries are transferred and that all the URLs are functioning correctly. Please note that the script only checks the headers and closes the connection; it does not attempt to download the binaries.

@github-actions (bot) commented:

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Jun 22, 2024
@MattIPv4 (Member) commented:

I don't believe this is stale, these checks will still be crucial once the R2 migration is completed.

@flakey5 (Member) commented Jun 23, 2024

I don't believe this is stale, these checks will still be crucial once the R2 migration is completed.

+1

@github-actions github-actions bot removed the stale label Jun 24, 2024