
Deploy refactoring #51

Merged

Conversation

joaoantoniocardoso
Member

@joaoantoniocardoso joaoantoniocardoso commented May 25, 2024

This patch refactors the Dockerfile and the CI workflow to improve build speed.

Old

  • Worst possible time ref: ~5h 49m
  • Best possible time ref: ~2h 32m

New

  • Worst possible time: ~2h
  • Best possible time ref: ~2m

E.g.:

  • Adding cmatrix to BlueOs-base APT dependencies: <10m
  • Upgrading GST version to 1.24.3: ~1h 40m

General improvements:

  1. multi-staging
  2. caching
  3. build strategy
  4. Dockerfile syntax
  5. Dockerfile variable handling
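A minimal sketch of what improvements 1, 4, and 5 look like in practice (stage names, the base image, and the build commands below are illustrative, not the actual BlueOS Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1  # opt in to the newer Dockerfile syntax (improvement 4)

# Global variable, shared by every FROM below (improvement 5)
ARG DEBIAN_VERSION=bookworm-slim

# Build stage: the heavy toolchain lives only here (improvement 1)
FROM debian:${DEBIAN_VERSION} AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /src
RUN make -C /src

# Runtime stage: only the build output is copied in,
# so compilers and sources never bloat the final image
FROM debian:${DEBIAN_VERSION}
COPY --from=builder /src/out /usr/local/bin/
```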

Bugs fixed:

  1. The current workflow does not properly store each platform's artifact, because all artifacts share the same name
  2. Correctly compute the image size. The master image size for x86_64 is 793MB, so the limit had to be increased to 900MB
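The artifact-name collision in bug 1 is typically fixed by including the platform in the artifact name. A sketch of such an upload step (the step name is illustrative; `ARTIFACTS_PATH` and `PLATFORM_PAIR` are the env variables used elsewhere in this workflow):

```yaml
# Illustrative upload step: one artifact per matrix platform,
# so linux/amd64 and linux/arm64 no longer overwrite each other.
- name: Upload platform artifact
  uses: actions/upload-artifact@v4
  with:
    # PLATFORM_PAIR is assumed to be derived from matrix.platform,
    # e.g. "linux-amd64"
    name: artifact-${{ env.PLATFORM_PAIR }}
    path: ${{ env.ARTIFACTS_PATH }}/${{ env.PLATFORM_PAIR }}.tar
```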

Techniques implemented:

  1. Parallelizes the Docker build for each platform.

  2. Uses GitHub cache storage for Docker caching for each platform.

  3. Uses GitHub cache storage for the APT cache for each platform. By doing so, we share the downloaded packages between different Docker stages and/or incremental changes. Also, because all apt packages are stored outside the container, we don't need to clean the container, which keeps the layers thinner.

  4. Uses GitHub cache storage for user cache for each platform, including ccache and pip.

  5. Uses global variables to apply the same configuration to each Docker stage.

  6. Builds GStreamer in /tmp folder so it doesn't bloat the image.
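Techniques 1–3 can be sketched as a workflow fragment like the following (job layout, cache keys, and platform list are illustrative, not the actual deploy.yml):

```yaml
# Illustrative: parallel per-platform builds with GitHub-hosted caches.
jobs:
  build:
    strategy:
      matrix:
        platform: [linux/amd64, linux/arm64]  # technique 1: one job per platform
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # technique 3: restore the APT cache from GitHub cache storage,
      # keyed per platform so jobs don't clobber each other
      - uses: actions/cache@v4
        with:
          path: apt-cache
          key: apt-${{ matrix.platform }}-${{ hashFiles('Dockerfile') }}
      # technique 2: Docker layer cache in GitHub cache storage, scoped per platform
      - uses: docker/build-push-action@v5
        with:
          platforms: ${{ matrix.platform }}
          cache-from: type=gha,scope=${{ matrix.platform }}
          cache-to: type=gha,mode=max,scope=${{ matrix.platform }}
```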

Other techniques tested:

  1. Using apt-fast: it really does speed up the apt layers, but the additional complexity doesn't seem to pay off: (1) sometimes it simply fails to download (configured with 16 parallel connections, no mirrors configured); (2) the apt caching strategy already helped a lot; and (3) because GStreamer is so painful to build, apt accounts for only a fraction of the total time (at most 1/10).

  2. Moving the gst_build.sh script into the Dockerfile GST build stage, splitting the single layer into many to make better use of the Docker cache when we change GStreamer configurations and versions. I dropped this because we always rebuild 100% of GStreamer when we change versions, and GStreamer is the longest part of this build, so the change didn't seem worth keeping; a shell script is also much more readable and easier to maintain than a Dockerfile.

  3. I tried to pass the digests using the cache storage, but there is no easy way to make this work; artifacts are simpler. I chose to display an error annotation instead of making the deployment fail, as we will need the images somewhere to optimize them in case the size exceeds the limit.

Things that can be improved:

  1. We could experiment with caching the meson builddir or their subprojects somehow, but it seems hard.

@joaoantoniocardoso joaoantoniocardoso marked this pull request as ready for review May 25, 2024 23:02
.github/workflows/deploy.yml (outdated)
platforms: ${{ matrix.platform }}
labels: ${{ steps.meta.outputs.labels }}
push: ${{ github.event_name != 'pull_request' }}
outputs: type=docker,dest=${{ env.ARTIFACTS_PATH }}/${{ env.PLATFORM_PAIR }}.tar
Member

lol

DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
IMAGE_LIMIT_SIZE_MB: 900
Member

Wait, our images are around 300MB in the Docker base. Is this from the BlueOS main Docker image?

Member Author

Oh, the tar file is under 300MB, but the uncompressed image is ~800MB. I compared the current image to the image from this PR, and the only difference was a reduction of about 3MB.

.github/workflows/deploy.yml (outdated)
@patrickelectric patrickelectric merged commit ef485f7 into bluerobotics:master May 27, 2024
4 checks passed
@joaoantoniocardoso joaoantoniocardoso deleted the improve-dockerfile branch May 28, 2024 00:17