Integrate with cache mount mechanism to improve devcontainer build performance #345

Open
shikanime opened this issue Nov 27, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@shikanime

shikanime commented Nov 27, 2023

Description

This proposal aims to integrate with cache mounts to improve the performance of devcontainer builds. Devcontainers are rebuilt frequently for various reasons, such as switching between projects, upgrading tool versions, or editing the devcontainer specification. BuildKit introduced the RUN --mount feature to replace practices such as apk add --no-cache or rm -rf /var/cache/apt/archives /var/lib/apt/lists/*, and the devcontainer build script already uses it to mount feature scripts. Exposing an API that lets features use cache mounts would be beneficial for caching directories like /var/cache/apt/archives.

Motivation

Building containers can be resource-intensive in terms of both compute and network usage. A notable example is installing home-manager in a container, where many developer experience programs are shared, such as Oh My Zsh configurations, custom shells, and versioning tooling. Together these can increase the container size by gigabytes. The only known solution is to move some steps into hooks, as demonstrated in my script and Ken Muse's article. This approach offloads the build task to hooks and makes use of mounts.

Proposed Solution

To address these concerns, I propose introducing a new configuration option in the specification for declaring one or more cache-type mounts, such as:

{
  "build": {
    "mounts": [
      {"type": "cache", "id": "apt-cache", "target": "/var/cache/apt/archives" }
    ]
  }
}
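
As a rough sketch of how an implementation might translate such an entry into a build instruction: the helper below turns an id and a target into a BuildKit `--mount=type=cache` flag. The function name `cache_mount_flag` is hypothetical and not part of any existing devcontainer tooling; only the flag syntax itself comes from BuildKit.

```shell
#!/bin/sh
# Hypothetical helper: render one cache mount entry as a BuildKit RUN flag.
# Usage: cache_mount_flag ID TARGET [SHARING]
cache_mount_flag() {
    id="$1"
    target="$2"
    sharing="${3:-}"
    flag="--mount=type=cache,id=${id},target=${target}"
    # The sharing mode (shared, private, locked) is optional.
    if [ -n "$sharing" ]; then
        flag="${flag},sharing=${sharing}"
    fi
    printf '%s\n' "$flag"
}
```

For the entry above, `cache_mount_flag apt-cache /var/cache/apt/archives` would yield a flag that a generated Dockerfile could place right after `RUN`, in front of the feature's install command.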

Implementation Challenges

While this proposal addresses the integration of caching mechanisms for devcontainer builds, it doesn't cover user-relative cache directories under the home path, such as $HOME/.cache/pip. It primarily solves global caching, such as /var/cache.

Furthermore, the distinction between runtime and build-time caching should be carefully considered. Installing dependencies during the install.sh phase allows for immediate access to those dependencies for dependent features, while utilizing hooks enables caching to be shared with the user's runtime environment.

@chrmarti
Contributor

chrmarti commented Dec 1, 2023

Thanks for bringing this up, this sounds like a good idea!

I guess, this would be an addition to the devcontainer-feature.json only and not the devcontainer.json as that can use the Dockerfile to do the build-time mounts.

On build-time vs. runtime caching: Not keeping the caches in the image makes a lot of sense in general, since that keeps the image small for transferring to/from a registry. If the user wants to have the cache in the dev container, maybe it could be bind/volume mounted or copied when creating the container.

I see the other type would be "bind", are there specific usages for that?

@shikanime
Author

In the build context, bind is primarily used to manage short-lived inputs such as the source directory, requirements.txt, go.mod/go.sum, or Cargo.toml/Cargo.lock. These files typically do not need to persist beyond the RUN operation, making bind a suitable option for handling them.

While integrating bind into the Devcontainer environment is not strictly necessary as the current implementation already mounts the feature directory during the build, it could provide flexibility for use cases such as configuring OpenSSL by mounting a configuration to /etc/ssl/openssl.cnf. However, the use case remains relatively niche. On the other hand, ssh and secret are essential tools for managing private source repositories.

Anyway, implementing it as a generic specification, like the devcontainer spec for mounts, would unlock all of these features. That said, I don't have access to the source code of the VS Code extension, so I can't dig into how the devcontainer artifacts are generated.

@chrmarti
Contributor

chrmarti commented Dec 4, 2023

This appears to be a BuildKit feature. Docker Desktop comes with BuildKit preinstalled. The only issue might be Linux installs where BuildKit comes in a separate package (e.g., docker-buildx-plugin in Debian/Ubuntu) and might not be installed by default. We could make it mandatory though; this would also allow us to remove some compatibility code that deals with installs missing BuildKit. Podman uses Buildah, which seems to support cache mounts.

@chrmarti
Contributor

chrmarti commented Dec 4, 2023

According to the docs, APT requires sharing=locked and a config change to make this work: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#example-cache-apt-packages

The example from the above link:

# syntax=docker/dockerfile:1
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
  --mount=type=cache,target=/var/lib/apt,sharing=locked \
  apt update && apt-get --no-install-recommends install -y gcc

@chrmarti
Contributor

chrmarti commented Dec 5, 2023

A few more things to think about:

  • Looking through the devcontainers/features repo I see many apt-get update and rm -rf /var/lib/apt/lists/* commands. Ideally we would run apt-get update only once (and again after adding more sources) during an image build.
  • If one feature removes docker-clean and adds the above keep-cache, other features that are unaware of this and run without the cache mount will start accumulating package archives in the image.

We could try to address these in each feature using the cache mounts:

  • Check if apt-get update is needed first, e.g.: if [ -z "$(find /var/lib/apt/lists -type f -mmin -60)" ]; then apt-get update; fi (Checks if there are recently updated files in the package lists.)
  • Restore docker-clean and remove keep-cache at the end.

The advantage of having each participating feature deal with these would be the simplicity of the proposal.
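
A minimal sketch of what a participating feature's install script could do, assuming GNU or BSD `find` (for `-mmin`) and a Debian-style APT layout. The function names and the `APT_*` variables are hypothetical, introduced only for illustration; the keep-cache line is the one from the BuildKit example quoted earlier in the thread.

```shell
#!/bin/sh
# Locations are overridable so the logic can be tried outside a real image.
APT_LISTS_DIR="${APT_LISTS_DIR:-/var/lib/apt/lists}"
APT_CONF_DIR="${APT_CONF_DIR:-/etc/apt/apt.conf.d}"
APT_BACKUP_DIR="${APT_BACKUP_DIR:-/tmp/apt-conf-backup}"

# Succeeds if any package list file was modified within the last 60 minutes.
lists_are_fresh() {
    [ -d "$APT_LISTS_DIR" ] &&
        [ -n "$(find "$APT_LISTS_DIR" -type f -mmin -60 2>/dev/null)" ]
}

# Runs apt-get update only when the package lists look stale.
apt_update_if_stale() {
    if lists_are_fresh; then
        echo "package lists are recent, skipping apt-get update"
    else
        apt-get update
    fi
}

# Sets docker-clean aside and tells APT to keep downloaded archives.
enable_apt_cache() {
    mkdir -p "$APT_BACKUP_DIR"
    if [ -f "$APT_CONF_DIR/docker-clean" ]; then
        mv "$APT_CONF_DIR/docker-clean" "$APT_BACKUP_DIR/docker-clean"
    fi
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
        > "$APT_CONF_DIR/keep-cache"
}

# Undoes enable_apt_cache so later features see the image defaults again.
restore_apt_defaults() {
    rm -f "$APT_CONF_DIR/keep-cache"
    if [ -f "$APT_BACKUP_DIR/docker-clean" ]; then
        mv "$APT_BACKUP_DIR/docker-clean" "$APT_CONF_DIR/docker-clean"
    fi
}
```

A feature would call `enable_apt_cache` and `apt_update_if_stale` before installing packages and `restore_apt_defaults` at the end. All of these calls need to happen within the same RUN step, since the backup directory under /tmp is only a convenience chosen for this sketch.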

@chrmarti
Contributor

chrmarti commented Dec 7, 2023

Another option to speed up APT specifically is to configure a caching proxy like https://wiki.debian.org/AptCacherNg. This might be possible in a way that is transparent to features and without amending the spec.
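
As an illustration only, pointing APT at such a proxy comes down to one configuration line. The sketch below writes it; the function name, the `01proxy` file name, and the proxy host are assumptions, while `Acquire::http::Proxy` is the standard APT option and 3142 is the default apt-cacher-ng port.

```shell
#!/bin/sh
# Hypothetical helper: write an APT proxy setting into a conf directory.
# Usage: write_apt_proxy_conf PROXY_URL [CONF_DIR]
write_apt_proxy_conf() {
    proxy_url="$1"
    conf_dir="${2:-/etc/apt/apt.conf.d}"
    # The file name is arbitrary; APT reads every suitable file in the directory.
    printf 'Acquire::http::Proxy "%s";\n' "$proxy_url" > "$conf_dir/01proxy"
}
```

For example, `write_apt_proxy_conf "http://apt-cache.example:3142"` would route plain HTTP package downloads through a proxy at that placeholder address. How the tooling would inject this transparently into every build is left open here.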

@shikanime
Author

shikanime commented Dec 18, 2023

This is something I was also thinking about. Similarly, in the Nix ecosystem we have Cachix, which is a remote build cache, and Bazel also has its own. At some point this spec could evolve into something akin to a Pod with sidecar containers, or use layering of the compose spec if we want to fully realize this vision. We could take inspiration from how the WSL team handles shared subsystems, such as running WSLg on the side, but that might be a little broader than the scope of this discussion.

@schlich

schlich commented Feb 13, 2024

this is all really great stuff. would you mind elaborating on this part more @shikanime ?

While this proposal addresses the integration of caching mechanisms for devcontainer builds, it doesn't cover user-relative cache directories under the home path, such as $HOME/.cache/pip. It primarily solves global caching, such as /var/cache.

are absolute paths a firm requirement by docker or something?

@shikanime
Author

shikanime commented Feb 21, 2024

Docker itself does have an absolute path requirement on the target side. The devcontainer spec allows the user to be switched within the container, but the timing and mechanism remain unclear. It's uncertain when, where, and how this user creation and switching occurs, but I have my doubts that this is during the runtime lifecycle of the container, and therefore after the docker build steps. Also, I believe there are a few places in the spec that allow the use of certain variables, but the implementation details aren't really clear to me. I think there's a Dockerfile that's templated behind the scenes, so maybe having relative paths is not an issue?

@schlich

schlich commented Feb 21, 2024

Stray thoughts, may or may not be relevant or helpful.

Following up on your Nix-adjacent lines of thought, I know Arch Linux's big thing is their rolling upgrades. I wonder if there are lessons to be applied here for the apt-get update problem.

I think I have a decent idea of the lifecycle timelines. Nothing we can't brute force with some logging anyway.

I wonder how we might utilize the XDG Base Directory specs?

@ahjulstad

I don't know if it is relevant, but I am using an external volume mount in a docker compose file to share the ~/.julia folder between my devcontainers. As first run-time startup takes time in Julia (due to on-demand native code compilation) this has tremendous benefits.

Now if there was a way to make this work when the external mount is not present (like in a GitHub Codespace)...

Perhaps ugly, but useful for me.

services:
  devcontainer:
    image: mcr.microsoft.com/devcontainers/base:bullseye
    volumes:
      - ../..:/workspaces:cached
      - dotjulia:/home/vscode/.julia
    command: sleep infinity

volumes:
  dotjulia:
    external: true
    name: dotjulia

{
	"dockerComposeFile": "docker-compose.yml",
	"service": "devcontainer",
	"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
	"name": "Debian",
	"features": {
		"ghcr.io/julialang/devcontainer-features/julia:1": {
			"channel": "release"
		}
	}
}

@HernandoR

any update or current workaround for this?

@chrmarti chrmarti added the enhancement New feature or request label Dec 17, 2024
@301166

301166 commented Dec 17, 2024 via email

@ruffsl

ruffsl commented Dec 17, 2024

any update or current workaround for this?

@HernandoR, I've been using the init lifecycle script to invoke my own Buildx bake command with great success. I just fix the image tag name in the dev container config, and then have the bake command build for that tag before the dev container is created. I like that I can fully codify the bake process for the dev container base image as a standalone file that even our CI can prebuild if needed, or include arbitrary cache mount or backend arguments, like for S3 or image registries, to speed up building locally.
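
A minimal sketch of what such a script might look like; the actual setup is not shown in this thread, so the tag, bake file path, target name, and function name below are all assumptions. It relies only on `docker buildx bake` with its `--file` and `--set` options.

```shell
#!/bin/sh
# Hypothetical defaults; the tag must match the "image" value in devcontainer.json.
IMAGE_TAG="${IMAGE_TAG:-devcontainer-base:local}"
BAKE_FILE="${BAKE_FILE:-.devcontainer/docker-bake.hcl}"
BAKE_TARGET="${BAKE_TARGET:-devcontainer}"

# Builds the bake target and tags the result with the fixed image tag.
# DOCKER can be overridden, for example with "echo", to preview the command.
prebuild_devcontainer_image() {
    "${DOCKER:-docker}" buildx bake \
        --file "$BAKE_FILE" \
        --set "${BAKE_TARGET}.tags=${IMAGE_TAG}" \
        "$BAKE_TARGET"
}
```

Under these assumptions, devcontainer.json would reference the same tag in its image property and call a script that runs `prebuild_devcontainer_image` from initializeCommand, so the image exists locally before the container is created. Cache mounts or remote cache backends would then be declared in the bake file or Dockerfile rather than in the devcontainer spec.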
