Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecOp: update content-cache mount logic #4624

Merged
merged 4 commits into from
Feb 7, 2024

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Feb 5, 2024

See discussion in Slack: https://dockercommunity.slack.com/archives/C7S7A40MP/p1706711391769259

This is split into 4 commits for readability:

  • exec: refactor content-based cache detection
  • exec: allow content-cache for root selected mounts (with @sipsma for coming up with the idea)
    • Allows for root-selected mounts to be content-cached
  • exec: allow caller-controlled content-based cache
    • Allows LLB-controlled cache fields to dictate content-based cache. Each mount can be enabled/disabled/left at the default. Some mounts cannot have it enabled for safety reasons, and this should fail.
  • test: add new content-cache exec mount tests
    • This adds detailed tests for all the old+new behavior, explicitly listing all the cases we really actually care about.

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires a new cap as well.

solver/llbsolver/ops/exec.go Show resolved Hide resolved
jedevc and others added 4 commits February 6, 2024 11:18
This refactors the content-based cache to be that little bit tidier. In
addition to adding comments that explain *why* we're even bothering,
this restructures the code to avoid being unclear.

To explain the changes in a little more detail (since it's not
abundantly clear why this translation is valid), the initial condition
looks like:

	if (!m.Readonly || m.Dest == pb.RootMount) && m.Output != -1 {
		deps[m.Input].NoContentBasedHash = true

We can apply De Morgan's law recursively to invert the condition and the
result:

	deps[m.Input].NoContentBasedHash = true
	if (m.Readonly && m.Dest != pb.RootMount) || m.Output == -1 {
		deps[m.Input].NoContentBasedHash = false

With all the juggling of NoContentBasedCache, we invert the variable
name to be ContentBasedCache (and invert everywhere it's used as well):

	if (m.Readonly && m.Dest != pb.RootMount) || m.Output == -1 {
		deps[m.Input].ContentBasedHash = true

This reads a bit easier, but now we split this into two separate
branches for readability (and so we can comment each one in more detail
separately):

	if m.Readonly && m.Dest != pb.RootMount {
		deps[m.Input].ContentBasedHash = true
	}
	if m.Output == -1 {
		deps[m.Input].ContentBasedHash = true
	}

While this has been the behavior for ages, I think it makes sense to
deliberately this behavior slightly. It doesn't make sense to me that we
should only disallow read-only root mounts, but no-output root mounts
are allowed - there's no reason these shouldn't behave identically, by
splitting these out:

	if m.Readonly {
		deps[m.Input].ContentBasedHash = true
	}
	if m.Output == -1 {
		deps[m.Input].ContentBasedHash = true
	}

	if m.Dest == pb.RootMount {
		deps[m.Input].ContentBasedHash = false
	}

There is a small chance that this is a breaking change for some users,
however, 1. SkipOutput (-1) is very rare and not often used in the wild
(except for dockerfiles, where it's only used for non-root mounts), and 2.
will only cause a cache miss.

Signed-off-by: Justin Chadwell <me@jedevc.com>
These mounts are actually safe, as suggested by Erik on slack:

> Is it correct that this wouldn’t be a problem in the case where the
> selector of the mount is just “/“? Because then there’s no “hidden”
> files.
>
> If so, maybe there’s a path to enabling content cache for rw mounts
> that are from “/“. Then you could also get the same end behavior by
> not using selectors and instead always copying the subdir you want to
> mount to scratch.

This patch adds a check for this case, and explicitly enables
content-based cache for these cases.

Co-authored-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Justin Chadwell <me@jedevc.com>
This allows LLB-directed content-based cache enablement for each mount.

Some mounts may not be explicitly unabled (because it would be unsafe) -
for these cases we explicitly error out.

Signed-off-by: Justin Chadwell <me@jedevc.com>
These test all of the new behavior:

- Checks for old default no content cache
- Checks for old read-only and no-output allowed content cache
- Checks for new root selector allowed content cache
- Checks for new caller options that allow enabling/disabling it

Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc jedevc force-pushed the control-mount-content-cache branch from fa2435c to ed2efe3 Compare February 6, 2024 11:44
@jedevc
Copy link
Member Author

jedevc commented Feb 6, 2024

I think this requires a new cap as well.

I think this should do the job? https://github.com/jedevc/buildkit/blob/control-mount-content-cache/solver/pb/caps.go#L60

@tonistiigi tonistiigi merged commit aa8d9ef into moby:master Feb 7, 2024
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WithMountedDirectory/WithMountedFile does not use content-based cache in withExec
2 participants