Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for automatic content discovery for cross-mounting blobs #275

Merged
merged 3 commits into from
Jun 23, 2021

Conversation

sargun
Copy link
Contributor

@sargun sargun commented May 7, 2021

When uploading to multiple registries, the user may or may not what
other repositories exist in these registries. Therefore, a client may
perform an unnecessary upload when the registry already has a given
blob. This an optimization that allows the registry to perform the
authz check and check if it can find the blob with a given the passed
digest in its blobstore. If that blob is accessible (from an authz
perspective) to the user, it can then perform the mount automatically
on its behalf.

Because there is a potential a timing attack that could be used to
disclose knowledge of whether or not the registry has a given blob
(for example, a vulnerable version of a Linux image), this an
optional feature for registries to implement.

Signed-off-by: Sargun Dhillon sargun@sargun.me

@sargun
Copy link
Contributor Author

sargun commented May 7, 2021

This follows up on the discussion in #236.

spec.md Outdated Show resolved Hide resolved
@sargun sargun force-pushed the dedupe-uploads branch from 059b337 to 195aa01 Compare May 7, 2021 20:51
The registry MAY treat the `from` parameter as optional.

It MAY cross-mount the blob referred to by the `mount` parameter even if the `from` parameter is omitted or the `from` repository does not contain the blob.

Alternatively, if a registry does not support cross-repository mounting or is unable to mount the requested blob,
it SHOULD return a `202`. This indicates that the upload session has begun and that the client MAY proceed with the upload.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to specify the 202 response will also include a Location header for uploading the blob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's unrelated to this change, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of the "else" path we hit, when the blob isn't found. But yes, to your point, that path isn't being changed with this PR, so we could clarify that in a separate PR.

@sargun
Copy link
Contributor Author

sargun commented Jun 12, 2021

Okay, this has like 80% of the work required: https://github.com/sargun/distribution/tree/cross-repo-mount

The configurations aren't user exposed, but the inner guts are wired up.

@sargun
Copy link
Contributor Author

sargun commented Jun 12, 2021

I changed the proposal / spec ever so slightly, and removed the sentence that said even if the blob wasn't found in the from repo, it could cross-mount from another repo. This turns out to be a pretty big headache.

@jonjohnsonjr
Copy link
Contributor

removed the sentence that said even if the blob wasn't found in the from repo, it could cross-mount from another repo. This turns out to be a pretty big headache.

How is this semantically different?

@sargun
Copy link
Contributor Author

sargun commented Jun 14, 2021

@jonjohnsonjr If there are many repos with the same content, and the registry has the concept of "linking" inside of it, and it exposes this information in any way (which, as it turns out, the "reference" docker distribution does), it's confusing.

@jonjohnsonjr
Copy link
Contributor

which, as it turns out, the "reference" docker distribution does

I'd be surprised that mounting a blob would interact with references, but also I don't think that I would predicate this PR on its interactions with a fork of distribution.

@sargun
Copy link
Contributor Author

sargun commented Jun 14, 2021

er, I don't mean "Reference" as in manifest references, I mean, the Docker Distribution implementation being the "reference implementation".

sargun added a commit to sargun/distribution that referenced this pull request Jun 14, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
@@ -372,6 +372,8 @@ The Location header will contain the registry URL to access the accepted layer f
header returns the canonical digest of the uploaded blob which MAY differ from the provided digest. Most clients MAY
ignore the value but if it is used, the client SHOULD verify the value against the uploaded blob data.

The registry MAY treat the `from` parameter as optional, and it MAY cross-mount the blob if it can be found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change if it can be found to if it can be found and can be accessed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me, that ACL implication is included in can be found, because ideally if it can not be accessed then it should be found. Or is there another use case you're thinking about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is outside of the specification, I would suggest that we indicate that it is not a good idea to implement this feature on any registry with cross-repo ACLs due to the possibility of information disclosure. I do not think found implies any notion of "access" in the sense of "Security"

So, I would rather say something like:
"If the registry does not implement a security model which allows for attenuation of access to reading blobs, it MAY treat the from parameter as optional, and it MAY cross-mount the blob if it can be found."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on adding more to wording here. While we don't define an ACL model, it is a potential pitfall with a very small change that is worth calling out. The updated language is reasonable, but I think it can be simplified some to just mentioned the read access. Like @jonjohnsonjr brought it, it could be that the registry only searches in a list of a blob that are known public and therefore able to be read by anyone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leverage this for MCR/ACR as well and for avoiding pushing these public well known layers multiple times.

@sargun
Copy link
Contributor Author

sargun commented Jun 17, 2021

I added two FAQ entries explaining the situation slightly. Once this gets merged, I'll add a conformance test to make sure that old registries properly handle the behaviour that when from is omitted they still try to start the upload.

@sargun sargun force-pushed the dedupe-uploads branch 3 times, most recently from d81969d to 18d9675 Compare June 18, 2021 19:03
@sargun sargun requested review from vbatts and dmcgowan June 21, 2021 18:25
Copy link
Contributor

@jonjohnsonjr jonjohnsonjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that you're not the original author of the things you're modifying, but it would be nice to follow one sentence per line, as per https://github.com/opencontainers/distribution-spec#markdown-style

FAQ.md Outdated

**Q: How are clients expected to adopt (and probe for) automatic content discovery support?**

The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant
The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry initiates an upload. Clients should try to use the automatic content discovery mechanism. Nonconformant

FAQ.md Outdated
**Q: How are clients expected to adopt (and probe for) automatic content discovery support?**

The process of mounting a blob is supposed to fail in such a way that if a blob cannot be cross-mounted, the registry is supposed to initiate an upload. Clients should try to use the automatic content discovery mechanism. Non-conformant
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to non-complaint registries, and receives a non-201 or non-202 error code, it should fall back to other methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to non-complaint registries, and receives a non-201 or non-202 error code, it should fall back to other methods.
registries may return a non-201 or non-202 error code. If the client is trying to be defensive to nonconformant registries, and receives a non-201 or non-202 error code, it should fall back to other methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "fall back to other methods", maybe link to https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pushing-blobs

FAQ.md Outdated

**Q: How come `from` is required on cross-repo mount for some registries?**

Mounting without having to specify `from`, also known as automatic content discovery, requires the registry to determine whether or not a blob exists in a repository. If the existence check for the blob is done first, an immediate failure will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also known as automatic content discovery

This is kind of confusing because we've already called tag listing "Content Discovery" here: https://github.com/opencontainers/distribution-spec/blob/main/spec.md#content-discovery

And "automatic content discovery" doesn't really fit into that theme. We're also less interested in the discovery part than the "mounting" part.

I don't have a great suggestion as an alternative, but we often refer to this as "short-circuiting" an upload internally. Maybe "automatic blob mounting" or something could work, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about
"Automatic mount origin discovery"

When uploading to multiple registries, the user may or may not what
other repositories exist in these registries. Therefore, a client may
perform an unnecessary upload when the registry already has a given
blob. This an optimization that allows the registry to perform the
authz check and check if it can find the blob with a given the passed
digest in its blobstore. If that blob is accessible (from an authz
perspective) to the user, it can then perform the mount automatically
on its behalf.

Because there is a potential a timing attack that could be used to
disclose knowledge of whether or not the registry has a given blob
(for example, a vulnerable version of a Linux image), this an
optional feature for registries to implement.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added 2 commits June 22, 2021 08:32
Although the specification states that registries should start an upload on a
failed mount, certain registry implementations may not properly implement
this behaviour. This explicitly states that clients MAY adopt the automatic
content discovery behaviour immediately, but MAY want to be defensive
to non-compliant registries.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Automatic content discovery creates an information disclosure risk. There
are a variety of mitigations to this risk, for example, storing knowledge
of public, cached layers elsewhere, or performing the authz check to
determine all of the blobs the user has access to prior to performing
the mount check.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
@sargun
Copy link
Contributor Author

sargun commented Jun 22, 2021

@jonjohnsonjr Updated based on your feedback

Copy link
Member

@dmcgowan dmcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Before merging, I would like to milestone issues into 1.1 for adding related conformance tests and client/registry verification

@sargun
Copy link
Contributor Author

sargun commented Jun 22, 2021

Issues:

#281
#282

Need someone to set the milestones.

Copy link
Member

@vbatts vbatts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦑

@vbatts vbatts merged commit 94fb011 into opencontainers:main Jun 23, 2021
sargun added a commit to sargun/distribution-spec that referenced this pull request Sep 15, 2021
In opencontainers#281, we stated
that cross-repo mounts without from should start an upload. This is a
requirement to make it so that clients can adopt the changes in
opencontainers#275 smoothly.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added a commit to sargun/distribution-spec that referenced this pull request Sep 15, 2021
In opencontainers#281, we stated
that cross-repo mounts without from should start an upload. This is a
requirement to make it so that clients can adopt the changes in
opencontainers#275 smoothly.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added a commit to sargun/distribution that referenced this pull request Sep 16, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.
sargun added a commit to sargun/distribution that referenced this pull request Sep 16, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added a commit to sargun/distribution that referenced this pull request Sep 16, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added a commit to sargun/distribution that referenced this pull request Sep 20, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
sargun added a commit to sargun/distribution that referenced this pull request Sep 20, 2021
As described in opencontainers/distribution-spec#275,
this adds, and allows for "automatic content discovery" on mount if the from
field is ommitted. This implementation does not provide and authz functionality,
so it should be only be used / implemented, if the registry has no cross-repo
authz.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
@sudo-bmitch sudo-bmitch mentioned this pull request Feb 16, 2022
@jdolitsky jdolitsky added this to the v1.1.0 milestone Feb 17, 2022
@jdolitsky jdolitsky mentioned this pull request Sep 15, 2022
@sudo-bmitch sudo-bmitch mentioned this pull request Feb 1, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants