Add HEAD endpoints #207

Closed
jdolitsky opened this issue Oct 28, 2020 · 19 comments · Fixed by #208

@jdolitsky
Member

jdolitsky commented Oct 28, 2020

https://docs.docker.com/registry/spec/api/#existing-manifests

"Docker’s use of HEAD is also has a Docker-Content-Digest in the response headers so you can HEAD against a tag to determine the digest that the tag currently points to."

@jdolitsky jdolitsky added this to the v1.0.0-rc2 milestone Oct 28, 2020
@dmcgowan
Member

What exactly is the addition recommended here? HEAD is already well defined as being the same as GET without the body and shouldn't be considered separate endpoints. Is the point to highlight the use of Docker-Content-Digest?

I am +1 for calling out explicitly that HEAD should be supported though, and for having that in the conformance tests. There have been issues in the past with registries returning different response codes or headers for HEAD.

@jdolitsky
Member Author

There was some convo yesterday about this - containerd uses HEAD and falls back to GET. So the plan is to add HEAD to the conformance tests, and also add it to the endpoints table.
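The HEAD-then-GET fallback mentioned above could look roughly like the sketch below. It is an illustration only, not containerd's code: `manifest_digest` is a hypothetical helper, and the Accept list is an assumption chosen to match the media types discussed in this thread.

```shell
# Sketch: resolve a tag to a digest with HEAD, falling back to GET.
# manifest_digest is a hypothetical helper name, not part of any client.
manifest_digest() {
    url=$1      # e.g. https://registry-1.docker.io/v2/<name>/manifests/<tag>
    token=$2    # bearer token obtained from the registry's auth service
    # Assumed Accept list; a real client should list every type it supports.
    accept='application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.docker.distribution.manifest.v2+json'

    # -I sends a HEAD request and prints only the response headers.
    if ! headers=$(curl -fsS -I -H "Authorization: Bearer $token" -H "Accept: $accept" "$url"); then
        # Fallback: GET, dump headers to stdout (-D -) and discard the body.
        headers=$(curl -fsS -D - -o /dev/null -H "Authorization: Bearer $token" -H "Accept: $accept" "$url") || return 1
    fi

    # Header names are case-insensitive, so compare in lower case.
    printf '%s\n' "$headers" | tr -d '\r' | awk '
        {
            i = index($0, ":")
            if (i && tolower(substr($0, 1, i - 1)) == "docker-content-digest") {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}
```

Calling `manifest_digest "$url" "$token"` prints the digest if either request succeeds, and returns non-zero if both fail.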

@simskij

simskij commented Nov 5, 2020

Given DockerHub's recent implementation of rate limits, an interesting issue surfaced around the behavior of HEAD. If you do a HEAD request against a manifest-list, the Docker-Content-Digest returned is not for the actual manifest list but for the resolved manifest depending on os and arch. In the case of DockerHub at least, this seems to default to linux/amd64 unless overridden.

Docker, and I assume other container environments as well, store the manifest list's digest as the actual image digest, without resolving the digest of the manifest it's redirected to. It would therefore probably make sense for a HEAD request against a manifest list to return the digest of that list, matching what's stored locally.

There is already a way around this: supplying an "Accept:application/vnd.docker.distribution.manifest.list.v2+json" header with the request. However, this is not currently done. So my suggestion basically comes down to making application/vnd.docker.distribution.manifest.list.v2+json the default Content-Type for HEAD requests.

I get that the ideal thing here would be to change the digests stored by docker-engine, but given the wide adoption of docker-ce, and the wide array of versions being used, any changes like that would probably take years to propagate fully into the wild.

Some repros using DockerHub. I appreciate any challenges to my understanding as I may very well have gotten it wrong.

$ docker inspect containrrr/watchtower
...
"RepoDigests": [
            "containrrr/watchtower@sha256:d0331edc5b1c5bbf18a92fc27c50f32e8bb894cf67a06cbd33d04eb40d5c8cc2"
        ],
...

Curl response with the header added:

$ curl -v -H "Authorization: Bearer $TOKEN" -H "Accept:application/vnd.docker.distribution.manifest.list.v2+json" https://registry-1.docker.io/v2/containrrr/watchtower/manifests/latest 2>&1 | grep Digest
< Docker-Content-Digest: sha256:d0331edc5b1c5bbf18a92fc27c50f32e8bb894cf67a06cbd33d04eb40d5c8cc2

And without it, where the response defaults to application/vnd.docker.distribution.manifest.v1+prettyjws:

$ curl -v -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/containrrr/watchtower/manifests/latest 2>&1 | grep Digest
< Docker-Content-Digest: sha256:e24b7c874f4e514676a3aeca424ed55a03d32ea1095925d8e37f910bfea3d782

The result currently seems to be that every HEAD request against a multi-arch image (manifest list) is taken to indicate that the digest has changed, which leads to an actual GET of the full manifest.
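To make the comparison concrete, here is a minimal sketch of the check a client would effectively perform. `header_digest` and `needs_pull` are hypothetical helper names; the input is assumed to be raw response headers (as printed by `curl -I`), not `curl -v` output.

```shell
# Print the Docker-Content-Digest value from HTTP headers read on stdin.
header_digest() {
    tr -d '\r' | awk '
        {
            i = index($0, ":")
            # Header names are case-insensitive, so compare in lower case.
            if (i && tolower(substr($0, 1, i - 1)) == "docker-content-digest") {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}

# Succeed (exit 0) when the locally stored digest differs from the remote one.
needs_pull() {
    local_ref=$1       # a RepoDigests entry, e.g. repo@sha256:...
    remote_digest=$2   # digest reported by the registry
    # Strip everything up to and including the "@" before comparing.
    [ "${local_ref##*@}" != "$remote_digest" ]
}
```

With the digests from the repro above, `needs_pull` fails (no pull needed) when the request carried the manifest-list Accept header, and succeeds when it did not, even though the image is unchanged.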

@amouat
Contributor

amouat commented Nov 5, 2020

The current draft of the OCI spec has removed mention of the Docker-Content-Digest header completely. As this enables some useful workflows, I think it would be good to add it back, and also to add it to the conformance tests (I only realised this week that Trow doesn't set it currently).

I understand we probably want to remove the word Docker, but perhaps that can wait until later?

@thaJeztah
Member

I would expect the specs to describe a Content-Digest header (without the Docker prefix), but allow registries to return a Docker-Content-Digest header for backward compatibility with older clients (not sure if that would be part of the spec, as "any other header" likely is allowed).

If it does describe the Docker-prefixed one, it should specify that clients consuming the digest header (I'd gather using it would always be optional) MUST prefer Content-Digest over Docker-Content-Digest (i.e., "ignore" Docker-Content-Digest if both are present).
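On the client side, that preference rule could be sketched as follows. `preferred_digest` is a hypothetical helper, and the unprefixed Content-Digest header is only the name proposed in this comment, not something registries are known to send.

```shell
# Read HTTP response headers on stdin and print one digest, preferring the
# proposed Content-Digest header over the legacy Docker-Content-Digest.
# Exits non-zero when neither header is present.
preferred_digest() {
    tr -d '\r' | awk '
        {
            i = index($0, ":")
            if (i == 0) next
            name = tolower(substr($0, 1, i - 1))
            value = substr($0, i + 1)
            sub(/^[ \t]+/, "", value)
            if (name == "content-digest") plain = value
            else if (name == "docker-content-digest") legacy = value
        }
        END {
            if (plain != "") print plain
            else if (legacy != "") print legacy
            else exit 1
        }'
}
```

Because the choice is made after all headers are read, the order in which a registry sends the two headers does not matter.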

@amouat
Contributor

amouat commented Nov 5, 2020

Thanks @thaJeztah - to be honest I wanted to say the same and gave up trying to word it correctly! 🤦

@jdolitsky
Member Author

Hi all - this conversation has split into two separate threads. Please see the conclusions here: #208 (comment)

@jonjohnsonjr
Contributor

jonjohnsonjr commented Nov 11, 2020

Given DockerHub's recent implementation of rate limits, an interesting issue surfaced around the behavior of HEAD. If you do a HEAD request against a manifest-list, the Docker-Content-Digest returned is not for the actual manifest list but for the resolved manifest depending on os and arch. In the case of DockerHub at least, this seems to default to linux/amd64 unless overridden.

Even worse than that, it's down-converted to a schema 1 image.

Worse still, there used to be a bug where you'd see the manifest list digest even if it returned a schema 2 image (based on your accept headers). That's luckily now been fixed: distribution/distribution#2395

There is already a way around this by supplying a "Accept:application/vnd.docker.distribution.manifest.list.v2+json" header to the request, however, this is not currently done. So, my suggestion basically comes down to making application/vnd.docker.distribution.manifest.list.v2+json the default Content-Type for head requests.

I think you mean Accept instead of Content-Type, but that only makes sense if the manifest on the other side is a manifest list. Clients should supply a list of content types that they support in the Accept header, which is covered here: https://docs.docker.com/registry/spec/api/#pulling-an-image-manifest

I am really confused about why half of the registry spec was thrown away -- this stuff is vital to a correct implementation. (sorry)

We should make sure we haven't left out any other details like this. It should be possible to produce a client that works against existing registries just from reading the spec (especially docker hub).

@jdolitsky
Member Author

Hey @jonjohnsonjr - in response to

I am really confused about why half of the registry spec was thrown away -- this stuff is vital to a correct implementation.

We can use your knowledge and experience here. The goal is to make things digestible to newcomers to this specification. If things were left out that you view to be vital, we are probably missing some context.

Please reach out via slack/email, I'd like to set up some time to meet and address any concerns.

@jonjohnsonjr
Contributor

Apologies for that last remark -- I am genuinely confused here, and didn't intend for that to come off so rudely. I know this is a lot of work and it's challenging to tackle a document of this size, so please don't take it personally :)

I'll find you on slack to follow up.

@simskij

simskij commented Nov 12, 2020

I think you mean Accept instead of Content-Type, but that only makes sense if the manifest on the other side is a manifest list. Clients should supply a list of content types that they support in the Accept header, which is covered here:

No, I mean Content-Type. I'm saying that, if it is a manifest list and no Accept has been provided, it would make a lot more sense if it defaulted to returning the manifest list rather than picking a manifest (seems like it defaults to linux/amd64) and returning that.

The behavior is not obvious and kind of counter-intuitive currently, as it's not returning the type that matches my request the closest, in this case: a tag that happens to point at a manifest list.

@thaJeztah
Member

In the case of DockerHub at least, this seems to default to linux/amd64 unless overridden.

Even worse than that, it's down-converted to a schema 1 image.

For the Docker Hub case, both have been done to remain backward-compatible:

  • default to linux/amd64, as it was the only OS and Architecture originally supported (and thus, would be the default image format selected)
  • serving v1 manifests, to keep backward compatibility with old clients which would not send (appropriate) Accept headers.

No, I mean Content-Type. I'm saying that, if it is a manifest list, and no Accept has been provided, it would make a lot more sense if it defaulted to returning the manifest list rather than picking a manifest (seems like it defaults to linux/amd64) and return that.

@simskij see my comment above; for the Docker Hub case, this is done to remain backward compatible. I was also reading through the HTTP RFCs for content negotiation yesterday (more below), and both variants are "valid". However, when no Accept header is sent and a server decides not to do active content-negotiation and returns a manifest list, a 406 status response could possibly be more appropriate. That said, perhaps the Accept header should be required for v2-capable clients?

Testing some variations against Docker Hub, here's how it currently handles content-negotiation:

✅ For a multi-arch repository, specifying multiple Accept headers returns the manifest list (active content-negotiation; leaving it up to the client to pick the best-matching variant);

export token="$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/hello-world:pull" | jq --raw-output '.token')";

curl -X HEAD -I -fsSL -H "Authorization: Bearer $token" \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v1+json' \
    "https://registry-1.docker.io/v2/library/hello-world/manifests/latest"

HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
...

✅ Using Accept with only v1 manifest, returns the v1 manifest (same as when no Accept header is set) (again, "best match"):

export token="$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/hello-world:pull" | jq --raw-output '.token')";

curl -X HEAD -I -fsSL -H "Authorization: Bearer $token" \
    -H 'Accept: application/vnd.docker.distribution.manifest.v1+json' \
    "https://registry-1.docker.io/v2/library/hello-world/manifests/latest"

HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.v1+prettyjws
...

✅ On a single-arch repository (armhf/hello-world:latest), specifying multiple Accept headers returns the v2 manifest (best match for the given Accept headers):

export token="$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:armhf/hello-world:pull" | jq --raw-output '.token')";

curl -X HEAD -I -fsSL -H "Authorization: Bearer $token" \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v1+json' \
    "https://registry-1.docker.io/v2/armhf/hello-world/manifests/latest"

HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.v2+json
...

⚠️ Using Accept with only v2 manifest-list, on a single-arch repository returns the v1 manifest

This one is odd, and feels like a bug/omission. Given the presence of a v2 Accept header, the registry should probably return the v2 image manifest instead (or generate a manifest list on-the-fly):

export token="$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:armhf/hello-world:pull" | jq --raw-output '.token')";

curl -X HEAD -I -fsSL -H "Authorization: Bearer $token" \
    -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
    "https://registry-1.docker.io/v2/armhf/hello-world/manifests/latest"

HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.v1+prettyjws
...

However, content-negotiation is not currently described in the specification, so it is up to each implementation how to handle Accept headers. Judging from the responses, the current implementation (Docker Hub, and the open source registry from https://github.com/docker/distribution) also looks incomplete, as the registry does not return a Vary header (rfc7231, section 7.1.4), which I think it should.
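A quick way to check a response for this is sketched below. `varies_on_accept` is a hypothetical helper that reads raw response headers (for example from `curl -I`) and succeeds only if a Vary header lists Accept or `*`.

```shell
# Succeed (exit 0) if the headers on stdin include a Vary header that
# names Accept (or the wildcard "*"); fail otherwise.
varies_on_accept() {
    tr -d '\r' | awk '
        {
            i = index($0, ":")
            if (i && tolower(substr($0, 1, i - 1)) == "vary") {
                # Vary carries a comma-separated list of header names.
                n = split(tolower(substr($0, i + 1)), fields, ",")
                for (k = 1; k <= n; k++) {
                    gsub(/[ \t]/, "", fields[k])
                    if (fields[k] == "accept" || fields[k] == "*") found = 1
                }
            }
        }
        END { exit (found ? 0 : 1) }'
}
```

Piping the headers from any of the HEAD requests above into `varies_on_accept` would show whether a cache could safely tell the negotiated variants apart.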

Docker Hub (and other registries that existed before the introduction of the v2 schemas) may be in a slightly special position, as they have to preserve backward compatibility for that reason. Other registries could make other choices when dealing with active (server-side) content-negotiation.

That said, I do think it would be good to have content-negotiation added to the specs, but as OPTIONAL. The reason for making it optional is that it should be possible to host a static registry (which would not be able to perform active content-negotiation). In situations where no active content-negotiation is performed by the registry, it should be handled by the client (in case of a manifest-list, the client is responsible for picking the right variant from the list).

I started writing up a draft / some ideas yesterday, and will try to post an initial proposal for discussion. It would also cover active content-negotiation for multi-arch manifests, which could benefit (e.g.) ARM architectures, where a client may support multiple variants (v5/v6/v7/v8): allowing the client to send a prioritized list of accepted architectures would save the extra roundtrips needed to fetch the manifest list and pick the variant on the client side.

@amouat
Contributor

amouat commented Nov 12, 2020

@thaJeztah are there still active clients that only work with v1 manifests? I guess they belong to bespoke CI/CD systems and the like? It would simplify life if we could drop v1 support.

I would agree that we should add a section on content-type negotiation to the specification. It's a bit frustrating though, as the way you've worded it will result in different registries returning significantly different results for the same requests (as I guess happens at the minute with the Docker Hub).

@thaJeztah
Member

@amouat I don't have numbers at hand, but with billions of pulls, we definitely get old (or plain "weird") clients that connect.

It's a bit frustrating though, as the way you've worded it will result in different registries returning significantly different results for the same requests (as I guess happens at the minute with the Docker Hub).

I wonder if this can be avoided. Today's v2 will be tomorrow's v1. Content-negotiation can help such transitions, and clients should indicate what content-types they can accept. Returning the "oldest" acceptable format when no Accept header is sent (i.e., when the client is non-specific) is, IMO, the best option.

@amouat
Contributor

amouat commented Nov 12, 2020

Personally, I'd rather never return a v1 manifest. I actually removed all support for v1 from Trow. Supporting v1 manifests is a lot more work than v2. One option might be to refuse v1 uploads but automatically convert v2 manifests to v1 if requested.

@thaJeztah
Member

One option might be to refuse v1 uploads but automatically convert v2 manifests to v1 if requested.

Actually, I may be mixing up v2, v1, and v2 schema1 (I always get confused by those). I think v1 has already been removed (https://www.docker.com/blog/docker-hub-deprecation-1-5/), but v2 schema1 is still "supported" by Hub. Current versions of docker produce a warning (19.03) or error (20.10) recommending that users pull the schema 2 v1 manifest and push it as schema 2 v2 (moby/moby#39365, moby/moby#41295). Docker 20.10 will produce:

[DEPRECATED] support for pushing manifest v2 schema1 images has been removed. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/

I think the schema 2 v1 manifests are auto-generated by Hub though (again, would have to check)

@thaJeztah
Member

Cleaned up my draft a bit, and posted it as #212

@amouat
Contributor

amouat commented Nov 12, 2020

Yes @thaJeztah - I was confused as well and also talking about schema 2 v1 (which is vastly different to v2).

@jonjohnsonjr
Contributor

No, I mean Content-Type. I'm saying that, if it is a manifest list, and no Accept has been provided, it would make a lot more sense if it defaulted to returning the manifest list rather than picking a manifest (seems like it defaults to linux/amd64) and return that.

The behavior is not obvious and kind of counter-intuitive currently, as it's not returning the type that matches my request the closest, in this case: a tag that happens to point at a manifest list.

Ah yeah, I see what you mean. That makes sense for this spec in isolation, but unfortunately (as @thaJeztah described in great detail) the Accept header was used to ease the transition from v2 schema 1 images to v2 schema 2 images, so just dropping this completely would break some backwards compatibility with older clients.

FWIW, the way we "solved" this in GCR was to do what you expect if clients send * or */* in the accept header. This resolved all of our complaints about GCR being broken with curl, as curl sends Accept: */*:

$ curl -v https://gcr.io/v2/ 2>&1 | grep "Accept:"
> Accept: */*

So you get what you would expect with a manifest list:

$ curl -v https://gcr.io/v2/google-containers/debian-hyperkube-base/manifests/0.12.1 2>&1 | grep content-type
< content-type: application/vnd.docker.distribution.manifest.list.v2+json

But clients that send no Accept header at all (old docker versions) get the fallback behavior:

$ curl -v -H "Accept:" https://gcr.io/v2/google-containers/debian-hyperkube-base/manifests/0.12.1 2>&1 | grep content-type
< content-type: application/vnd.docker.distribution.manifest.v1+prettyjws

This is a somewhat janky solution because it relies on some specific client behavior, but it seems to fall within the spirit of the spec and seems to work for both curl and docker 🤷‍♂️
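The decision described above can be written down as a small table-like function. This is only a sketch of the observable behavior in the examples from this thread, not GCR's or Docker Hub's implementation: `served_media_type` is a hypothetical name, q-values are ignored, and platform resolution of manifest lists is not modeled.

```shell
# Given the client's Accept header value and the media type of the stored
# manifest, print the media type a registry following this scheme would serve.
served_media_type() {
    accept=$1    # raw Accept header value; empty when the header is absent
    stored=$2    # media type of the manifest the tag points at
    schema1='application/vnd.docker.distribution.manifest.v1+prettyjws'

    # Normalise to ",type1,type2," so each entry can be matched exactly.
    case ",$(printf '%s' "$accept" | tr -d ' ')," in
        ,,)
            # No Accept header at all: assume an old client, fall back to schema 1.
            printf '%s\n' "$schema1" ;;
        *",*/*,"* | *",*,"* | *",$stored,"*)
            # Wildcard or an explicit match: serve what is stored.
            printf '%s\n' "$stored" ;;
        *)
            # Nothing acceptable matches the stored type: fall back to schema 1.
            printf '%s\n' "$schema1" ;;
    esac
}
```

For example, `served_media_type '*/*' application/vnd.docker.distribution.manifest.list.v2+json` prints the manifest list type, while an empty first argument prints the schema 1 type.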
