[RFP] replace catalog API functionality #22
I'd like to remove There are a couple relevant PRs in docker/distribution of the same vein that would be nice to include as well:
From the proposal, it's explicitly out of scope:
... but it would be very nice to have, eventually :) In general, I'd like to get the spec into a minimal, workable state before we start adding any features. |
With the caveat that I think all of this is out of scope for this project, I'd like to brain dump my thoughts on it so that we can maybe reach some consensus or have a plan for a proposal. Possibly, all of this could be a completely separate spec/service that many registry operators just happen to host side-by-side with their distribution-spec compliant registry.
That said... Most[citation needed] registries don't implement At the very least, we should mark this as OPTIONAL. (The rest of the spec would benefit from more formal Requirements Level language, too.)
Regardless of what we do here, it would be nice to have some method of indexing a registry that fits the spec's namespacing model. Being able to index the registry enables some nice projects, e.g. flagstate, grafeas.
I haven't put together a formal proposal for anything yet, but some prior art to get the ball rolling:
Listing Repositories
This +
Listing Images
There's currently a This + Listing Repositories +
Pubsub
Given a point-in-time view of a registry, it's much more efficient to subscribe to a firehose of events than to constantly poll for changes. Many registries provide this feature. Unfortunately, none of these message formats seem compatible with each other. In an ideal world, we could standardize on some common format for registry events.
|
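To illustrate the pubsub point above, here is a minimal sketch of an index that is seeded once from a point-in-time listing and then kept current by applying events instead of polling. The event shape (`action`, `repository`, `tag`, `digest`) and the `RegistryIndex` class are assumptions made up for this example; no registry is claimed to emit exactly this format, which is the very incompatibility the comment describes.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryIndex:
    """In-memory map of repository -> {tag: digest}."""

    tags: dict = field(default_factory=dict)

    def resync(self, snapshot):
        """Replace the whole index from a point-in-time listing."""
        self.tags = {repo: dict(t) for repo, t in snapshot.items()}

    def apply(self, event):
        """Apply one event. The event shape is hypothetical."""
        repo_tags = self.tags.setdefault(event["repository"], {})
        if event["action"] == "push":
            repo_tags[event["tag"]] = event["digest"]
        elif event["action"] == "delete":
            repo_tags.pop(event["tag"], None)


# One full resync, then incremental updates from the event stream.
index = RegistryIndex()
index.resync({"library/busybox": {"latest": "sha256:aaa"}})
events = [
    {"action": "push", "repository": "library/busybox",
     "tag": "1.0", "digest": "sha256:bbb"},
    {"action": "delete", "repository": "library/busybox", "tag": "latest"},
]
for ev in events:
    index.apply(ev)
print(index.tags)
```

A real consumer would also need a way to detect missed events and fall back to a fresh resync; that is left out of this sketch.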
I am +1 to not including the |
My collection of thoughts:
|
Here's what we do for Amazon ECR:
|
This commit redefines the `_catalog` endpoint as an optional operation. Background on the issue: opencontainers#22 https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/rJ72OtZuhbc opencontainers/tob#35 opencontainers/tob#46 opencontainers/tob#50 Signed-off-by: Atlas Kerr <atlaskerr@gmail.com>
If we are okay with keeping catalog as an optional endpoint, I think this issue can be closed. |
I vote for dropping it. |
I just submitted a PR to remove the catalog completely. Hopefully that helps move the decision along a bit. |
I closed my PR since no one voted for dropping. This PR can be closed. |
I'm not sure why you say that no one voted for dropping it? It looks like a lot of people in this thread agree it should be dropped. |
I'm fine with dropping the catalog api, so long as there is agreement on replacing it with a more useful list api, such as that which was discussed above. |
@jzelinskie Sorry, I meant that no one replied to, commented on, LGTM'd, or rejected my PR, so I closed it. I figured I'd leave it up to you guys to make the change when y'all were finished discussing. |
@jzelinskie Sorry it sounds kind of rude. I just mean that I'm slowing down my contributions to this project and OCI in general because I feel like I'm being kinda annoying making PRs that no one told me to make and asking questions yall have already discussed years ago. I don't want to be "that guy" lol. I'll reopen the PR if there is still interest in fully dropping catalog though. |
Folks get busy :) Thanks for the commits! |
@atlaskerr oh no Atlas! I think there is a disconnect. While there is history, your participation is good and valid. Sometimes old assumptions need to be challenged. I'm very glad for your PRs and commentary |
Thanks guys. I guess I'm overreacting. The milestone for rc-1 is Feb 1 and there is so much housekeeping I wanted to get done before then and my anxiety is through the roof haha. I'll keep motivated! |
I'll throw out two suggestions and people can pick apart why they hate it or love it. These are small, additive changes that shouldn't be too hard to get registries to adopt, and they fit the existing API model pretty well. If this is interesting to anyone, we could iterate on it in a more collaborative medium. 1.
|
For the repositories endpoint, I agree that it would need to be registry specific. Some might prefer to return all public repositories, others only those that the specific user making the query is allowed to see. Just to clarify - the repositories endpoint handles being given a particular organization (e.g., library) and then returns the repos under it, or no organization ( A concern with this endpoint is that it gives good reason to stress the API - people like myself who like to study software are going to scrape the heck out of it. I would say that larger servers that want to serve the endpoint and not be scraped could do something along the lines of what GitHub / StackOverflow do, and provide some BigQuery table of data. Would there be a next / previous in the responses (i.e. are they paginated)? Also, would it make sense to return results in a random order so that, if people do massive scraping, we don't all hit poor postgres at the top at the same time? I missed the descriptors discussion - what is a descriptor, an image manifest with annotations? Stepping out of details for a second - what are the goals of this endpoint? From a high level, it lets people interested in studying containers (via their manifests) find them more programmatically. What else? |
Yes exactly, since repositories can be nested, you would be able to walk the repositories down to leaves. This is how GCR works today, but if you grafted both of these proposals on to the
In my experience, people are already stressing the API with
Yes, see the
A descriptor is defined here. tl;dr, it's:
Exposing something like this solves two problems:
There's not currently any way to ask the registry "tell me about everything in this repo", which would be solved by using both of these endpoints together. |
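As a rough sketch of walking nested repositories down to leaves, the function below recurses over a listing callable. `list_children` is a stand-in for a call to the proposed `/v2/.../repositories/list` endpoint, which is only a suggestion in this thread and not part of the spec; the `FAKE` layout and `fake_list` helper are invented here so the example runs without a registry.

```python
def walk_repositories(list_children, root=""):
    """Yield every repository path reachable from root, depth first.

    list_children(path) is a hypothetical stand-in for the proposed
    repositories/list call; it returns the names nested directly
    under path.
    """
    for child in list_children(root):
        path = f"{root}/{child}" if root else child
        yield path
        yield from walk_repositories(list_children, path)


# A fake registry layout standing in for real HTTP responses.
FAKE = {
    "": ["library", "team"],
    "library": ["busybox"],
    "team": ["app"],
    "team/app": ["frontend", "backend"],
}


def fake_list(path):
    return FAKE.get(path, [])


repos = list(walk_repositories(fake_list))
print(repos)
```

Combined with a per-repository listing of manifests, a walk like this is what would let a client ask for "everything in this repo" without first knowing the tag names.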
On 07/03/19 22:40 +0000, jonjohnsonjr wrote:
### 1. `/v2/.../repositories/list`
This would mirror `/v2/.../tags/list`
I think this is a nice logical extension.
Further, for a provided security token context, a way to list orgs
you have access to?
|
> what are the goals of this endpoint
Exposing something like this solves two problems:
1. Discovery of images that aren't tagged (e.g. old images) and their digests.
2. Getting an index of a repository without making N + 1 requests (list tags + pull every tag).
There's not currently any way to ask the registry "tell me about everything in this repo", which would be solved by using both of these endpoints together.
👍
|
Do we define the token handshake anywhere in the spec or is that out of scope for the distribution spec?
Not sure what "orgs" would be, but if it's just the top-level repos, we could reuse this via: For the actual scopes, I would imagine something like:
This might be complicating things more but I'm trying to reuse existing patterns wherever we can. |
Adding to this list - Listing & Auth
PubSub
With the Artifact extension we are considering adding the config descriptor type as well to the payload so that consumers can filter events by artifact type. A helm chart update would be a webhook filtered by the helm type. |
Some comments:
Actually these questions were raised at the beginning but the general consensus was that these were not in the scope of distribution spec. But entity list/search is an essential part of a practical registry provider. So it sounds good to have it either in distribution spec, or some additional spec (management spec just a wild guess?) As a data point, ACR implements both the _catalog API (to be compatible with OSS docker registry), and the private set of _list API for each type of entity (repositories/manifests/tags). |
Does dockerhub support catalog? Last I checked, it didn't.
This has been my observation as well. |
Is it just me, or does this smell a little bit like GitHub or GitLab API endpoints? For example, listing repositories:
The main difference is just the use of "repos" vs "repositories" and the "list" is implied in the first. The maps to the users/:username. It's similar to how (some / all of?) Docker's APIs were integrated into the image spec, no? Or more simply, wouldn't it be really powerful if we developed a spec for these additional endpoints so that already existing version control APIs would already be compliant? In the context of Github, this would mean that GitHub pages could serve a static registry and deliver the same interactions as with a container registry. If we add content types, then with a "doc" or "license" sort of type, this would link cleanly to the files in the repo. |
The first (/repos) is a more common RESTful style API. I would vote for the first:) |
@SteveLasker that seems like an astonishing amount of scope creep for a summary 😉 My main goal here is to drop If we can get:
then we could build most of what you're proposing around that, generically. I'm hesitant to add a ton of requirements to the registry spec because that will basically guarantee that most registries won't ever fully implement it.
That's horrifying.
Pagination for tag listing is already in the spec. Something like
Agreed. Ideally, well-behaved clients would do a full-resync once and listen for registry events to keep their index up to date. This is similar to how kubernetes informers behave.
There are a few registry CLIs already. I don't think we need to create an OCI-blessed CLI, but it should be easy to write a CLI from reading the spec.
I based it on the
I think this is going to be hard to achieve and maintain, since GitHub is free to change their API arbitrarily... so it might not be a great idea to tie the spec to whatever GitHub's API happens to be right now. (I love that you got the static registry stuff working, BTW.) What does the equivalent GitLab API look like? The same? |
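On the pagination point above: tag listing is paginated with the `n` and `last` query parameters, and the next page is advertised in a `Link` header with `rel="next"`. The sketch below follows those links until none is returned. The `fetch` callable and the canned `PAGES` responses are assumptions made for this example so it runs offline; a real client would issue HTTP requests and resolve relative URLs against the registry host.

```python
import re

# Matches the target of a rel="next" entry in a Link header.
_NEXT = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def next_url(link_header):
    """Return the rel="next" target from a Link header, or None."""
    if not link_header:
        return None
    match = _NEXT.search(link_header)
    return match.group(1) if match else None


def list_all_tags(fetch, first_url):
    """Collect tags across pages.

    fetch(url) is a caller-supplied (hypothetical) function that
    returns a (body_dict, link_header) pair for one page.
    """
    tags, url = [], first_url
    while url:
        body, link = fetch(url)
        tags.extend(body.get("tags") or [])
        url = next_url(link)
    return tags


# Canned responses standing in for a registry.
PAGES = {
    "/v2/demo/tags/list?n=2": (
        {"name": "demo", "tags": ["a", "b"]},
        '</v2/demo/tags/list?n=2&last=b>; rel="next"',
    ),
    "/v2/demo/tags/list?n=2&last=b": (
        {"name": "demo", "tags": ["c"]},
        None,
    ),
}

all_tags = list_all_tags(PAGES.__getitem__, "/v2/demo/tags/list?n=2")
print(all_tags)
```

A repositories listing that mirrors the tag listing could presumably reuse the same loop unchanged, which is part of the appeal of reusing the existing pattern.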
I think limiting yourself to GitHub Pages is not the right solution. First, it is a proprietary solution. Second, its usage limits (maximum size 1 GB, monthly transfer of 100 GB, 10 updates per hour) can limit its practical potential. We can keep statically generated registries in mind, and I like this idea. I notice that operating system repositories, for example APT, are often statically generated (see https://github.com/krobertson/deb-s3 for an apt repository on S3, https://tylerpower.io/post/hosting-yum-repo-on-s3/ for a yum repository on S3), and updates require refreshing the repository indexes. After all, a repository is read far more often than it is written to, so the read operation should be optimized. Thanks to the appropriate architecture in this area, operating system repositories have many mirrors ( https://www.debian.org/mirror/list ), whereas for Docker an unofficial mirror of an unofficial repository is rather limited (https://docs.docker.com/registry/recipes/mirror/). I would like to draw attention to the arguments that were given in the case of abandoning one of the Linux kernel distribution protocols. |
I don’t concretely mean that it would be limited to GitHub Pages; the idea that I’m trying to get across is that there are already APIs that exist to list repositories and projects. Instead of coming up with an entirely new one, we can use features from those APIs that have already been somewhat tested and known. This would mean that an already existing API (GitHub as the example, with probably billions of repos) would then conform to our new specification. Sure, they could change in the future, but the incentive to do so might change if they know that their resource is friendly to OCI. If people start building things using them too? Then I suppose we’d start to see other companies' representation at the meetings :) |
Yeah, umm, I tend to work from a master plan approach, knowing where all the pieces could go, then scope back in incremental pieces. Starting smaller is goodness.
I had thought the same at first. But we've had developers want to deploy the latest/newest build to a dev environment. While they could pull the tag, based on a webhook, we got feedback they want an ordered tag listing. They also wanted ordered tag listings in other tools, like DevOps and App Services, where the user can choose a tag from a combo box. They wanted to get the same experience across Docker Hub and ACR. It would be great if a customer could choose from other registries they might happen to host with Azure as well.
Could be later, as long as paging has a reasonable, small page size
tag listing API to support artifact type
I forgot to include that the tag listing should support listing the artifactType, enabling tools to understand what the tag represents.
Jon brought up an interesting reference to be able to list untagged manifests. There's a good discussion here, as well as possibly understanding the history of manifests a tag represented. When a stable tag of a base image is updated to reflect OS & FX patching, it's also interesting to know the previous manifest, in case a user must roll back. |
Would the new catalog/listing operation be a required or optional endpoint? |
Collecting use cases, forming workgroup here: https://hackmd.io/s/BJPAUxDvV#OCI-Catalog-Listing-API---Workgroup |
I am now looking forward to @josephschorr proposal on a pubsub event model. |
Hoping to publish it for community review within a few weeks, as holidays adds some delays :) |
Do we want to close this, and let Joey continue to make progress on the Pub/Sub model? I love what Joey is doing for the specific content update scenario. But, that's not the same for quick-hit scenarios where someone just needs to see a one-time listing of repos or tags. |
@josephschorr are you waiting on something for the pub/sub events PR? |
I would like to see pub/sub done in such a way so as to cover the one time listing of repos/tags with published updates to follow based on the subscription. Let's wait to close this till we have a resolution to the issue I think? |
@vbatts I was hoping for some more feedback on my document before I opened it |
@josephschorr ok. Let me get #111 shaped up then, and you can ready your PR |
|
… an oras repo Without the _catalog call, it is already enough to do all necessary filtering for oras package installs. Doing the _catalog call is unnecessary. As _catalog is also not part of oci specs, per opencontainers/distribution-spec#22 , the return behavior is also undefined, which can cause the install commands to fail depending on the specific implementation of the OCI registry, such as in JFrog Artifactory when combined with namespaces. Signed-off-by: Eric Chen <echen@intersystems.com>
for more rich indexing and searching of container images in a registry.
There is the
/v2/_catalog
though it still does not seem clear enough for implementers.