Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog for V2 API Implementation #653

Merged
merged 5 commits into from
Jul 23, 2015
Merged

Conversation

pdevine
Copy link

@pdevine pdevine commented Jun 20, 2015

This change adds a basic catalog endpoint to the API, which returns a list,
or partial list, of all of the repositories contained in the registry. Calls
to this endpoint are somewhat expensive, as every call requires walking a
large part of the registry.

Instead, to maintain a list of repositories, you would first call the catalog
endpoint to get an initial list, and then use the events API to maintain
any future repositories.

ALSO.. it could probably stand to use more tests, but I wanted to make certain I was
on the right track.

Signed-off-by: Patrick Devine patrick.devine@docker.com

@pdevine pdevine force-pushed the catalog-api branch 2 times, most recently from f438ee7 to 0761c38 Compare June 20, 2015 04:46
@RichardScothern
Copy link

I'm not sure this approach will work. The catalog handler will block the connection and given a large number of repositories could cause an idle socket timeout. This could be worked around by streaming repository entires back as soon as they are encountered during Walk with http chunked encoding.

This won't make the response quicker, but will keep the connection active.

@pdevine
Copy link
Author

pdevine commented Jun 24, 2015

Yeah.. it was meant to be quick and dirty, but you're correct in that it won't work for large numbers of repositories with slow storage. What would be a reasonable amount of repositories that we would want to support?

maxEntries = maximumReturnedEntries
}

repos, err := storage.GetRepositories(ch, ch.App.driver, maxEntries, lastEntry)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough time today to give you full details, but the handlers should not be directly making calls to storage. We'll need a new API method on distribution.Namespace to provide a catalog object. Something like the following:

type Namespace interface {
    ...

    Catalog(ctx context.Context) CatalogService
}

type CatalogService interface {
    // Repositories fills p with up to len(p) entries, beginning after the provided q parameter.
    Repositories(p []string, q string)  (n int, err error)
}

May want to generalize the interpretation of q to allow more interesting queries against the catalog in the future. Note that this is reentrant, so if you have a max value of n entries, p is filled with those entries, similar to the way readers work.

@@ -361,6 +361,48 @@ var routeDescriptors = []RouteDescriptor{
},
},
{
Name: RouteNameCatalog,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than copying this from #583, just rebase this branch on the commits from that PR branch. When everything is good to go, we can merge it as a unit.

@pdevine pdevine force-pushed the catalog-api branch 3 times, most recently from ea85373 to 0e47b03 Compare July 13, 2015 22:21
@@ -466,6 +470,7 @@ func (app *App) authorized(w http.ResponseWriter, r *http.Request, context *Cont
repo := getName(context)

if app.accessController == nil {
ctxu.GetLogger(context).Debug("access controller not set")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be logged on every request.

@mattmoor
Copy link
Contributor

How do you plan to authenticate this?

Unless I'm missing something, this will either be unauthenticated, in which case the Catalog will be missing private images.

Or it will be authenticated, in which case it is unclear what scope would be passed to the authentication endpoint to retrieve a Bearer token.

@stevvooe
Copy link
Collaborator

@mattmoor The catalog API is considered an administration function. This is protected by a JWT access record with type="registry", name="catalog" and access="*" for the standard distribution/auth package. This is covered in the app.go portion of this PR. Other implementations may choose to use a more or less granular access model, but this endpoint is not meant for end users.

If you have suggestions here, they would be appreciated, but this is a little late in the design process.

@mattmoor
Copy link
Contributor

@stevvooe Thanks for the pointer, so IIUC the auth endpoint would get:
scope=registry:catalog:*

If this is not meant for end-user consumption, then maybe it should go in a separate doc covering administrative endpoints?

I am not concerned with the design, just making sure I understand what this is (and isn't).

thanks again.

// Repositories returns a lexigraphically sorted catalog given a base URL. The 'entries' slice will be filled up to the size
// of the slice, starting at the value provided in 'last'. The number of entries will be returned along with io.EOF if there
// are no more entries
func Repositories(ctx context.Context, baseURL string, entries []string, last string, transport http.RoundTripper) (int, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This client API doesn't follow the conventions of the rest of the API. Right now, the client doesn't have the notion of a repository or namespace, so this is better than anything.

Let's break this up into a struct, which takes the transport and baseURL, and a function that follows the signature of the Namespace.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was nervous about this when I was writing it. The original way was more straightforward as it followed already used conventions, but was awkward because it seemed like overkill.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevvooe should I create a separate Namespace interface, or use distribution.Namespace? The problem with using distribution.Namespace is that I'll have to create additional functions for Scope() and Repository() which will be broken.

@stevvooe
Copy link
Collaborator

@pdevine This is almost ready to go. The only aspect I am weary about is the client API for this.

// Namespace provides an interface for calling Repositories. It's intended to be a slimmed down
// version of distribution.Namespace since that interface implements additional functions which are
// currently not supported through namespaces.
type Namespace interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call this Catalog. It will be confusing to mix this up with distribution.Namespace. We'll fix up the naming later.

The way Repositories() was initially called was somewhat different than
other parts of the client bindings because there was no way to instantiate a
Namespace.  This change implements a NewRegistry() function which changes
it so that Repositories() can be called the way one would expect.

It doesn't implement any of the other functions of Namespaces.

Signed-off-by: Patrick Devine <patrick.devine@docker.com>
@stevvooe
Copy link
Collaborator

LGTM

@monikakatiyar16
Copy link
Contributor

For the v2 catalog api, consider the following scenario:
There's a repository called image1 with only one tag v1. Soft deleted/untagged the image using the delete http v2 api call. The image1 still shows up in the v2 catalog repository.

We need to get a list of image: tag. So we used the v2 catalog api to get the repository names and looped it to get the tags for each one to form a list of image: tags. Right now, we manually remove the repository for which zero tags are present. This leads to x number of extra http calls to be made for x single tag repositories. Is there any way such repositories can be removed from the v2 catalog.

Please let me know, if there's any thing I have missed out.

@monikakatiyar16
Copy link
Contributor

Also, we need the date time of creation or modification when the tags were created or modified. Right now, we are manually walking the storage folder and accessing the date. Is there any provision for the same. Its impossible to distinguish between newly created image:tags and the old ones.

@monikakatiyar16
Copy link
Contributor

Apart from that, need to filter these repositories from pagination too.
@stevvooe Is it possible implementing it in v2 api or would it be expensive for the cases where only repositories are to be listed?

@stevvooe
Copy link
Collaborator

@monikakatiyar16 This issue was merged like a 1.5 years ago. What makes you think this is a reasonable venue to ask these kinds of questions? It sounds like you have questions on how to use the API.

I first recommend that you read the specification and documentation. If you still have questions, then I'd recommend reaching out on IRC or the community slack. If you still can't answer your question, open an issue, describing what you tried and why it didn't work so we can better help to route your request.

As a hint, I'd recommend taking a look at notifications.

@monikakatiyar16
Copy link
Contributor

Agreed. This was not the right place to ask questions. I have read about notifications. The only problem is that the v2 catalog api (to list repositories) also lists the repositories for which all the tags have been deleted. I will create a new issue for the same with all the required details. Thanks.

wking added a commit to wking/opencontainers-tob that referenced this pull request Mar 1, 2018
Mike and I have had a lengthy discussion before realizing that we were
not interpreting these terms the same way [1].  This commit replaces
some explicit object-type lists with "objects defined in the image
specification", which is more concrete than using terms without local
definitions.  I've also added image-spec links where I do use
object-type terms.  And I've used the wordy but more explicit "groups
of image indexes" instead of "repositories" in most cases.

I've also explicitly included /v2/_catalog as out of scope.  It's a
higher-level endpoint than the image-spec objects.  As Patrick [2] and
Stephen [3] pointed out when the endpoint landed, it's for internal
administration.  Matt suggested keeping it out of the user-facing API
on those grounds [4], and while that didn't happen in
docker/distribution, I think we want to keep the endpoint out of our
distribution spec in order to avoid burdening implementors (because as
Patrick pointed out [2], it's an expensive endpoint unrelated to image
push/pull.

[1]: opencontainers#44 (comment)
[2]: distribution/distribution#653 (comment)
[3]: distribution/distribution#653 (comment)
[4]: distribution/distribution#653 (comment)
wking added a commit to wking/opencontainers-tob that referenced this pull request Mar 1, 2018
Mike and I have had a lengthy discussion before realizing that we were
not interpreting these terms the same way [1].  This commit replaces
some explicit object-type lists with "objects defined in the image
specification", which is more concrete than using terms without local
definitions.  I've also added image-spec links where I do use
object-type terms.  And I've used the wordy but more explicit "groups
of image indexes" instead of "repositories" in most cases.

I've also explicitly listed /v2/_catalog as out of scope.  It's a
higher-level endpoint than the image-spec objects.  As Patrick [2] and
Stephen [3] pointed out when the endpoint landed, it's for internal
administration.  Matt suggested keeping it out of the user-facing API
on those grounds [4], and while that didn't happen in
docker/distribution, I think we want to keep the endpoint out of our
distribution spec in order to avoid burdening implementors (because as
Patrick pointed out [2], it's an expensive endpoint unrelated to image
push/pull).

[1]: opencontainers#44 (comment)
[2]: distribution/distribution#653 (comment)
[3]: distribution/distribution#653 (comment)
[4]: distribution/distribution#653 (comment)
wking added a commit to wking/opencontainers-tob that referenced this pull request Mar 1, 2018
Mike and I have had a lengthy discussion before realizing that we were
not interpreting these terms the same way [1].  This commit replaces
some explicit object-type lists with "objects defined in the image
specification", which is more concrete than using terms without local
definitions.  I've also added image-spec links where I do use
object-type terms.  And I've used the wordy but more explicit "groups
of image indexes" instead of "repositories" in most cases.

I've also explicitly listed /v2/_catalog as out of scope.  It's a
higher-level endpoint than the image-spec objects.  As Patrick [2] and
Stephen [3] pointed out when the endpoint landed, it's for internal
administration.  Matt suggested keeping it out of the user-facing API
on those grounds [4], and while that didn't happen in
docker/distribution, I think we want to keep the endpoint out of our
distribution spec in order to avoid burdening implementors (because as
Patrick pointed out [2], it's an expensive endpoint unrelated to image
push/pull).

[1]: opencontainers#44 (comment)
[2]: distribution/distribution#653 (comment)
[3]: distribution/distribution#653 (comment)
[4]: distribution/distribution#653 (comment)
wking added a commit to wking/opencontainers-tob that referenced this pull request Mar 1, 2018
Mike and I have had a lengthy discussion before realizing that we were
not interpreting these terms the same way [1].  This commit replaces
some explicit object-type lists with "objects defined in the image
specification", which is more concrete than using terms without local
definitions.  I've also added image-spec links where I do use
object-type terms.  And I've used the wordy but more explicit "groups
of image indexes" instead of "repositories" in most cases.

I've also explicitly listed /v2/_catalog as out of scope.  It's a
higher-level endpoint than the image-spec objects.  As Patrick [2] and
Stephen [3] pointed out when the endpoint landed, it's for internal
administration.  Matt suggested keeping it out of the user-facing API
on those grounds [4], and while that didn't happen in
docker/distribution, I think we want to keep the endpoint out of our
distribution spec in order to avoid burdening implementors (because as
Patrick pointed out [2], it's an expensive endpoint unrelated to image
push/pull).

[1]: opencontainers#44 (comment)
[2]: distribution/distribution#653 (comment)
[3]: distribution/distribution#653 (comment)
[4]: distribution/distribution#653 (comment)
wking added a commit to wking/opencontainers-tob that referenced this pull request Mar 1, 2018
Mike and I have had a lengthy discussion before realizing that we were
not interpreting these terms the same way [1].  This commit replaces
some explicit object-type lists with "objects defined in the image
specification", which is more concrete than using terms without local
definitions.  I've also added image-spec links where I do use
object-type terms.  And I've used the wordy but more explicit "groups
of image indexes" instead of "repositories" in most cases.

I've also explicitly listed /v2/_catalog as out of scope.  It's a
higher-level endpoint than the image-spec objects.  As Patrick [2] and
Stephen [3] pointed out when the endpoint landed, it's for internal
administration.  Matt suggested keeping it out of the user-facing API
on those grounds [4], and while that didn't happen in
docker/distribution, I think we want to keep the endpoint out of our
distribution spec in order to avoid burdening implementors (because as
Patrick pointed out [2], it's an expensive endpoint unrelated to image
push/pull).

[1]: opencontainers#44 (comment)
[2]: distribution/distribution#653 (comment)
[3]: distribution/distribution#653 (comment)
[4]: distribution/distribution#653 (comment)
thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Apr 22, 2021
thaJeztah pushed a commit to thaJeztah/distribution that referenced this pull request Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants