Container Images #1685

Closed
mattfarina opened this issue Nov 11, 2021 · 13 comments · Fixed by #1777

@mattfarina
Collaborator

Is your feature request related to a problem? Please describe.
Container images that people would want to consume generally (e.g., PostgreSQL) used to all be in Docker Hub. It was the central store. Things have changed and container images are now stored all over. That includes those meant to be consumed by the masses and not just company or hobby projects.

This makes discoverability difficult.

Describe the solution you'd like
I would like to have a site to go to where I could search for container images and get a list of them in their distributed locations.

Describe alternatives you've considered
I've considered using a search engine, like Google. Unfortunately, it does a bad job because I'm just looking for container images and it displays all types of pages.

I've considered just relying on Docker Hub search. But, some of the images I need to work with are in other registries. I imagine there are other great images in those other registries I would like to use.

Additional context
Container images are part of the OCI and container projects (e.g., containerd and cri-o) are part of the CNCF. The OCI and CNCF are both part of the Linux Foundation.

I'm not sure container images should be in the Artifact Hub. I would like to discover them in either the Artifact Hub or a system that just does images but is like the Artifact Hub. I'm mostly unsure what this would do for user experience, scaling (e.g. with scanning), and other things.

@tegioz
Collaborator

tegioz commented Nov 15, 2021

Hi @mattfarina 👋

This sounds interesting 🙂

@cynthia-sg and I have been discussing this and we have some ideas that could make it work, but we’ll need to experiment a bit with them to be sure. Scaling may be another challenge indeed.

However, we are not sure either if container images should be in the Artifact Hub or not. What do you think @caniszczyk?

@caniszczyk

caniszczyk commented Nov 15, 2021 via email

@mattfarina
Collaborator Author

@tegioz I brought it up with @caniszczyk before I filed the issue here. :)

@tegioz
Collaborator

tegioz commented Nov 17, 2021

Hi 👋

There are a couple of points we'd like to discuss a bit before moving on.

Discoverability

At the moment the OCI distribution spec does not define a mechanism to list all repositories for a given namespace. There are some interesting conversations going on (opencontainers/distribution-spec#22, opencontainers/distribution-spec#222, OCI Catalog Listing API - Workgroup), but we are not there yet. Some registries have their own APIs for this purpose, but if we were to add a new Container image repository kind to Artifact Hub we think it'd be better to build it on top of a standardized solution.
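To illustrate the gap: the distribution spec does define a per-repository tag listing endpoint (GET /v2/<name>/tags/list), so versions can be enumerated once a repository name is known, but the repository name itself has to come from somewhere else. A minimal Python sketch, where the registry host, the repository name and the sample response body are made up for illustration and no network call is made:

```python
import json
from typing import List


def tags_list_url(registry: str, repository: str) -> str:
    """Build the distribution-spec endpoint that lists the tags of ONE repository."""
    return f"https://{registry}/v2/{repository}/tags/list"


def parse_tags_response(body: str) -> List[str]:
    """Extract the tag names from a tags/list JSON response body."""
    payload = json.loads(body)
    # "tags" may be null or missing for an empty repository.
    return list(payload.get("tags") or [])


# Hypothetical response body, shaped like a tags/list response.
sample = '{"name": "example/postgresql", "tags": ["14.1.0", "14.0.0", "latest"]}'

print(tags_list_url("registry.example.com", "example/postgresql"))
print(parse_tags_response(sample))
```

There is no equivalent standardized call that takes a namespace such as "example" and returns its repositories, which is the missing piece discussed here.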

This limitation may have an impact on the user experience.

In Artifact Hub, publishers add repositories that can contain one or more packages. Setting up a repository is a simple process that just takes a minute, and only requires adding the repository name and URL. Our initial idea was to somehow map a namespace in an OCI registry to a repository in AH, so that each of the image repositories in the registry namespace would become a package in AH, which can have multiple versions (or tags in this case). However, with the limitations in discoverability this may not be possible unless we rely on registry-specific APIs and only support the registries that offer that functionality, which is probably not ideal.

Another approach would be to map a repository in an OCI registry to a repository in AH. This would be the easiest way to go, but we think this can lead to a poor user experience. Some organizations have hundreds of repositories published in a registry. Bitnami, for example, has 267 repositories published in the Docker Hub at the time of writing this. Following this approach, they'd need to add the same number of repositories to AH, each of them having a single package with multiple versions.
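A rough sketch of what this second mapping could look like as a data model, in Python. The class names, the oci:// URL form and the example tags are assumptions made for illustration, not the actual Artifact Hub schema:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    name: str
    versions: List[str] = field(default_factory=list)


@dataclass
class AHRepository:
    name: str
    url: str
    packages: List[Package] = field(default_factory=list)


def from_oci_repository(url: str, tags: List[str]) -> AHRepository:
    """Map ONE OCI repository to ONE AH repository holding a single package."""
    # e.g. "oci://registry.example.com/org/postgresql" -> "postgresql"
    image_name = url.rstrip("/").rsplit("/", 1)[-1]
    package = Package(name=image_name, versions=list(tags))
    return AHRepository(name=image_name, url=url, packages=[package])


repo = from_oci_repository(
    "oci://registry.example.com/org/postgresql", ["14.1.0", "latest"]
)
print(repo.name, [p.versions for p in repo.packages])
```

With this shape, an organization with 267 image repositories ends up with 267 AH repositories of one package each, which is the user experience concern raised above.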

Metadata and documentation

Most of the container images available in the major public registries don't contain any embedded metadata or documentation. Importing those container images would lead to a poor user experience in the Artifact Hub UI, as we wouldn't have much information to display and search results wouldn't be very accurate without descriptions, categories or keywords.

We were thinking that we could require some metadata to be present in the image for it to be indexed. The metadata could be provided in the form of annotations as defined in the OCI image spec. We could leverage the pre-defined annotation keys when they fit, and define AH specific ones when needed.

This would require an extra effort from publishers to include this information in their images, but would improve the final user experience in AH. Not all tooling is ready yet to deal with annotations in images, but we could consider supporting labels in the configuration as a fallback mechanism.
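As a sketch of how the annotations-with-labels-fallback idea could work, assuming the manifest and the image configuration have already been fetched and decoded into dictionaries. The choice of org.opencontainers.image.description as the only required key is an assumption for illustration; it is one of the pre-defined keys in the OCI image spec, but the thread does not say which keys AH would require:

```python
from typing import Dict

# Assumption: which keys must be present for an image to be indexed.
REQUIRED_KEYS = ("org.opencontainers.image.description",)


def collect_metadata(manifest: dict, image_config: dict) -> Dict[str, str]:
    """Merge config labels and manifest annotations; annotations take precedence."""
    labels = (image_config.get("config") or {}).get("Labels") or {}
    annotations = manifest.get("annotations") or {}
    merged = dict(labels)       # fallback source
    merged.update(annotations)  # preferred source overrides labels
    return merged


def is_indexable(metadata: Dict[str, str]) -> bool:
    """An image qualifies only if every required key has a non-empty value."""
    return all(metadata.get(key) for key in REQUIRED_KEYS)


# Hypothetical image that only carries labels in its configuration.
labels_only = collect_metadata(
    {},
    {"config": {"Labels": {"org.opencontainers.image.description": "Example database image"}}},
)
print(is_indexable(labels_only))
print(is_indexable(collect_metadata({}, {})))
```

Images without the required metadata would simply be skipped rather than shown with an empty description.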

There are more topics worth discussing, but we can start with these two 🙂

@SteveLasker

Hi folks,
+1 to a syndication model. When we created MCR and syndicated the catalog to Docker Hub, this was the initial goal: to create a syndication API by which NVidia, Oracle, and other software registries could syndicate their catalog content, so each software vendor could distribute their software directly while re-distributors could aggregate the list of content.

This is no different than Home Depot, Lowes, Amazon having a broad catalog, where many of the products are "drop shipped" directly from the manufacturer.

Within the Syndication API, you might choose to replicate just the catalog info, or the content as well. Through signing, it wouldn't matter where you get the artifact, just as long as it's an artifact from an entity you trust. I wrote about this model here: Separating Identity From Location

While the biggest gap today is a standard API to replace the _catalog API for listing repositories, it actually gets more complex: how do you know which repositories were added or removed? Yes, you can do a comparison each time, but the list gets quite big. Then, what do you do about a few added or updated tags within a repository? A single repository can have thousands of tags. Public registries already do optimizations around the tag listing APIs, even without being pinged for "what changed, what changed, what changed".
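The "do a comparison each time" approach can be sketched in Python as a diff between two snapshots of a repository's tags. The snapshot shape (a mapping from tag name to manifest digest) and the sample values are assumptions for illustration; the point is that every poll has to fetch and walk the full listing, which is what becomes expensive with thousands of tags:

```python
from typing import Dict, List, Tuple


def diff_tags(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {tag: digest} snapshots and report added, removed and updated tags."""
    added = sorted(tag for tag in current if tag not in previous)
    removed = sorted(tag for tag in previous if tag not in current)
    # A tag is "updated" when it still exists but now points at a different digest.
    updated = sorted(
        tag for tag in current if tag in previous and current[tag] != previous[tag]
    )
    return added, removed, updated


# Hypothetical snapshots taken on two consecutive polls.
before = {"1.0.0": "sha256:aaa", "latest": "sha256:aaa", "0.9.0": "sha256:old"}
after = {"1.0.0": "sha256:aaa", "1.1.0": "sha256:bbb", "latest": "sha256:bbb"}

print(diff_tags(before, after))
```

A syndication or change-feed API would let the registry report these three lists directly instead of having every consumer recompute them.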

For how to distribute meta-data and documentation, I had played with the idea of shipping the documentation for a repo as just another artifact type.

Imagine you push an OCI Artifact with two files:

./readme.md
./details.md

The artifact might be tagged :regdoc, with an artifactType of application/vnd.something.regdoc.v0

Publishers of a repo simply upload this artifact.
A registry operator that wishes to display readme content watches for the artifactType and matches the tag.
If they match, the registry cracks open the blobs and displays whatever the repo owner provided.
By using a collection of markdown files, the repo owner can actually express as much detail as they'd like. One of the constraints we have with MCR syndication today is the inability to upload a scan result summary to docker hub. Not just because we can't upload files, but the content length is limited.

In the ./readme.md, you could have links to external sources, or provide additional .md files that have scan results. The registry could host up to 10 markdown files, so the user could create a mini markdown doc site.
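The matching step described above could look roughly like this in Python. The tag and artifactType values are the placeholder ones from this comment, the 10-file cap is the figure mentioned above, and the function name and inputs are hypothetical; this is a sketch of the idea, not an existing registry feature:

```python
from typing import List

REGDOC_TAG = "regdoc"
REGDOC_ARTIFACT_TYPE = "application/vnd.something.regdoc.v0"  # placeholder type
MAX_MARKDOWN_FILES = 10


def select_regdoc_files(tag: str, artifact_type: str, file_names: List[str]) -> List[str]:
    """Return the markdown files a registry would render, or [] if this is not a regdoc."""
    if tag != REGDOC_TAG or artifact_type != REGDOC_ARTIFACT_TYPE:
        return []
    markdown = [name for name in file_names if name.lower().endswith(".md")]
    # Cap the number of hosted files, as suggested in the comment.
    return markdown[:MAX_MARKDOWN_FILES]


print(select_regdoc_files("regdoc", REGDOC_ARTIFACT_TYPE, ["./readme.md", "./details.md"]))
print(select_regdoc_files("latest", REGDOC_ARTIFACT_TYPE, ["./readme.md"]))
```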

We're also exploring the ORAS Artifact reference types to host the regdoc content, specific to an artifact, but that's a bit deeper.

For annotations, I'd also really like to see annotations and labels be indexed. Until we have a way to get the annotations out, it's hard to see broad adoption of annotations. We also have to think about how annotations from the original artifact are combined with annotations added afterwards. I captured some notes here, and started thinking about ORAS Artifacts being able to upload just annotations to enhance an existing artifact.

I'd suggest a working group as the place to start, as there's not enough to use today, so whichever direction we go, we'll need new APIs.
In the working group we can capture the requirements and goals, and then figure out which of the existing proposals can be built upon. For instance, there's a pending extensions proposal that is getting close, but we'd still need to spec out what the syndication extension would be.

@tegioz
Collaborator

tegioz commented Nov 17, 2021

Thanks @SteveLasker! All this information will be very useful 🙂

Regarding shipping the documentation as another artifact type: we are also experimenting with something similar by allowing publishers of Helm repos stored in OCI registries to add the repository metadata file, used for features like verified publisher and ownership claim, as an additional artifact using the tag :artifacthub.io. We recommend that users publish it using ORAS (more details here). It hasn't gotten much traction yet, as most Helm repositories are still HTTP based, but hopefully it'll be used more over time.

@SteveLasker

@tegioz awesome to see the ownership claim. For the OCI usage, this has just been the chicken/egg issue where folks won’t adopt while under experimental.
Hopefully, experimental will get removed, and we can move to cart/horse and start adding these other capabilities.

@mattfarina
Collaborator Author

@tegioz Sorry I was slow to see your comments on this. Great starting analysis.

I see that this landed on the OCI mailing list at https://groups.google.com/a/opencontainers.org/g/dev/c/Le0BtdnqS40.

Most of the container images available in the major public registries don't contain any metadata or documentation embedded. Importing those container images would lead to a poor user experience in the Artifact Hub UI, as we wouldn't have much information to display and search results wouldn't be very accurate without descriptions, categories or keywords.

We were thinking that we could require some metadata to be present in the image for it to be indexed. The metadata could be provided in the form of annotations as defined in the OCI image spec. We could leverage the pre-defined annotation keys when they fit, and define AH specific ones when needed.

This would require an extra effort from publishers to include this information in their images, but would improve the final user experience in AH.

I don't think that all public images need to be indexed by AH. I've created many public images that aren't supported and are basically tests that ended up being abandonware.

The extra effort to provide a good UX around images meant to be consumed by the general public is ok, IMHO. Those who want their images to be generally used as independent images can put in some of that extra work. This is my opinion and if people want to persuade me of something else, I am open to listening.

Some organizations have hundreds of repositories published in a registry. Bitnami, for example, has 267 repositories published in the Docker Hub at the time of writing this.

I wonder if Bitnami wants all their images to be listed or if they are just a building block of a higher level component (i.e. a chart). This could use some investigation (I'll do it).

On the OCI mailing list it was noted:

I do really like the metadata idea - and maybe this could be a potential working group for OCI?

This conversation may be good to take over there. I can also volunteer to take it there.

Docker Hub has been the go-to place for people to discover images. If it were easier to discover images without putting them on Docker Hub, it would relieve some burden on Docker Hub while providing more publicity for other registries. I can see potential interest in helping this effort.

@tegioz
Collaborator

tegioz commented Dec 6, 2021

Thanks @mattfarina, no worries.

We were thinking that we could require some metadata to be present in the image for it to be indexed.

The extra effort to provide a good UX around images meant to be consumed by the general public is ok, IMHO. Those who want their images to be generally used as independent images can put in some of that extra work.

Cool, that's agreed then 👍

Another approach would be to map a repository in an OCI registry to a repository in AH. This would be the easiest way to go, but we think this can lead to a poor user experience. Some organizations have hundreds of repositories published in a registry.

I wonder if Bitnami wants all their images to be listed or if they are just a building block of a higher level component (i.e. a chart). This could use some investigation (I'll do it).

If we agree that mapping a repository in an OCI registry to a repository in AH is ok, we can go ahead and start working on a prototype for this new Container image repository kind to see how it fits within AH. Please note that it's not only Bitnami; there are many more orgs with hundreds of repositories. But like you said, maybe only a subset of them will be listed on AH. And if we think about this in combination with the metadata requirement, maybe users/orgs will only put in that extra work for certain repositories.

I do really like the metadata idea - and maybe this could be a potential working group for OCI?

This conversation may be good to take over there. I can also volunteer to take it there.

Awesome, thanks!

@vbatts

vbatts commented Dec 9, 2021

sounds good. I'm happy to brainstorm or be roped in as needed.

@mattfarina
Collaborator Author

@tegioz I reached out to someone at Bitnami and the response was that they would want to list many of their images. Since registries don't have a method to discover all the images in a namespace, they would be open to another method to provide a list of images such as uploading a list in some format.

tegioz added a commit that referenced this issue Jan 10, 2022
Closes #1685

Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
Co-authored-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Co-authored-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
@tegioz
Collaborator

tegioz commented Jan 10, 2022

Hi 👋

Happy new year to everyone! 🙂

We've just created a PR that adds experimental support for container images to Artifact Hub. Please see the container images section in the repositories guide for more information about how it would work.

Repositories can be added from the UI control panel as usual:

[Screenshot 1]

And this is what packages of kind Container image would look like in the UI:

[Screenshot 2]

[Screenshot 3]

Please let us know your thoughts!

tegioz added a commit that referenced this issue Jan 12, 2022
Closes #1685

Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
Co-authored-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Co-authored-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
@tegioz
Collaborator

tegioz commented Jan 12, 2022

Experimental support for container images has just been deployed 🙂

6 participants