Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear: should registries permit auto-deletion of manifests when last tag is deleted? #180

Open
pmengelbert opened this issue Aug 24, 2020 · 9 comments

Comments

@pmengelbert
Copy link
Contributor

The proposed 1.0 spec includes deletion of tags via the HTTP API. What remains unclear is whether registries should be allowed auto-delete a manifest when all of its associated tags have been deleted. This issue was referenced in the comments to PR #178

@hallyn

@vsoch
Copy link
Contributor

vsoch commented Sep 18, 2020

I have this question too - if we delete a tag that is associated with other images, then of course we wouldn't want to delete the image. However if we delete a tag that only is linked to one image, I would expect the image is deleted too, because then you would have an image without a tag (which would still have a version, but no tag, is that allowed?) Finally, if there is a request to delete a manifest, I'd also assume that this means cascading to delete the associated image and blobs.

@vsoch
Copy link
Contributor

vsoch commented Sep 18, 2020

Are others implementing registries to share blobs, to be efficient in terms of saving space? E.g., if we upload blobs and they are unique based on digest, if some second image is uploaded with the same digest it could re-use that blob. And then on a delete of an image (via tag or manifest) the blob would only be deleted given that it's not linked to any other images. A simpler approach (but one that is redundant) is to link each blob explicitly to one image, and if the image is deleted via a tag/manifest, we can be sure the blob isn't needed by another one.

So follow up question - given the above - if we get a request to upload a blob (and the digest and content type already exist, for some other image) do we do the upload again? It seems risky to have blobs shared because in the case of a chunked upload where the digest is sent at the end, we would need to calculate the digest of the entire image on the server to validate the digest is correct, which is likely more intensive than just calculating the digest of a body for a request (for a single POST, for example). I'm wavering back and forth about whether my implementation should allow shared blobs or not (and what is a safe way to allow that).

@jonjohnsonjr
Copy link
Contributor

I have this question too - if we delete a tag that is associated with other images, then of course we wouldn't want to delete the image. However if we delete a tag that only is linked to one image, I would expect the image is deleted too, because then you would have an image without a tag (which would still have a version, but no tag, is that allowed?)

An explicit deletion API makes this more apparent, but this situation happens all the time in normal operation. Since tags are mutable, whenever you push an image to a pre-existing tag, the previous image becomes tag-less. This happens a lot with "latest".

Since you can pull images by digest, it's also pretty common for people to be referencing untagged images. If you assume that untagged images are okay to delete, this puts those users in a bad spot.

Finally, if there is a request to delete a manifest, I'd also assume that this means cascading to delete the associated image

I'm not sure how you're distinguishing the image from the manifest here. From the registry's perspective, the manifest is the image, IMO.

and blobs.

Often, yes, but this is an implementation detail.

@jonjohnsonjr
Copy link
Contributor

It seems risky to have blobs shared because in the case of a chunked upload where the digest is sent at the end, we would need to calculate the digest of the entire image on the server to validate the digest is correct, which is likely more intensive than just calculating the digest of a body for a request (for a single POST, for example).

FWIW, it's possible to resume the state of a sha256 hasher. See https://github.com/stevvooe/resumable and https://golang.org/pkg/crypto/sha256/#New, specifically:

The Hash also implements encoding.BinaryMarshaler and encoding.BinaryUnmarshaler to marshal and unmarshal the internal state of the hash.

I believe your implementation is in python, but I would suspect there is an equivalent library.

Are others implementing registries to share blobs, to be efficient in terms of saving space?

I believe most registries do, yes. Be careful not to allow your implementation to leak information about blob existence across security boundaries: #18 (comment)

I'm wavering back and forth about whether my implementation should allow shared blobs or not (and what is a safe way to allow that).

This is certainly a hard problem and how you implement this depends on the features/guarantees you are provided by the underlying storage layer.

A simpler approach (but one that is redundant) is to link each blob explicitly to one image, and if the image is deleted via a tag/manifest, we can be sure the blob isn't needed by another one.

I'd probably start with something simple like this and replace it with a more sophisticated strategy later.

Also, be careful of race conditions here. It's possible to accidentally delete blobs that are referenced by manifests if you don't do garbage collection transactionally. Some registries do "stop-the-world" GC to avoid issues here, which is probably the easiest approach.

@vsoch
Copy link
Contributor

vsoch commented Sep 18, 2020

@jonjohnsonjr if we are sharing blobs across images (manifests) and we don't need to finalize an association until a manifest is provided, why would we need to provide the name of the repository for any kind of POST request, e.g., POST /v2/<name>/blobs/uploads/? Is it just to validate that the user has permission to interact with the registry, via that repository?

@jonjohnsonjr
Copy link
Contributor

Just to be clear, my answers are based mostly on experience operating a registry and writing a few registry clients -- I wasn't part of the initial spec writing, so I'm not privy to the "why" of most of these choices beyond what I could intuit.

if we are sharing blobs across images (manifests) and we don't need to finalize an association until a manifest is provided,

One reason is that a registry implementation may not be sharing blobs, so you would need <name> to implement it properly. I think the current choice of including the name in the path allows for registry operators to have more freedom in implementation than the alternative, e.g. you could write a registry that just ignores the path component.

why would we need to provide the name of the repository for any kind of POST request, e.g., POST /v2/<name>/blobs/uploads/? Is it just to validate that the user has permission to interact with the registry, via that repository?

I think that's part of it, yes. You want to reject a request early if the client doesn't have permission to upload images to that repository.

It's also useful for the registry to know "where" the blobs should be stored, depending on how you partition namespaces.

E.g. example.com/foo/bar might go to a different S3 bucket than example.com/foo/baz.

@SteveLasker
Copy link
Contributor

So, this is a common discussion. I talked about it a bit here, in the ORAS repo: oras-project/oras#171 (comment)

Basically, as time yields on, everyone will have to deal with delete. Look at the latest docker TOS updates. 15 petabytes of data. And, that's not the expensive part. We can't keep data forever, particularly docker images. Sorry, but anyone that says they don't believe in delete isn't being realistic, and I don't think that's what they really mean.
I'd suggest the problem is how to communicate how to delete when you have ref-counting of layers, and tags can take updates as Jon points out.
What I do believe the spec should clarify is a standard for what it means to:

  • delete a tag, where the tag is the last reference to a digest
  • delete a manifest that has tag references
  • layers, and config objects, IMO, should just be deleted when no longer referenced by a manifest. The fact the spec has a hole here, ( i believe) is an oversight and a weak link

The problem is we've all implemented it with varying approaches.
ACR supports a range of auto-purge options, with more coming. We also support automating untagged manifests. But, we make it an option as we know some users deploy with digests, not tags. (My belief is, being able to Sign and Lock a tag reference would eliminate that need)

As for layer sharing, that is a registry optimization.
Jessie Frazelle put the fear into us when first building ACR for how someone could possibly hack a malformed layer into a registry, and pollute others. So, for ACR, we don't share layers across registries. While it makes for a great demo to push your first image to a new registry, and see several base layers are skipped, we haven't found it to be a problem. And, base layers change so often, it's marginally valuable.
And, when serving content, if you're not using a CDN, which is expensive and not possible in all scenarios, the bottleneck on concurrency is the number of copies you have to serve concurrently. If 1,000 customers were pulling an image that shared the same 3 base layers, and the 1,000 customers pulled an average of 10 nodes at the same time, you now have a chokepoint attempting to serve those 3 common layers.
Keeping copies provides security isolation and expanded concurrency.

Ok, ramblings on a rainy Friday in a pandemic.
I'm out...

@vsoch
Copy link
Contributor

vsoch commented Sep 19, 2020

Sorry, but anyone that says they don't believe in delete isn't being realistic, and I don't think that's what they really mean.

I believe in delete! I'll need to think about the layer/blob sharing - I refactored to use shared blobs this afternoon because for someone implementing a registry with Django they are likely a small academic group doing so on a local filesystem, and storage is more of an issue, and users are probably scoped to the group (that can better be trusted to not do such an attack). But it's still a concern that I have. Is the detail of how it would happen written anywhere?

Ok, ramblings on a rainy Friday in a pandemic. I'm out...

Roger that! And I appreciate your rainy Friday, pandemic rantings! I'm having a lot of fun working on this :O) Have a good (hopefully not rainy, definitely not Friday, but probably still pandemic) weekend!

@jonjohnsonjr
Copy link
Contributor

And, that's not the expensive part.

I think that's roughly the point of the "never delete" crowd -- it doesn't cost that much to store these, so why clean them up?

We can't keep data forever, particularly docker images.

Why not? GitHub just put a bunch of code in an arctic vault!

Sorry, but anyone that says they don't believe in delete isn't being realistic, and I don't think that's what they really mean.

I think it's a bit more nuanced than that. As an analogy, within the git ecosystem, it's standard etiquette not to rewrite history because you can end up breaking anyone downstream of you. Git is a little different here because the entire commit history is inextricably linked, but I try to follow a similar etiquette with public images.

IMO (barring legal concerns), any image that has ever been publicly discoverable via the registry API (i.e. tagged) should be kept indefinitely, otherwise you risk breaking someone who depends on that image.

What that doesn't include:

  • Images produced by CI for testing -- these should probably be in a separate repo.
  • Images in private repositories -- if you can control who has access, it's completely reasonable to have a time horizon for expiring images.

What I do believe the spec should clarify is a standard for what it means to:
...

Yeah I'd like to have a discussion about this, possibly just around standardizing response codes and response bodies. I'd love it if we could return a structured error that said something like you can't delete this because it is referenced by the following tags: ["latest", "foo", "bar"], so clients could automatically handle cleaning up tags. Maybe also some kind of "force delete" header to indicate that it should delete all associated tags as well. These same issues apply to manifest lists referencing manifests. Similarly, it might be nice to standardize on an error for immutable tag policies. These might be good candidates for extension proposals.

I guess I'll stop rambling, but yeah this is hard and there doesn't seem to be an obvious answer to any of these questions that will make everyone happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants