Surface Deletions Better #11841
Comments
If we are talking about the case where packages can be deleted from PyPI and what the outcome would be, then I don't think it needs much attention. To improve the user experience we could show when the package was deleted, but the rest of the problem need not be considered, as the chance of it happening is quite low and it can be ignored.
As a related question, does this behaviour mean that sensitive data persists via For example, see this PR where someone requested removal of a package entry from the pypi-data repository, after they also deleted the associated package which contained AWS access keys.1 The PR contains
It's surprising behaviour to me (and I'm not sure if it's documented somewhere where users might read) that deleting the package from PyPI does not delete the underlying data.
Yes. It's not actually possible to wholly delete sensitive data once you've released it on PyPI. Even if we deleted things from the underlying storage, there's a large mirror network that mirrors near instantly and is often configured not to respect deletions. This means that once it's out there, it's out there. No take backs.
There's currently a discussion going on about whether we want to make any changes to when things are able to be deleted from PyPI. It's not clear how that will turn out, but almost certainly there are still going to be situations where things can get deleted from PyPI.
Currently when something gets deleted from PyPI there's no longer any record of it outside of the journals/audit logs. This missing information can make debugging harder for users of PyPI when some file goes missing that used to exist.
It might be worthwhile to expose these deletions in some way, possibly to even give people a way to add a note for why something was deleted.
For Python level tools, if the version specifier allows other files besides the deleted one, it will just silently grab another version. This can paper over a lot of the obvious problems that happen with deletions (but not all of them, since there may not be other files that are acceptable), but it can actually make subtle bugs more frustrating to discover or debug, since you may end up with different versions. Pinning is The(TM) solution to that, but pinning makes it more likely that a deletion turns into a hard error, leaving people scratching their heads.
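As a rough illustration of the two failure modes described above, here is a toy model of version selection. It is not pip's actual resolver; `pick_version` and the tuple-based versions are made up for this sketch.

```python
def pick_version(available, allowed):
    """Return the newest available version accepted by `allowed`, or None."""
    candidates = [v for v in available if allowed(v)]
    return max(candidates, default=None)


# Versions are modelled as tuples so they compare numerically.
before_deletion = [(1, 2, 0), (1, 2, 1)]
after_deletion = [(1, 2, 0)]  # (1, 2, 1) has been deleted from the index


def loose(v):
    # Roughly what "pkg>=1.2.0" would accept.
    return v >= (1, 2, 0)


def pinned(v):
    # Roughly what "pkg==1.2.1" would accept.
    return v == (1, 2, 1)


# Loose specifier: the install still succeeds, but silently with an older version.
loose_before = pick_version(before_deletion, loose)
loose_after = pick_version(after_deletion, loose)

# Pinned specifier: nothing matches any more, so the install becomes a hard error.
pinned_after = pick_version(after_deletion, pinned)
```

In this model the loose specifier moves from 1.2.1 to 1.2.0 without any signal, while the pinned one finds nothing at all, and in neither case is the user told that a deletion is the cause.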
For non-Python level tooling, a lot of them pin to a specific URL (or a set of URLs to allow for mirroring) and bake that into their downstream build systems.
In some cases, deletions probably go unnoticed by these systems because, as an implementation detail of PyPI, deletions don't actually delete the underlying file from our blob storage, and files.pythonhosted.org doesn't consult the database; it just goes direct to the blob storage. That means that if you know the full URL with the hash in it and are pinned to it, you're currently safe from deletion affecting you, BUT that is an implementation detail of PyPI and is subject to change at any time.
In other cases, downstream wants to be able to construct the URL from nothing but the package name and version, without having to bake in our long URL structure. Those downstreams are relying on a redirect powered by Conveyor, which hits the JSON API to fetch the real underlying URL and redirects to that URL. In those cases, when Conveyor tries to generate the redirect, it gets no information from the JSON API other than that the file doesn't exist, which it turns into a 404 with no additional details.
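To make the redirect case concrete, here is a minimal sketch of that lookup. It is not Conveyor's actual code: `redirect_for` is a hypothetical helper, and the only thing taken from the JSON API is the shape of its per-release response, which lists files under `urls` with `filename` and `url` keys. The example URL is a placeholder.

```python
def redirect_for(release_json, filename):
    """Return (status, headers) for a request for `filename`.

    `release_json` is the parsed JSON API response for one release,
    or None if the release lookup itself failed.
    """
    if release_json is None:
        return 404, {}
    for file_info in release_json.get("urls", []):
        if file_info["filename"] == filename:
            return 302, {"Location": file_info["url"]}
    # A deleted file and a file that never existed are indistinguishable here.
    return 404, {}


# Placeholder data standing in for a JSON API response.
release = {
    "urls": [
        {
            "filename": "example_pkg-1.0.tar.gz",
            "url": "https://files.example.invalid/packages/HASH/example_pkg-1.0.tar.gz",
        }
    ]
}

found = redirect_for(release, "example_pkg-1.0.tar.gz")
missing = redirect_for(release, "example_pkg-1.0-py3-none-any.whl")
```

The last branch is the problem described above: once a file is gone from the JSON response, the redirector has nothing to go on except its absence.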
We could try to surface this situation in a better way, possibly providing details in the 404, or replacing the 404 with a 410 or something like that.
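One possible shape for this, purely as a sketch: assume some record of deletions is exposed (the `deletions` mapping, its `deleted_on` and `note` fields, and `status_for_missing_file` are all hypothetical; nothing like this exists in PyPI today), and use it to tell "deleted" apart from "never existed".

```python
def status_for_missing_file(filename, deletions):
    """Return (status, body) for a file that is not in the index.

    `deletions` is a hypothetical mapping of filename to a record like
    {"deleted_on": "2022-07-01", "note": "optional reason"}.
    """
    record = deletions.get(filename)
    if record is None:
        # No record of the file ever existing: plain 404.
        return 404, "Not Found"
    body = f"{filename} was deleted on {record['deleted_on']}"
    if record.get("note"):
        body += f": {record['note']}"
    # 410 Gone signals that the resource existed and was intentionally removed.
    return 410, body


# Made-up deletion records for illustration only.
deletions = {
    "example_pkg-1.0.tar.gz": {"deleted_on": "2022-07-01", "note": "uploaded by mistake"},
    "example_pkg-1.1.tar.gz": {"deleted_on": "2022-07-02"},
}

with_note = status_for_missing_file("example_pkg-1.0.tar.gz", deletions)
without_note = status_for_missing_file("example_pkg-1.1.tar.gz", deletions)
never_existed = status_for_missing_file("other_pkg-2.0.tar.gz", deletions)
```

Whether the details belong in the response body, a header, or only in the web UI is an open question; the sketch just shows that a deletion record is enough to separate the two cases.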
I don't really have any specific ideas here, and it's possible that the discussions around restricting deletions end up making this an edge case that isn't really worth worrying about. I just wanted to get it down as something that we might want to do.