Ignore sdists with malformed names #9

Open
njsmith opened this issue Jan 24, 2023 · 3 comments

Comments

@njsmith
Owner

njsmith commented Jan 24, 2023

When reading https://pypi.org/simple/cffi, we currently see cffi-1.0.2-2.tar.gz and parse it as name: cffi-1.0.2, version: 2. And then in PackageDB::available_artifacts("cffi"), we end up filing this under version 2.

I don't think we can parse this sdist name in general -- at least without breaking much more common cases like scikit-learn-1.0.2.tar.gz. But a very simple thing we could do is, when reading a simple API page, ignore all entries whose name doesn't match the simple API page we're looking at!
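A minimal sketch of that filtering idea, using only the standard library. The function names (`normalize_name`, `parse_sdist_naive`, `entries_for_page`) are hypothetical and not taken from the project, the normalization only approximates the PEP 503 rule, and the filenames in the usage note are sample inputs. Only `.tar.gz` sdists are handled here.

```rust
/// Approximate PEP 503 normalization: lowercase, with runs of '-', '_'
/// and '.' collapsed into a single '-'.
fn normalize_name(name: &str) -> String {
    name.split(|c: char| c == '-' || c == '_' || c == '.')
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join("-")
        .to_ascii_lowercase()
}

/// Naive legacy parse: strip ".tar.gz", then split at the last '-'.
/// This is the rule that turns "cffi-1.0.2-2.tar.gz" into
/// name "cffi-1.0.2", version "2".
fn parse_sdist_naive(filename: &str) -> Option<(String, String)> {
    let stem = filename.strip_suffix(".tar.gz")?;
    let (name, version) = stem.rsplit_once('-')?;
    if name.is_empty() || version.is_empty() {
        return None;
    }
    Some((name.to_string(), version.to_string()))
}

/// Keep only the entries whose parsed name matches the project that the
/// simple API page belongs to. Returns (filename, version) pairs.
fn entries_for_page<'a>(
    page_project: &str,
    filenames: &[&'a str],
) -> Vec<(&'a str, String)> {
    let wanted = normalize_name(page_project);
    filenames
        .iter()
        .filter_map(|&f| {
            let (name, version) = parse_sdist_naive(f)?;
            (normalize_name(&name) == wanted).then(|| (f, version))
        })
        .collect()
}
```

With this, `entries_for_page("cffi", &["cffi-1.0.2-2.tar.gz", "cffi-1.15.1.tar.gz"])` drops the first entry, because its parsed name normalizes to `cffi-1-0-2` rather than `cffi`, and `scikit-learn-1.0.2.tar.gz` still parses as before.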

(I guess we could also get fancier, and try to use the simple API page to bias the sdist name parsing? But I think stuff like cffi-1.0.2-2.tar.gz is super rare and we can probably just skip it.)
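For comparison, the "fancier" variant could look roughly like the sketch below: try each dash as the split point and accept the first one whose left side normalizes to the page's project name. `parse_sdist_for_project` and `normalize_name` are hypothetical names, the normalization is only an approximation of PEP 503, and this is an illustration of the idea rather than anything the project does.

```rust
/// Approximate PEP 503 normalization: lowercase, with runs of '-', '_'
/// and '.' collapsed into a single '-'.
fn normalize_name(name: &str) -> String {
    name.split(|c: char| c == '-' || c == '_' || c == '.')
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join("-")
        .to_ascii_lowercase()
}

/// Use the project name from the simple API page to pick the split point.
/// Returns the raw version string, or None if no split matches.
fn parse_sdist_for_project(project: &str, filename: &str) -> Option<String> {
    let stem = filename.strip_suffix(".tar.gz")?;
    let wanted = normalize_name(project);
    // '-' is ASCII, so slicing at i and i + 1 stays on char boundaries.
    stem.match_indices('-').find_map(|(i, _)| {
        let (name, version) = (&stem[..i], &stem[i + 1..]);
        (!version.is_empty() && normalize_name(name) == wanted)
            .then(|| version.to_string())
    })
}
```

Here `parse_sdist_for_project("cffi", "cffi-1.0.2-2.tar.gz")` yields the version string `1.0.2-2`, which PEP 440 treats as an implicit post-release spelling of `1.0.2.post2`. The cost is that the parse now depends on which page the file was found on.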

@encukou

encukou commented Jan 24, 2023

See PEP 625. The sdist filename was standardized 2 years ago, so you can parse it. There should be only one dash, since the name and version should both be normalized.
There are stragglers, and historical releases won't be fixed, but a new tool should be OK with simply ignoring those -- though it does need to detect them. Apparently the overwhelming majority of legacy filenames contain multiple dashes, so detecting that could be good enough.
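A rough sketch of that detection, assuming `.tar.gz` sdists only: a filename whose stem contains exactly one dash is treated as normalized, anything else is flagged as legacy so the caller can skip it. `SdistName` and `classify_sdist` are hypothetical names, and the sketch does not check that the name and version are otherwise valid.

```rust
#[derive(Debug, PartialEq)]
enum SdistName {
    /// Exactly one dash: split into name and version.
    Normalized { name: String, version: String },
    /// Zero dashes, multiple dashes, or an empty part: skip this file.
    Legacy,
}

/// Returns None if the file is not a ".tar.gz" at all.
fn classify_sdist(filename: &str) -> Option<SdistName> {
    let stem = filename.strip_suffix(".tar.gz")?;
    let mut parts = stem.split('-');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(version), None)
            if !name.is_empty() && !version.is_empty() =>
        {
            Some(SdistName::Normalized {
                name: name.to_string(),
                version: version.to_string(),
            })
        }
        _ => Some(SdistName::Legacy),
    }
}
```

Under this rule `cffi-1.0.2-2.tar.gz` and `scikit-learn-1.0.2.tar.gz` are both classified as legacy, while the normalized spelling `scikit_learn-1.0.2.tar.gz` parses cleanly.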

@encukou

This comment was marked as outdated.

@njsmith
Owner Author

njsmith commented Jan 24, 2023

Ah, yeah, that's another option -- skipping any sdist name with multiple dashes. I was assuming that we couldn't drop compat with old non-compliant artifacts, but maybe we could get away with it.
