Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newlines in the description field produce a malformed PKG-INFO #1390

Closed
mgedmin opened this issue Jun 18, 2018 · 42 comments · Fixed by #2538
Closed

Newlines in the description field produce a malformed PKG-INFO #1390

mgedmin opened this issue Jun 18, 2018 · 42 comments · Fixed by #2538
Labels
bug help wanted Needs Implementation Issues that are ready to be implemented.

Comments

@mgedmin
Copy link
Contributor

mgedmin commented Jun 18, 2018

We discovered this accidentally by way of zopefoundation/zc.relation#4 (comment): if you pass a string containing newlines to the description argument of setup(), setuptools will generate a malformed PKG-INFO.

To reproduce:

# setup.py
from setuptools import setup
setup(
    name='test-package',
    version='0.1',
    author='Blah Blah',
    author_email='blah@example.com',
    description='description\n\n',
    py_modules=['blah'],
)

(The contents of blah.py do not matter, but the file should exist.)

Run python setup.py sdist and the inspect test_package.egg-info/PKG-INFO. For me, with setuptools 39.1.0, it looks like this:

Metadata-Version: 1.0
Name: test-package
Version: 0.1
Summary: description


Home-page: UNKNOWN
Author: Blah Blah
Author-email: blah@example.com
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN

The extra newlines lead tools to treat the rest of the PKG-INFO as a long_description.

I would expect setuptools to complain about the newlines in the description field, or at least escape them properly (i.e. prepend whitespace, like it does for the long_description field).

@jaraco
Copy link
Member

jaraco commented Jun 18, 2018

I looked at this for about 10 minutes, and I'm still not sure where validation should happen for these options/metadata fields. From what I can tell, only a few of the options do any sort of validation, and even then only in distutils.

So while I agree it would be nice for Setuptools to reject invalid metadata fields, it may not even have a good hook point to do so. Perhaps a hook could be added in Distribution.finalize_options.

@mgedmin
Copy link
Contributor Author

mgedmin commented Jun 18, 2018

Do you think this would be best fixed in distutils? Such a fix would take (much) longer to reach users, but I don't think this foot-gun is very dangerous.

@pganssle
Copy link
Member

@mgedmin I think at this point things are not really being fixed much in distutils. Some time soon we'll be removing setuptools' dependency on distutils and distutils will be deprecated entirely.

@di
Copy link
Member

di commented Jun 18, 2018

Just wanted to point out that this is not limited to just the Summary field -- any field with a double newline will cause the rest of the PKG-INFO file after those newlines to be interpreted as the "message body" and thus become the Long-Description.

@pganssle
Copy link
Member

@di That's not great. I think we could reasonably add strip('\n') to all fields, but that wouldn't stop anyone from doing something like A\n\nB. Is there any way to escape double-newlines so they aren't interpreted this way?

I see the options as:

  1. Escape newlines in the text so they aren't interpreted as delimiters.
  2. Raise an error whenever double newlines are found outside of long_description.
  3. Silently s/\n+/\n/g for everything except long_description.

I think I prefer 1 as least disruptive, but I'm ambivalent between 2 and 3. 2 is more explicit, but 3 creates less "churn" (since this will just be auto-magically fixed).

@di
Copy link
Member

di commented Jun 18, 2018

I think #1 is probably not the right thing to do here, as the resulting field would still contain the extra (albeit escaped) newline characters, which would not be what the user wants.

I think if we could do #2, it would be easy enough to just do #3 and avoid a possibly-confusing error message.

@icemac
Copy link
Contributor

icemac commented Jun 19, 2018

@mgedmin @di Even a single newline at the end of the description parameter of the setup function seems break PKG-INFO. (The second newline is added when writing PKG-INFO to separate the parameters in the file.)

See https://pypi.org/project/zc.relation/1.1.post1/#files for such a malformed PKG-INFO and https://pypi.org/project/zc.relation/1.1.post1/#description for the PyPI parsing result.

@AmitAronovitch
Copy link

AmitAronovitch commented Aug 15, 2018

#262 was another case where a single newline at the end breaks description.

I tend to think that "newlines are not allowed in that field" would be a clear enough error message to support option (2) for that case.

However, if you go for the explicit options (1 or 3), note that setup.py upload (as opposed to twine) does NOT fail in that case (not sure what/how it does exactly, but data is shown correctly in testpypi).
Hence - having a look at what the raw upload command does might provide more objective feedback re: "what the user expects" here.

@bittner
Copy link
Contributor

bittner commented Oct 1, 2018

I see the options as:

  1. Escape newlines in the text so they aren't interpreted as delimiters.
  2. Raise an error whenever double newlines are found outside of long_description.
  3. Silently s/\n+/\n/g for everything except long_description.

I think I prefer 1 as least disruptive, but I'm ambivalent between 2 and 3. 2 is more explicit, but 3 creates less "churn" (since this will just be auto-magically fixed).

We should fail hard for configurations that are invalid. There should be an error message that gives a clear indication on how to fix the problem. This is a developer API, and developers should deal with their broken code. We shouldn't try to guess-fix broken stuff.

Do the fields/kwargs have an explicit specification of the data they expect? This should be enforced. The fields description and license obviously expect single-line data. Hence, when a newline is submitted in the data for those fields the generation of PKG-INFO should be aborted with a clear error message, e.g.

$ python3 setup.py sdist
ERROR: The "description" field requires a single line of text. Multi-line text was supplied.
Aborting.

@pganssle pganssle added Needs Triage Issues that need to be evaluated for severity and status. bug help wanted Needs Discussion Issues where the implementation still needs to be discussed. Needs Implementation Issues that are ready to be implemented. and removed Needs Triage Issues that need to be evaluated for severity and status. Needs Discussion Issues where the implementation still needs to be discussed. labels Oct 19, 2018
@pganssle
Copy link
Member

OK, anyone who wants to submit a PR, let's go with a hard-failure if newlines are found in fields other than long_description.

I think the easiest way to do this is to implement the check right before PKG-INFO is actually written out. Hopefully that doesn't disconnect the error message too much from the location where these inputs are actually being generated.

@bittner
Copy link
Contributor

bittner commented Oct 23, 2018

I can do this -- if no-one else has already started.

IIUC, I have to adapt the write_pkg_file pseudo-method, which is used to monkey patch the distutils implementation from the Python standard library.

I'd really prefer to introduce a class for this that can encapsulate the validation logic for all values. It feels too much like spaghetti code, otherwise. I hope that's fine.

@di
Copy link
Member

di commented Oct 23, 2018

I'd really prefer to introduce a class for this that can encapsulate the validation logic for all values. It feels too much like spaghetti code, otherwise. I hope that's fine.

One of my top priorities for this weekend's sprint is to move metadata validation logic out of Warehouse and into pypa/packaging for reuse elsewhere, and this would be a good example of "elsewhere".

The API will likely behave a lot like how it's currently being used in Warehouse: there will be a class which is initialized with a dict adhering to JSON-compatible metadata, and an instance of that class will have a .validate() function and a .errors attribute.

If you want to move ahead with that assumption, we can probably just connect the dots later.

@bittner
Copy link
Contributor

bittner commented Oct 28, 2018

@di I have some difficulties with the idea of passing in only a (flat) dict with JSON-compatible Metadata to the constructor. Currently, the resulting code would look something like below. But this way we'd have to somehow stick the validation logic into the validate method:

def write_pkg_file(self, file):
    """Write the PKG-INFO format data to a file object."""
    metadata = Metadata(dict(
        name=self.get_name(),
        version=self.get_version(),
        description=self.get_description(),
        # ...
    ))
    metadata.validate(throw_exception=True)

    version = get_metadata_version(self)
    file.write('Metadata-Version: %s\n' % version)
    file.write('Name: %s\n' % metadata.name)
    file.write('Version: %s\n' % metadata.version)
    file.write('Summary: %s\n' % metadata.description)
    # ...

To avoid setting a validator for every single value we may set a "default" validator, and set individual validators with method calls, e.g.

        # ...
    ))
    metadata.add_validator('default', single_line)
    metadata.add_validator('long_description', multi_line)
    metadata.add_validator('requires', check_requirements)
    metadata.validate(throw_exception=True)

Note that this splits up the value and its validation specification, and duplicates the metadata key.

Instead, I'd rather specify the validator together with with the metadata item, e.g.

    metadata = Metadata(dict(
        name=Value(self.get_name(), validator=single_line),
        version=Value(self.get_version(), validator=single_line),
        description=Value(self.get_description(), validator=multi_line),
        # ...
    ))

Obviously, this is not particularly beautiful, but value and validation would be visible next to each other. Maybe there is a similar but better way. Any ideas?

I've also noticed that there are quite a few check_* functions the the dist module. Would it be okay to move them to a separate validation module together with the Metadata class?

@di
Copy link
Member

di commented Oct 28, 2018

@bittner Not sure I follow... The validator(s) would be part of the packaging library, which would understand how to validate a field based on the field name. You shouldn't have to do an add_validator here at all.

@bittner
Copy link
Contributor

bittner commented Oct 29, 2018

@di Please see PR #1562 and let me know if this is roughly what you envisioned. Thanks!

@webknjaz
Copy link
Member

webknjaz commented Nov 5, 2021

According to the TODO comment in the source, it's time to turn the warning into an exception.

@kiorky
Copy link

kiorky commented Nov 17, 2021

Lots of our builds are AGAIN broken since this release, for various dependencies.

I would have expect that setuptools does not hardly break for a simple escape all \n workaround.

@jaraco
Copy link
Member

jaraco commented Nov 17, 2021

I would have expect that setuptools does not hardly break for a simple escape all \n workaround.

It seems like an unfortunate wart to carry to support transforming \n to forever. Is that what you're proposing, that Setuptools should allow invalid inputs and maintain responsibility for making them valid?

To be sure, Setuptools is not happy about breaking the builds. It's conceivable we could back out the removal and restore the Deprecation warning for an extended period. Would that help?

@ssbarnea
Copy link

IMHO, no need to act. Package maintainer should be able to fix it, is not that we didn't see this coming.

@kiorky
Copy link

kiorky commented Nov 17, 2021

It's not true, maintainers of those package can fix current code, but that will be only for "future" releases of one package, and not the already released one.

It means that every other stack that relies on that package needs to be also upgrade their versions pinning to fix their own build pipelines.

Indeed, as of today, builds are broken for nothing but a little metadata violation, i agree.
But this violation can still easily reworked (stripping \n is not a big deal) to be acceptable and then conserve retrocompatiblity which has no prize.

So, yes, the best option indeed is a forever(and not an extent of the deprecation period) warning, i would be really fine with that.

@kiorky
Copy link

kiorky commented Nov 17, 2021

To make things clear, maybe a solution would be that what is not already uploaded to pypi can not be accepted, but what's already released should continue to be installable as it is already used in the wild.

Tobias-Fischer added a commit to Tobias-Fischer/catkin_pkg that referenced this issue Nov 18, 2021
…uptools

Currently many builds are broken with the newest setuptools
This is because of pypa/setuptools#2870
See e.g. pypa/setuptools#1390 pypa/setuptools#2895 pypa/setuptools#2893
fjsj added a commit to vintasoftware/django_markdown that referenced this issue Nov 27, 2021
fjsj added a commit to vintasoftware/django_markdown that referenced this issue Nov 27, 2021
fjsj added a commit to vintasoftware/django_markdown that referenced this issue Nov 27, 2021
tchassagnette added a commit to momentslab/ffprobe-python that referenced this issue Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted Needs Implementation Issues that are ready to be implemented.
Projects
None yet