Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 639: Make requested additions and changes, add additional helpful info, reformat, update and copyedit #2164

Merged
merged 19 commits into from
Dec 17, 2021

Conversation

CAM-Gerlach
Copy link
Member

As requested and discussed at length on the Discourse thread, this is a thorough rewrite of PEP 639, to reflect the numerous additions (PEP 621 metadata, deprecation guidance), changes (use License-Expression field instead of License, license file handling), updates (to many tools, specs, links, best practices and docs it references in the two+ years since its creation) and reformatting (as a standalone PEP rather than a spec update, inline links per #2130 , headings, various formatting/PEP 12 issues).

Additionally, adds examples, user scenarios and a terminology section (as mentioned there), comprehensively documents the many ultimately rejected ideas suggested during the course of that thread, expands several of the other sections, resolves many issues and inconsistencies in the text, adds a number of internal links for easier navigation and copyedits for grammar, phrasing, clarity, tone and correctness.

To note, the diffs look much messier and more extensive than they actually are, due to the reorganization and, particularly, the rather antiquated hard-wrapping (rather than, e.g., sentence wrapping) that leads to small changes cascading the appearing to "change" an entire paragraph.

Note to committers: Once this is ready, a rebase-merge would be appreciated, to preserve the carefully arranged commit history organizing the changes into digestible, logical units, while still keeping a linear git history, but I understand if repo policy prevents that.

@pombredanne , if you see this, would you prefer if I added myself to the authors list, so I can continue to help champion, refine and implement this PEP? We all haven't heard from you about it in many months, so I'm happy to take care of your baby for you!

Pinging involved parties (also posted on the thread, ofc): @pombredanne @pfmoore @ofek @hroncok @kpfleming @di @pradyunsg @brettcannon @uranusjr

High-level summary of changes:

All the requested/needed changes have been implemented, including:

  • Rewrite to reflect adoption of a standalone License-Expression field instead of the License field, as agreed here
  • Add detailed guidance for deprecating the legacy license metadata, as discussed here
  • Add requirements for PyPI validation of new fields, per consensus here
  • Add full coverage of PEP 621 metadata, as requested by @pf_moore
  • Add open issue discussed here regarding filling legacy license field
  • Restructure PEP as standalone rather than spec update per @pf_moore
  • Update and expand other sections to reflect the changes above
  • Implement basic formatting cleanup and fixes mentioned above

In addition, I've addressed the following issues I found during the process:

  • Add many additional missing ambiguous license classifiers and further guidance on handling them
  • Update to reflect tool, spec, link, doc and best practice changes since its drafting two years ago
  • Add minor clarifications to document the license files in the sdist, wheel and install project specs
  • Specify that license_files should match the subdir layout of the source as requested on wheel and Setuptools issues
  • Specify proposed license_file subdirectory as suggested on wheel and setuptools issues
  • Move the examples and other non-normative content out of the specification
  • Add, fix and improve links, references and more
  • Copyedit PEP for grammar, phrasing, tone, clarity and correctness issues

Finally, I've added some additional non-normative content that should be useful here:

  • Add terminology section, and use terms consistently
  • Describe in detail the rationale for the implemented structure and the alternative ideas proposed here and elsewhere
  • Add User Scenarios section with guidance/FAQs for ordinary package authors
  • Add/expand both simplified and detailed end to end examples
  • Expand several other sections with additional detail and clarifications

Full detailed chronological description of changes:

  • Restructure PEP to reflect form used in PEP 643, framing it as a standalone specification rather than a canonical incremental update of the actual core metadata spec itself, as proposed on Discuss thread and confirmed by @pfmoore

    • Update PEP title and metadata
    • Remove/replace references to the past and proposed versions of the metadata spec
    • Revise relevant sections (Abstract, Motivation, Rationale, Specification, References, etc) accordingly
    • Add a mention of this being for Core Metadata 2.3
  • Implement basic formatting cleanup and fixes

    • Conform to consistent Title Case and names for headings
    • Use ordered vs. unordered lists appropriately
    • Capitalize words after list items
    • Fix other minor formatting, spelling, capitalization and syntax issues
  • Rewrite PEP to reflect adoption of a standalone License-Expression field instead of the License field, as agreed on the Discuss thread

    • Rewrite the Abstract, Goals, Non-Goals Motivation, Rationale, and Backwards Compatibility sections to state that the same field is not being re-used, and explain why adding a new one was chosen
    • Revise the Specification sections and the How to Teach This and License Expression Example sections to reflect the various changes in field use and tool recommendations
    • Update references to the License field to refer to the License-Expression field, where appropriate
    • Replace the item describing the design and rationale of adding a new License-Expression field with that of re-using the License field in the Rejected ideas section
    • Rewrite and expand guidance on tool handling and conversion of legacy license field and classifiers, and include in a new dedicated section
    • Add rejected ideas for deprecation, alternatives and other things discussed earlier and later on discourse
  • Expand and update the PEP to reflect tool, spec and documentation changes since its initial drafting, fix link rot and follow current best practices

    • Rewrite the the "License Files in wheels and setuptools" section to reflect license_files being supported by both tools and both wheels and sdists and Include the additional license file names I added in Add a few additional common patterns for legal and attribution files pypa/wheel#251
    • Mention License-File support in package metadata added in Add License-File field to package metadata pypa/setuptools#2645 based on this PEP
    • Revise and expand PyPA sample project section to include PyPA guide and tutorial and reflect updates
    • Update the in-depth examples to be accurate for modern setuptools and use best-practice license names
    • Fix broken, redirected, outdated, non-HTTPS and non-perma links
    • Update License Core Metadata section in Appendix 2 to latest
    • Update SPDX license list version to current 3.15 from 3.10
  • Add changes to PEP 621 metadata, as suggested by @pfmoore , restructure accordingly and greatly expand examples

    • Add expression and files, deprecate license and file, and describe tool recommendations and examples thereof
    • Specify dynamic field handling
    • Reorganize sections and make headings shorter
    • Greatly expand examples and move out of core specification
    • Explicitly mention that tools may back-fill the License field
  • Further specify and clarify license-file handling, in PEP 621 and core metadata

    • When missing
    • Requirement to include
    • Path separators
    • Glob patterns
    • Default value
    • Tool recommendations
    • Clarify .dist-info vs .egg-info / sdists vs wheels
  • Add many additional missing ambiguous license classifiers and further tool guidance on handling them

  • Specify license-expression and license-files as separate PEP 621 fields, greatly refine dynamic field handling and add many rejected ideas

    • Update and reorganize PEP 621 keys specification and examples
    • Add and update dynamic handling for keys
    • Describe previous approach under Rejected Ideas
    • Update backward compat section to include PEP 621
    • Describe previous approach under Rejected Ideas, mentioning simplifies spec, explicitness, consistency and and dynamic, and avoiding bikeshedding, and update other
    • Describe other rejected syntax and semantics for the keys and values
  • Specify that license_files should match the subdir layout of the source as suggested on issues

    • Update specification accordingly (source, sdist, wheel and installed)
    • Update examples to include output license file paths
    • Describe previous status quo approach in rejected ideas
  • Add proposed license_file subdirectory as suggested on issues

    • Update specification and examples
    • Move license file subdir from open issues to rejected ideas
    • Rejected idea(s) for storing license files under data or new sys.prefix
    • Add section explicitly describing updates to the project format/distribution specs
  • Add requirements for PyPI validation of new fields, per consensus on Discourse

    • Update spec to include
    • Add rejected idea for not doing so
  • Add additional open issues and rejected ideas from Discourse/issues

    • Open Issue: What to do about backfilling? Require? Or drop as unnecessary?
    • Rejected Idea: license expressions on a per distribution basis
  • Add User Scenarios section for ordinary package authors

    • Private package
    • I don't care about licenses
    • Creating a new package
    • Existing package
    • Packages under multiple licenses
  • Update general sections to reflect changes

    • Abstract
    • Goals/Non-Goals
    • Rationale: Simply keep extending PyPI classifiers
    • Backward compat
    • Other sections
  • Add definitions section (and use consistently): https://packaging.python.org/glossary/

    • Packaging Tool (Tool), Build tool, Publishing Tool
    • Source metadata, Output/Built/Distribution metadata (not Binary)
    • License Expression, License Classifier, License identifier
    • Field, Key (Subkey, Table)
    • Package / Distribution / Artifact
    • Other correctness and consistency cleanup
  • Add, fix and improve links, references and more

    • Use inline links instead of footnote references, per issue Discourage using References? #2130
    • Add internal and external links where called for by the text
    • Fix existing links and syntax to be more clear, direct and consistent
    • Make a number of related copyedits, improvements and fixes in the text
  • Copyedit PEP for minor grammar, phrasing and style issues and proofread for typos and errors

pep-0639.rst Outdated Show resolved Hide resolved
Comment on lines +557 to +565
This PEP `adds <Add license-expression key_>`_ the ``license-expression`` key,
`adds <Add license-files key_>`_ the ``license-files`` key and
`deprecates <Deprecate license key_>`_ the ``license`` key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 621 specifically left room in the license key, so we don’t need to deprecate it. We only need to deprecate license = {text = ...} and license = {file = ...}, and define meaning to license = "MIT". No new [project] keys are needed.

To allow multiple licenses, we add

license = { names = ... }

Also, I think we should always glob to avoid confusion. So we define

license = { files = ... }

where each string is a glob pattern, and if someone really really need stars in the path (extremely unlikely), they can escape them manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 621 specifically left room in the license key, so we don’t need to deprecate it. We only need to deprecate license = {text = ...} and license = {file = ...}, and define meaning to license = "MIT". No new [project] keys are needed.

This was exactly the original approach I took here (see 4871854 ), later revised to split the license-files out into a top-level key while keeping the License-Expression field mapped to the license key, and further revised to split them both (see 88cba0c ). In practice, when considered more carefully, the first, and to a somewhat lesser extent the second approach provided to be distinctly suboptimal for a number of reasons, and quite possibly unworkable (due to the interaction with the dynamic key, depending on exactly how one interprets a rather imprecisely specified passage in PEP 621), while the only downside of the present approach was 1-2 extra top-level keys.

I spent a substantial amount of time carefully considering the alternatives here, and thoroughly address both of these in great detail in the Rejected Ideas section; see the Add expression and files subkeys to table and Define license expression as string value items under the Source metadata license key subsection. If you still have articulatable concerns with this approach, or can find ways around the various issues I've cited with the previous one, it might be better to discuss this on the Discourse thread, where we can all come to a clear consensus before I spend several more hours of my free time changing everything back (and us all having to bikeshed over exactly what to change it back to, heh).

To allow multiple licenses, we add license = { names = ... }

Perhaps I'm grossly misunderstanding your suggestion, but this proposal appears to contradict the underlying goal and basic objective of this PEP—adding SPDX license expressions to Python package metadata—which has been true its conception, has not been (at least as far as I could find) brought up throughout the course of the years-long relevant discussions, and is intrinsically incompatible with the core metadata field License-Expression that it is supposed to map to. Could you explain a little more about what you're suggesting here? Given it seems to be a fairly substantial shift in what this PEP has been proposing to do from the start, and which seemed to have a fairly strong consensus behind it, if you're really sure about it, you might want to bring it up on Discourse first.

Also, I think we should always glob to avoid confusion.

I think I'm the one that's a little confused, heh—could you explain a little more how users having to implicitly know that license.files is a glob is less potentially confusing than them explicitly specifying license-files.glob (or license-files.paths)? Furthermore, per the spec, for the latter tools will unambiguously know if the user mistakenly specify a glob instead of a literal path under the paths key, because it won't match an actual file.

In any case, like the previous, this was the approach I initially implemented here but after a substantial amount of careful thought and experimentation, this ran into significant conceptual and practical issues with, for a few extra characters, the more explicit alternative neatly avoids. I address this issue at length under the Only accept glob patterns and Infer whether paths or globs items in the Source metadata license-files key subsection of the Rejected Ideas section.

if someone really really need stars in the path (extremely unlikely), they can escape them manually.

Actually, this is not the main motivation for this choice, or even one of the primary drivers. The Only accept glob patterns and Infer whether paths or globs items in the Source metadata license-files key subsection of the Rejected Ideas section describes the rationale in detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm grossly misunderstanding your suggestion, but this proposal appears to contradict the underlying goal and basic objective of this PEP—adding SPDX license expressions to Python package metadata—which has been true its conception, has not been (at least as far as I could find) brought up throughout the course of the years-long relevant discussions, and is intrinsically incompatible with the core metadata field License-Expression that it is supposed to map to.

No, you’re right, I was misunderstanding how License-Expression works (I thought it could be specified multiple times to automatically mean “pick one of these”).

I think I'm the one that's a little confused, heh—could you explain a little more how users having to implicitly know that license.files is a glob is less potentially confusing than them explicitly specifying license-files.glob (or license-files.paths)?

It’s not, but there’s no harm for that :) It is extremely uncommon for license files to be put in paths that resemble a glob pattern, so the user does not actually care if this triggers globbing when they only want to specify one file. And if they want globbing, now they have only one key to remember instead of two.


Perhaps this would be a better approach?

  1. Add license-files to do glob (no nested keys).
  2. Use license = "<spdx-expression>" and deprecate nested keys.

Copy link
Member Author

@CAM-Gerlach CAM-Gerlach Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you’re right,

No worries, sorry for the confusion. Seems like the mixup might have been over a license identifier vs. a license expression; I do define these in the Terminology section, but is there anything in the PEP that I should clarify here?

It is extremely uncommon for license files to be put in paths that resemble a glob pattern, so the user does not actually care if this triggers globbing when they only want to specify one file.

Indeed; however, as discussed this isn't the (or even a) primary reason for this; sorry for not linking it when I referred to it previously, but I discuss the rationale in detail in the Only accept glob patterns and Infer whether paths or globs items in the Rejected Ideas, and see next item for a summary.

Add license-files to do glob (no nested keys).

Actually, as mentioned, this was also my initial approach, as "flat is better than nested", but it ended up conflicting with several of the behaviors specified so that "errors should never pass silently", especially if "in the face of ambiguity we must resist the temptation to guess", and in the end, I decided "explicit is better than implicit" :)

Use license = "<spdx-expression>" and deprecate nested keys.

In fact, following that which you previously proposed, this was my second approach here too :) However, upon implementing it, it still ran into less serious versions of some of the same issues as the previous, including mapping two distinct metadata types to the same key, inability to specify which was marked as dynamic, and creating additional inconsistency between core metadata, PEP 621 and setup.cfg/other tool formats. On top of that, it creates potential for serious user confusion and ambiguity, since it means license maps to completely syntactically and semantically different fields depending on the config format, and is much less explicit to authors, readers and consumers that it is intended to be a SPDX identifier/expression. See Define license expression as string value in the rejected ideas for more details.

pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Show resolved Hide resolved
pep-0639.rst Show resolved Hide resolved
pep-0639.rst Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
we similarly reject this here.


Must be marked dynamic to back-fill
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't PEP 621 mandate that the field be marked as dynamic to be backfilled?

Only when a field is marked as dynamic may a tool provide a "new" value.

Or am I misunderstanding what you're saying here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, depending on how strictly it is interpreted, but I address this explicitly in some detail in this very section. Given it is simply using the verbatim value of another specific, mutually-exclusive key as defined by this specification, such that any tool reading the pyproject.toml can "trivially, deterministically and unambiguously determine this using only the static data in the pyproject.toml file itself," it meets the spirit and intent, if (arguably) not the exact letter of PEP 621.

While I'd normally argue for explicitness, marking the field as dynamic has serious practical danger, as it means that any tool (including those not yet unaware of this PEP) could fill it with anything it wanted, directly violating the specification in this PEP, which is why the specification here explicitly prohibits marking license as dynamic if license-expression is also defined.

Also, as a side note, requiring it to be dynamic is unavoidably mutually exclusive with @uranusjr and @ofek 's proposal to move license-expression back under the license key, since it would be impossible to mark one one as dynamic without the other, and thus prohibited to define a license expression value there in the first place.

In any case, as this PEP updates the specification originally defined in PEP 621, it does not irrevocably constrain this PEP (in fact, for license-files, I also had to make what was arguably another mild exception in order to maintain safe, sane and backward-compatible default behavior, as also discussed here), and this section of the canonical project metadata spec will be updated accordingly to clarify these edge cases.

That said, I'm not completely wed to the necessity of allowing back-filling in the first place, as I haven't yet seen a 100% compelling rationale either way other than general backward compat and people on Discourse asking for it.

Copy link
Contributor

@cdce8p cdce8p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CAM-Gerlach! This is quite a comprehensive rewrite. I hope with that we're able to get this PR to the finish line. Only my opinion but I would say it's fine if you add yourself as coauthor.

pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Show resolved Hide resolved
Comment on lines +426 to +429
When used in the ``License-Expression`` field and as a specialization of
the SPDX license expression definition, a license expression can use the
following license identifiers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from LicenseRef-Proprietary there is no real way to specific a custom (proprietary) license type. That isn't an issue for most public packages, but since Python is also used in corporate environments with custom package indexes, I could imaging they might want some more flexibility. Especially as we deprecate the license field.

Does it make sense to add something like LicenseRef-Custom:<some custom license name>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. SPDX already specifies the use of LicenseRef-<CUSTOM-LICENSE-NAME>, which is what we are taking advantage of here, so if we did allow other custom identifiers, they would presumably want to follow that pattern rather than making up something bespoke. For niche cases that need very specific, proprietary custom licenses, making up an arbitrary bespoke license identifier is not terribly useful, I'd think, since people and tools won't know recognize it unless they are already aware of it anyway, and would need to reference the License-Files anyway for the actual terms, which License Ref-Proprietary makes explicit. If they do have their own custom private infra, that can still allow it, and they can also still use it in their individual files and in other standardized SPDX interchange formats. I'm just not sure it adds value allowing arbitrary license IDs for all Python packages given the room it creates for ambiguity and mistakes relative to the seemingly small benefit for such specific cases.

We could add an explicit LicenseRef-Custom for theoretical cases where a license is FSF- or OSI-approved and thus not technically LicenseRef-Proprietary but doesn't yet have a SPDX identifier, but as these sources are generally kept closely in sync (I've been a small part of that), this would only be true for a tiny number of edge cases packages, as far as I'm aware. There could also hypothetically be cases where the author has a strong good faith belief that their bespoke license is not proprietary, but it hasn't (yet) been reviewed by FSF or OSI...but not sure if that does more harm than good to allow specifying that, given the increased ambiguity. And, in any case, in terms of actively maintained modern packages it seems rather unlikely this is more than a handful of cases. As such, my inclination would be to stick with @pombredanne 's original language here (which this PR essentially leaves unchanged) unless there is a clear consensus otherwise.

Copy link
Contributor

@cdce8p cdce8p Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't though of LicenseRef-<CUSTOM-LICENSE-NAME>. You are right that LicenseRef-Proprietary should cover it. I'm just wondering though if we should at least allow the former for build tools. (Maybe it would even make sense if only PyPI would reject them.)

When processing the ``License-Expression`` field to determine if it contains
 a valid license expression, build and publishing tools:
 - SHOULD halt execution and raise an error if:
   - The field does not contain a valid license expression
   - One or more license identifiers are not valid (as defined above)   # <-- !

--
I don't plan on using this personally, it's just something I though about that could be an issue if the license field is eventually removed. This could lower the barrier in that area.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's worth thinking about. Currently, given its a SHOULD, custom build and publishing tools and those oriented toward this sort of use case are allowed to not raise an error if the license identifier is unknown, if they have reason to do so (as they would, if there was a need for this). We could carve out an exception "...unless they begin with LicenseRef-, in which case build tools SHOULD issue a warning instead.".

However, that does complicate the spec a bit, and means that build tools won't catch misspellings of LicenseRef-Proprietary or LicenseRef-Public-Domain, or users who prepend LicenseRef to their SPDX identifiers (as some mistakenly do), or users who go for a custom license without carefully considering the implications...unless users see the warning, which at present could be quite buried within the very verbose build output and be very difficult to spot, even for Python core devs, a point made on the Discourse thread. At least anecdotally, this seems more likely than the number of authors who genuinely need to make up their own bespoke license identifiers and use them in the packaging metadata with open source tools.

Since custom and enterprise-oriented build tooling is already free to allow such, which seems likely to be the case if custom internal-only IDs are actually going to be useful for other tools to consume, and the other ways that custom IDs can still be used (SPDX manifest, individual file SPDX tags, REUSE project standard, DEP5 standard, etc, plus actual license file detection, which this PEP makes much easier), on top of the fact that such organizations would have to be using the License field for that very specific purpose already for it to be a regression, I'm still not sure its necessary in this PEP.

To note, when NPM implemented SPDX identifiers, they initially had a (very poorly named) UNLICENSED identifier and one that was See license in FILE_PATH (akin to our License-File),, otherwise it didn't allow any LicenseRef custom ones at all, including LicenseRef-Proprietary and LicenseRef-Public-Domain, nor allow a package to not have a license expression unless private: true was manually set. They later removed even that, and only then did users complain in droves, after which they later re-implemented UNLICENSED following a lot of complaints (see discussion in npm/npm#8918 ). So we'd still be offering clearer and more diverse tags than they do, and there didn't appear to be demand for custom LicenseRefs as opposed to what this PEP has, LicenseRef-Proprietary.

My feeling is given the deprecation process will presumably last several years, if there are actual necessary use cases where specifying a custom license more specific than License Ref-Proprietary in the License-Expression field packaging metadata, using open source tools, for proprietary projects, currently using license for this, and for which other options aren't sufficient, we'll hear about it at some point before license is actually removed, and we can always loosen the spec later but can't tighten it without breaking backward compat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some good points you mentioned. Especially "we can change it later if necessary" is quite convincing.

Thanks for taking the time to discuss these issues. I appreciate that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we could; the guidance as it stands now is based on yours, and I didn't really touch that section. I just worry about catching unintentional mispellings of the two canonically defined ones, or users thinking they have to prepend LicenseRef in front of license identifiers, as there unfortunately still seems to be some confusion over. That would also create the need for more cross-field validation, and if license-files.globs was used or the default was relied upon, there would be no guarantee that the package would actually contain the named license files unless tools checked the source tree/sdist contents. We'd also have to consider how we'd want PyPI to handle that. I'm not sure the relatively niche use cases where something more specific than LicenseRef-Proprietary was needed outweighs that, but I'd ultimately trust your judgement, assuming others are okay with requiring and cross-validating that at least one License-File is matched (and additionally, the package may just have a AUTHORS.md which it would match, which would not really be a license, so its not necessarily reliable even then).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne As I understand it, the main reason against allowing LicenseRef-xyz is that it will make implementation and validation of License-Expressions much more difficult.

Maybe it's worth to consider a middle ground here. I think we agree that theoretically there might be instances where LicenseRef-Proprietary, LicenseRef-Public-Domain, and the other valid SPDX identifiers aren't enough. What do you guys think about allowing LicenseRef-Custom for these cases?

I would not recommend to enforce that a License-File must be provided, though. It would just get too complicated. For instance, what do we do if someone chooses a different name? Instead, anyone who encounters LicenseRef-Custom should assume it's equivalent to LicenseRef-Proprietary unless stated otherwise in a License-File. That's consistent with how it would have been handled anyway if the License-Expression was missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually initially attracted to the LicenseRef-Custom approach as well, having initially suggested something like that. However, I'm not sure how much it really adds over LicenseRef-Proprietary, since AFAIK (correct me if I'm wrong, @pombredanne ) all of the (FSF-recognized) Free and (OSI-recognized) Open Source licenses (i.e., non-proprietary) have SPDX identifiers. Thus, other licenses would thus implicitly qualify as both non-free and non-open, thus are proprietary, aside from the potential for extremely niche cases of licenses that appear to meet the OSS definition and/or the Free Software definition, but are not yet officially recognized by those organizations, and don't otherwise already have a SPDX identifier. So I'm not really sure if there's a clear use case for LicenseRef-Custom that wouldn't be covered by a license expression including LicenseRef-Proprietary, LicenseRef-Public-Domain and the allowed license identifiers, whereas while allowing custom LicenseRefs comes with more downsides, it also has more practical benefits. If there's interest, though, we could think about allowing them in build tools, warning in publishing tools and not allowing on PyPI?

I also thought about proposing distinguishing All Rights Reserved from more permissive but still proprietary licensing, but if a package was really explicitly licensed All Rights Reserved (which is essentially equivalent to not having a license at all), technically speaking outside of any rights granted by PyPI's ToS, no one would be allowed to download or use it, so there doesn't seem to be a clear motivation to package and share it in the first place. But perhaps LicenseRef-All-Rights-Reserved could be allowed by packaging tools, but not by publishing tools/PyPI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne @cdce8p Regardless of what we decide, I think the outcome of our fruitful and insightful discussion so far should be reflected in the PEP. Should I add it as an open issue, or should I document the current approach more completely in the "rejected ideas" section, leaving the door open for further revision on that point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, for now, I added an Open Issue that summarized the points made thus far in our thread above, so they can be included in the PEP and can serve as a basis for further discussion on the Discourse thread, as needed.

@pombredanne
Copy link
Contributor

@CAM-Gerlach Thank you ++ for reviving this! I apologize for going awol and I am grateful for your effort.

if you see this, would you prefer if I added myself to the authors list, so I can continue to help champion, refine and implement this PEP? We all haven't heard from you about it in many months, so I'm happy to take care of your baby for you!

Please do. You deserve it! Let me provide some feedback too in this PR too.

@CAM-Gerlach
Copy link
Member Author

Thanks so much for your awfully kind words, @pombredanne , and really glad to have you here! I went ahead and added myself, and looking forward to your feedback. Cheers!

@ofek
Copy link
Contributor

ofek commented Dec 16, 2021

@CAM-Gerlach I finished support (release is still not public)

Example:

[build-system]
requires = ['hatchling']
build-backend = 'hatchling.build'

[project]
name = 'foo'
license-expression = 'MIT'
dynamic = ['version']

[tool.hatch.version]
path = 'foo/__about__.py'

[tool.hatch.build]
packages = ['foo']

[tool.hatch.build.targets.sdist]
include = ['/tests']

[tool.hatch.build.targets.wheel]
core-metadata-version = '2.3'

@CAM-Gerlach
Copy link
Member Author

Great news @ofek , thanks! Does that include support for license-files too, or just license-expression for now?

@ofek
Copy link
Contributor

ofek commented Dec 16, 2021

Both


Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console (Text Based)
A new ``license-expression`` key is added to the ``project`` table, which has
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really strongly prefer that we use the existing license key instead.

The plain text form for the license key was intentionally kept unused for this specific usecase. From https://python.github.io/peps/pep-0621/#license:

A practical string value for the license key has been purposefully left out to allow for a future PEP to specify support for SPDX [6] expressions (the same logic applies to any sort of “type” field specifying what license the file or text represents).

There's absolutely no reason to introduce a new key here; and certainly no reason to mirror key names under project with what's in the METADATA file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - this is what the license key was intended for, and we should use it as defined.

But this discussion should happen on Discourse, as noted below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plain text form for the license key was intentionally kept unused for this specific usecase. From https://python.github.io/peps/pep-0621/#license:

Just to note once again, I'm well aware of this and it was my initial preference too, and what I initially implemented in the PEP, as I discuss in detail under the Rejected Ideas section.

There's absolutely no reason to introduce a new key here; and certainly no reason to mirror key names under project with what's in the METADATA file.

Actually, there are, as I took the time to describe in detail under the Rejected Ideas section, and as recently discussed at length on this PEP's Discourse thread (the most important one being the hard conflict with the use of the dynamic key, and reader and author confusion and ambiguity with other tool formats). I revised it now to address the points raised more explicitly and focus more on those core issues while minimizing the rest.

I'm happy to discuss this further and hear your ideas for other ways around these issues on the Discourse thread.

I agree - this is what the license key was intended for, and we should use it as defined.

While reserving the string value for potential not-yet-specified future use is normative, the multiple suggestions for how more precise license metadata might be implemented does not appear to be, being contained in an explanatory note, specified as an envisioned future use, and with multiple options specified; and it does not consider the practical implementation complexities and interactions with the dynamic field that I've spent much time analyzing in detail here, and which we can discuss further on the thread.

But this discussion should happen on Discourse, as noted below.

Agreed; for the record, I tried to politely steer previous substantive discussion to the Discourse thread the first time this was brought up, but didn't entirely succeed, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, here's the thread: https://discuss.python.org/t/12622

@pradyunsg
Copy link
Member

pradyunsg commented Dec 17, 2021

As an aside, can we merge this PR and move the discussion of the current iteration of the PEP to a new discuss.python.org thread? It's significantly more work to try and keep up with discussion happening on a GitHub PR for reviewing a design document (IMO), and it would be generally nicer to move the discussion to the Packaging category on d.p.o instead; since that has the right audience for reviewing the PEP's content + proposal.

@pfmoore
Copy link
Member

pfmoore commented Dec 17, 2021

As an aside, can we merge this PR and move the discussion of the current iteration of the PEP to a new discuss.python.org thread?

+1. On a personal note, I am not following the discussions here, and once the PR is merged, I will not review them. So when it comes to approving the PEP, any substantial discussions here will be lost. That will result in extra work for everyone, if there's a point that needs clarification.

On a procedural note, PyPA processes and PEP 1 state that discussions should happen on Discourse (technically, they say distutils-sig, but we're in the process of fixing that...). Only formatting issues should be discussed on the tracker here.

@pradyunsg
Copy link
Member

FWIW, @pfmoore I think you have the ability to approve this PR; and poke the PEP editors to merge this (or, maybe you can merge this yourself too. :P)

@pfmoore
Copy link
Member

pfmoore commented Dec 17, 2021

I do have the ability to approve/merge, yes. But TBH the diff is too big for me to reasonably review, and as I say, I'm not following the discussions here. I don't know what view the PEP editors would have (but unless someone shouts, I'll assume they don't have an issue with it).

I'm happy to merge unreviewed as (in effect) a rewrite of the PEP, with a new discussion of the rewrite to then follow on Discourse. Is that OK? @CAM-Gerlach are you happy if I take that approach? I'm assuming @pombredanne would be OK with that approach, based on his review here.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read over the whole PR and have a few minor suggestions, but otherwise I'd also suggest merging and moving substantive discussion back to Discourse.

pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated
provided ``License`` or ``Classifier`` information if they are able to do so
unambiguously. For instance, if a package only has this license classifier::
**Note**: To avoid ambiguity, confusion and (per PEP 20, the Zen of Python)
"more than one (obvious) way to do it", a flat array of strings value for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"flat array of strings value" sounds a little awkward, but I see what you mean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is not only awkward but potentially confusing; I really struggled with wording this in a way that was both precise, clear and concise. I thought about it some more and rephrased it as "allowing a flat array of strings as the value for the license-files key has been left out for now", which isn't perfect but somewhat better, I hope.

pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
pep-0639.rst Outdated Show resolved Hide resolved
@CAM-Gerlach
Copy link
Member Author

@pfmoore @pradyunsg Thanks and agreed; for the record I tried to steer earlier reviewers toward the Discourse thread for discussion substantive issues like that, but unfortunately was not entirely successful. I'm happy to see this merged for now and will create a new Discourse thread where we can discuss this further.

I also revised the relevant Rejected Idea for the license key string value to more specifically summarize and address the points made here and on the thread and the most important objective problems involved with the approach, those surrounding the dynamic handling; emphasize the potential confusion and ambiguity between tool formats rather than core metadata; and minimizing the other less important concerns regarding conceptual consistency, key-field correspondence and spec/tool complexity. This will hopefully provide a more focused and specific basis for further discussion on the Discourse thread.

@pombredanne @cdce8p For now, I added an Open Issue that summarized the points made thus far in our thread above, so they can be included in the PEP and can serve as a basis for further discussion on the Discourse thread, as needed.

@JelleZijlstra Thanks for your review and catching those issues! Unfortunately, for strange reason, GitHub is showing your suggestions as Outdated and won't let me apply them, so I've applied and fixuped them in manually, plus a few other issues I found doing so. As a quick aside, per the above "note to committers",

Once this is ready, a rebase-merge (which unlike a regular merge would keep a linear git history, since it looks like that's repo policy) would be appreciated, to preserve the carefully arranged commit history organizing the changes into digestible, logical units, and keeping a record of the previously-implemented alternatives that have been discussed at length here and on the thread, but I understand if repo policy requires a squash merge for non-committer contributions.

@JelleZijlstra
Copy link
Member

Screen Shot 2021-12-17 at 2 52 37 PM

Unfortunately rebase and merge is not supported for this repo. I probably don't have the permissions to change that setting, and I'm not sure we should—this PR is basically just a complete rewrite of the PEP, and I'd expect future readers to just read the result in its entirety.

I'm happy to merge this (or let someone else do it) whenever you feel it's ready.

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Dec 17, 2021

I created the new thread, left a comment on the old one linking to it, and updated the discussions-to link to point to it in the PEP, so this should be ready to go pending @JelleZijlstra 's review of my final textual additions summarizing the discussion points here.

I'm happy to merge this (or let someone else do it) whenever you feel it's ready.

Got it, understood @JelleZijlstra —thanks for checking! Its ready, minus your review of the open issue I added and the rejected idea I modified to summarize the discussion here, as a more focused starting point for the further Discourse discussion (see the last commit). Once that's done, feel free to squash merge whenever you're ready, then. Thanks again!

@JelleZijlstra
Copy link
Member

Also, I just want to note that with this PR, PEP 639 (3019 lines) would be the longest PEP by some margin, ahead of PEP 484 (typing; 2511 lines), PEP 622 (pattern matching; 2345), PEP 3156 (asyncio; 2117), and PEP 642 (pattern matching again; 2116), all of which introduced complex new concepts to the language. It's good to be thorough about all the options, but I'm worried that this PEP's sheer size will make it difficult to assess. Anyway, that's mostly up to the packaging community.

pep-0639.rst Outdated
build tooling and the ``License-Expression`` metadata field is still
desirable to use for this purpose.

This has the downsides, however, of not catching mispellings of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This has the downsides, however, of not catching mispellings of the
This has the downsides, however, of not catching misspellings of the

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the irony 😆 Funny enough, when I was doing a quick proofread, I actually caught this same word mispelled [sic] somewhere else, and thought about how embarrassing it would be if that had made it through...heh. Fixed, thanks.

pep-0639.rst Outdated

On the other hand, as SPDX aims to (and generally does) encompass all
FSF-recognized "Free" and OSI-approved "Open Source" licenses,
and and those sources are kept closely in sync and are now relatively stable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and and those sources are kept closely in sync and are now relatively stable,
and those sources are kept closely in sync and are now relatively stable,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, another one of those...I really should have a regex for that. Fixed, thanks.

@pfmoore
Copy link
Member

pfmoore commented Dec 17, 2021

It's good to be thorough about all the options, but I'm worried that this PEP's sheer size will make it difficult to assess. Anyway, that's mostly up to the packaging community.

I think this is a reasonable concern. I know I haven't been reading the new version at least in part because of the sheer size (of the diffs, but the same applies to the doc itself).

I'd strongly suggest trying to trim down the text.

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Dec 18, 2021

The revised version of the PEP is now live on the Python website. Thanks so much to everyone here who contributed feedback, fixes, ideas and support.

Also, I just want to note that with this PR, PEP 639 (3019 lines) would be the longest PEP by some margin

@JelleZijlstra I only count 2726 total, only just ahead of PEP 484 (2511), while my other numbers all match yours.

Regardless, you have a fair point. Doing a brief inventory, in order, with conceptually similar categories grouped:

  • 30 lines (1%) Header and footer
  • 50 lines (2%) Abstract
  • 200 lines (7%) Goals/non-goals, motivation and rationale
  • 150 lines (5%) Terminology section
  • 500 lines (20%) Specification
  • 250 lines (10%) User scenarios, backwards compat and how to teach
  • 25 lines (1%) Security and reference implementation
  • 1000 lines (35%) Rejected ideas and open issues
  • 100 lines (4%) References
  • 200 lines (7%) Appendix 1: Examples
  • 300 lines (10%) Appendix 2 and 3, surveying licensing in Python and other systems

In summary, only about 20% (500 lines) of those lines is the actual core specification, and the rest is reference material intended, variously, for general PEP readers and reviewers (goals/motivation/rationale, terminology, security), package authors (scenarios, examples, how to teach), tool authors (examples, backward compat, reference implementation), those considering alternate approaches (open issues, rejected ideas) and those curious to learn more about the existing license ecossytem (appendix 2 and 3).

To note, once #2155 is merged, the References section (perhaps the longest in any PEP) will no longer be rendered separately, so that knocks it down by 100 lines (including the non-printing header and footer). Also, the PEP extensively uses nearly 200 inline internal and external links for ease of navigation and reference, as well as other markup, making the rendered product easier to jump around in and navigate while increasing the source line count by perhaps another 100 lines. Also, in total, 600 lines (22%) is references or appendices rather than body text.

By far the biggest single element was the rejected ideas/open issues sections, where I document the approaches taken by previous iterations of this PEP and suggested by others, and summarize why they were accepted or rejected, as required by PEP 1. This section is intended only for reference, for readers who may have such ideas to consider (or be pointed to) before raising them again, as well as clearly documenting the rationale for the future, so only the relevant subsections need be read by those interested. I wanted to give each ideas its due and fairly summarize its merits and shortcomings, rather than just a cursory dismissal leaving the reader guessing or uncertain.

There have been a large number of such ideas over the long course of this PEP, and I had my doubts over whether including all them in sufficient detail was justified, due to the resulting length. Given many of them were brought up multiple times, including a number (especially those related to PEP 621 project source metadata, which is well over half the total length) on this PR, it would seem that at least a substantial number of them were. However, if this is seen as excessive, perhaps they can be significantly cut down if and when we find consensus on these topics, and those that others see as not really relevant/viable drastically reduced to a couple sentences or even eliminated.

Other than that, some options for addressing this, in order from least to most drastic:

  • Move the user scenarios, terminology, and perhaps other sections (the "transition plan" part of backward compat), to appendices, saving 300+ lines of body text (or around 15% of the current total)
  • Move the bulk of the rejected ideas there too, except for a summary of the most important ones, saving 35-40% and cutting the body text by well over 50%.
  • Elide Appendix 3, (≈200 lines), as its least directly relevant, or even Appendix 2 (≈100 lines), which would cut over 11% off the total all-up length...but I'd really hate to do this option, as @pombredanne spent a large amount of time and effort on it, along with the time I spent updating it (particularly the more relevant Appendix 2)
  • Once a rough consensus is reached on the open issues, create a new PEP superceding this one, to preserve the existing material for future reference, that eliminates Appendix 2, Appendix 3, and most of the rejected ideas, which combined with the references removal and other associated trimming, reduces the total length by around 50% (and the body text by >60%, counting the above)...but this would be fairly unusual practice.
  • Split it into multiple PEPs, one standards track PEP with the core material: goals/motivation/rationale, terminology, spec and compat/how to teach/security/reference plus a rejected issue summary (40%), and one informational PEP with the background material (60%), including the rejected ideas, user scenarios, examples and survey appendices. But this might be viewed as even more out of proportion relative to the PEP's importance, to have multiple related PEPs.

I'd strongly suggest trying to trim down the text.

@pfmoore Okay; do you have any suggestions for which sections you feel don't belong here, could be moved somewhere else, or have content cut out of them? Or any opinion on which option(s) above you'd prefer?

Personally, given all the time, thought, effort and analysis I've poured into trying to add helpful information for package authors, tool developers, spec readers and those with an interest in the subject matter, its naturally kinda hard to just throw a bunch of that away completely now, so I'd prefer the options that allow preserving as much as practical in one form or another (whether that's in the appendices, in a superseded version of the PEP, somewhere in the PyPA/packaging guide sites, etc). However, if it being too long and unweidely directly or indirectly results in the PEP being rejected, the negative consequences are obviously much greater and more far reaching, so I'll do whatever you think is best for that.

@pfmoore
Copy link
Member

pfmoore commented Dec 18, 2021

Okay; do you have any suggestions for which sections you feel don't belong here, could be moved somewhere else, or have content cut out of them? Or any opinion on which option(s) above you'd prefer?

One of the problems here is that I don't have the time to read this proposal as it stands. So it's hard for me to suggest how to shorten it, because I've not even been able to read and internalise what it's saying yet. But as a general point, will anyone read all of this before commenting? I spent some time this morning, and I feel like I barely got started. I don't have the free time to spend hours reading this proposal.

To me, the only things that need to be made clear are:

  1. We need a way for packages to specify licenses. That's more or less obvious, you can probably assume it.
  2. The current approach isn't good enough. Why? Because it's not structured (and hence not machine readable) and not standard. Also, a structured, standard approach is safer because there's no room for ambiguity.
  3. SPDX is a standard approach for structured licensing. That's good enough then. How do users find out how to express their license choice as SPDX? If it's not easy, that says that SPDX is a bad choice - we don't want an SPDX tutorial in Python's docs.
  4. We put a SPDX value in some field. What field?

And that's it. There's stuff I saw about shipping license files, but I was bogged down in too many words before I got into details there. I'd be hoping for something about this concise for that as well.

Personally, given all the time, thought, effort and analysis I've poured into trying to add helpful information for package authors, tool developers, spec readers and those with an interest in the subject matter, its naturally kinda hard to just throw a bunch of that away completely now

I understand how hard it is to shorten something you've worked on. What I write is always way too long, and I'm really bad at trimming it. But is it worth keeping all that writing if no-one ends up reading it, because they gave up after the first 2 sections?

To try to provide a bit more help, I picked on a few sections, jumped straight to them (so no wider context) and skimmed them (so I will have missed details). Here's my immediate impressions - but please understand that they are immediate impressions, and my comments are extremely abbreviated and shouldn't be taken to imply there's nothing of value, just that I couldn't spot it on a quick skim.

  • User Scenarios - Looks like all about teaching users how to choose a license. Not our issue.
  • Backward Compatibility - There's 2 new fields and one field is deprecated. But no duration for the deprecation, a different PEP will remove the old field. The rest seems like just explanatory text or justifications, but doesn't seem to say anything's incompatible, so I assume I don't need to change anything else.
  • Appendix 1 - The simple example looks good. The advanced example made my brain melt, and scared me off having anything to do with merging/combining licenses. I'm now scared of even looking at pip's licensing, so personally I'd ignore the issue and not use the new metadata because it looks way harder than just hoping what we have is OK. That's more of an "are we doing licensing correctly" problem than an issue with this PEP, but the net effect is not what you want here.
  • Appendix 2 - A survey of "what various bits of Python documentation say about licensing". Not sure of the benefit. Similar with Appendix 3, I really don't care how Haskell does licensing.
  • Rejected Ideas, Source metadata license key - I was looking here for "why a new key, why not just reuse license? I found "Define license expression as string value", but I don't understand the arguments at all. Specifically "it is not possible to separately mark the [project] keys corresponding to the License and License-Expression metadata fields as dynamic". Why would you want a dynamic license anyway? So as a justification of a rejected idea, it just raises more questions and prompts me to re-open the discussion1. That's not doing its job of documenting the reasons for a rejection to avoid going over old ground...

That's probably enough. I don't want to make this feel too much like a hatchet job on all the work you've done. The work is much appreciated and extremely valuable. It's just how you present the results of all the work that needs some tidying up, IMO.

Footnotes

  1. And I will re-open this over on Discourse, because I'm not at all keen on using anything other than license as the place to store license data...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.