Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of REUSE #99415

Merged
merged 5 commits into from
Aug 20, 2022
Merged

Conversation

pietroalbini
Copy link
Member

@pietroalbini pietroalbini commented Jul 18, 2022

This PR implements the first two steps of #99414 by:

  • Adding some scaffolding for REUSE. The .reuse/dep5 file now marks every file as the custom "TODO" license, which I'll remove in a future PR once Debian imports their metadata. The TODO license is needed so that reuse lint works.
  • Runs reuse lint in CI, in the mingw-check builder. REUSE currently has a bug when parsing some files in the LLVM source code. This means REUSE will fail when running it in source tarballs of rustc, and that bug prevents us from passing the --include-submodules flag in CI. I opened Find license identifiers in comments with ASCII art frames fsfe/reuse-tool#560 upstream with a fix, and as soon as it's merged/released I planned to bump the pinned version to include the fix we need.

r? @Mark-Simulacrum

@rustbot rustbot added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Jul 18, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 18, 2022
@Mark-Simulacrum
Copy link
Member

Since REUSE is not in the Ubuntu 18.04 repositories

Should we just bump that builder to 22.04? mingw-check doesn't produce artifacts and to my knowledge we're not relying on it to test old Ubuntu compat in some other way either. This is hopefully not too hard to do.

@pietroalbini
Copy link
Member Author

Should we just bump that builder to 22.04? mingw-check doesn't produce artifacts and to my knowledge we're not relying on it to test old Ubuntu compat in some other way either. This is hopefully not too hard to do.

While REUSE is indeed in the Ubuntu 22.04 repositories (yay!), REUSE currently has a bug when parsing some files in the LLVM source code. This means REUSE will fail when running it in source tarballs of rustc, and that bug prevents us from passing the --include-submodules flag in CI. I opened fsfe/reuse-tool#560 upstream with a fix, and as soon as it's merged/released I planned to bump the pinned version to include the fix we need.

@Mark-Simulacrum
Copy link
Member

Hm, okay. I think we should still track using a distro version -- it's much easier for local experimentation, at least, if I don't need to mess with pip :) Let's also update the PR description with that justification, which seems much better than "not in 18.04".

I think as-is this PR doesn't really merit merging, since we're basically not adding any testing (that I can see) and add a presumably quickly unnecessary TODO license which is kinda weird (and with our luck, will break someone downstream who is validating that we're MIT + APACHE 2.0).

Debian imports their metadata.

Maybe this is a key point I don't yet follow -- is this something you expect a Debian developer to do? Naively, I'd expect us to just be copying a few files here.

Files: *

I'm also wondering how this will work down the line. I think there's definitely value regardless, but it seems like if we can blanket-override license metadata present in files (e.g., headers) it becomes much easier to accidentally declare an overly broad generalization. Does reuse lint yell at us if we have a file header that's incompatible with Files: + license declaration in the reuse configuration?

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 26, 2022
@pietroalbini pietroalbini marked this pull request as draft July 29, 2022 11:03
@pietroalbini
Copy link
Member Author

Won't have time to address feedback before I get back from vacation, marking it as a draft.

@pietroalbini pietroalbini marked this pull request as ready for review August 18, 2022 10:44
@pietroalbini
Copy link
Member Author

Hm, okay. I think we should still track using a distro version -- it's much easier for local experimentation, at least, if I don't need to mess with pip :) Let's also update the PR description with that justification, which seems much better than "not in 18.04".

Done.

Maybe this is a key point I don't yet follow -- is this something you expect a Debian developer to do? Naively, I'd expect us to just be copying a few files here.

Yeah, my expectation was to ping the Debian developer I've been in contact with (@sylvestre) once the PR was merged to let him upload their contents. I guess I can also do them myself in this PR if you prefer.

I'm also wondering how this will work down the line. I think there's definitely value regardless, but it seems like if we can blanket-override license metadata present in files (e.g., headers) it becomes much easier to accidentally declare an overly broad generalization. Does reuse lint yell at us if we have a file header that's incompatible with Files: + license declaration in the reuse configuration?

I did some digging and experimentation, and it turns out the current logic of what REUSE includes is undocumented and a bit weird. There are three sources of licensing information:

  • Comments in the file itself.
  • Standalone ${path}.license file.
  • dep5 file at the root of the repository.

The way REUSE currently includes them is (standalone OR comments) AND dep5, so if a standalone license file is present the comments in the file are not parsed, but the dep5 is included regardless. So, for a file covered both by a comment and dep5, there will be two separate licenses in the REUSE output.

There is an open issue about this that is blocked on defining the precedence, which is blocked on defining a new REUSE.yaml file replacing dep5, which is blocked on SPDX defining its format that REUSE will borrow. That's a long chain of blocked.

In any way, the consensus in fsfe/reuse-tool#246 seems to be to ignore the comments in the file if a dep5 file is present, not the other way around. Given all of this, it's going to be easier to define all our licensing metadata in the .reuse/dep5.

Within a dep5 file, later rules override earlier rules. This allows us to define wildcards at the top of the file and more specific rules at the bottom of the file, knowing that this way the specific rules override the wildcards.


Another problem I discovered in my issue tracker rabbit hole is that .reuse/dep5 is only read in the root of the git repository, and not in subdirectories. This means if subtrees start to adopt REUSE their bulk annotations won't be included. This also seems to be something REUSE only plans to address with REUSE.yaml, which is blocked by a bunch of other issues.

@pietroalbini pietroalbini added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 18, 2022
@Mark-Simulacrum
Copy link
Member

easier to define all our licensing metadata in dep5

To be clear, I agree and want to, but my worry was more about us silently overriding licenses on files without realizing (e.g., as you say from subtrees). It seems like we might be able to run with two different reuse configs (one basically blank) and lint that files don't set anything that conflicts with that? Not sure if it's possible given the precedence order you suggest is a consensus (which seems weird to me... local state should typically win IMO), but maybe that's something to follow up that chain on.

I think this is all likely to be an improvement regardless, not a blocking concern, but I'd love for tooling to better support us here. I guess in Rust code we'll keep our tidy lints banning file specific headers and make sure they include dep5 headers.

I'll review the patch itself in a bit, but can you say if there's reasons to wait to import the Debian declarations? Seems like doing them in this PR gives us a better picture.

@sylvestre
Copy link
Contributor

Yeah, my expectation was to ping the Debian developer I've been in contact with (@sylvestre) once the PR was merged to let him upload their contents. I guess I can also do them myself in this PR if you prefer.

As you wish :)

@pietroalbini
Copy link
Member Author

Not sure if it's possible given the precedence order you suggest is a consensus (which seems weird to me... local state should typically win IMO), but maybe that's something to follow up that chain on.

Reading the thread again, the reasoning for that choice was overriding the license defined in vendored files that can't be touched, if for example they include invalid syntax. The discussion then moved to fsfe/reuse-docs#70, when there are multiple alternatives being discussed.

I'll review the patch itself in a bit, but can you say if there's reasons to wait to import the Debian declarations? Seems like doing them in this PR gives us a better picture.

[---]

As you wish :)

Ok, I'll import the annotations in this PR either later today or tomorrow.

pietroalbini and others added 3 commits August 18, 2022 15:46
Co-Authored-By: Angus Lees <gus@inodes.org>
Co-Authored-By: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Co-Authored-By: Hiroaki Nakamura <hnakamur@gmail.com>
Co-Authored-By: Jordan Justen <jljusten@gmail.com>
Co-Authored-By: Luca Bruno <lucab@debian.org>
Co-Authored-By: Sylvestre Ledru <sylvestre@debian.org>
Co-Authored-By: Ximin Luo <infinity0@debian.org>
@pietroalbini
Copy link
Member Author

Imported the debian copyright file and removed all the vendor/ stuff from it. It's not complete nor fully accurate yet, but it's a start.

@mxmehl
Copy link

mxmehl commented Aug 19, 2022

I am impressed, @pietroalbini, how well you figured out the current chain of blockers of REUSE.yaml, which would solve the problems mentioned here as well as many others!

Indeed, spdx/spdx-spec#502 is basically the first thing we'd need to tackle. Please feel free to share your interest and opinion on this issue. Inside SPDX, a lot of workforce is currently used up by defining the whole format of SPDX 3.0, but it may be good to already consider the REUSE.yaml usecase.

@Mark-Simulacrum
Copy link
Member

OK, I'm going to go ahead and approve this PR; it seems like the right step on #99414. It's also a fairly easily reversible decision, and while I think there's a number of interesting questions to answer around that (e.g., how we unify this with tidy's license checking, etc.), this is still roughly moving us in the right direction.

I'll also note that I didn't review the exact associations mapped here; I think we'll want to iterate further and probably seek help from Foundation resources on understanding what we want licenses to be (and work to reconcile with what they are).

@bors r+ rollup=iffy

@bors
Copy link
Contributor

bors commented Aug 20, 2022

📌 Commit 00c8306 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 20, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 20, 2022
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#99415 (Initial implementation of REUSE)
 - rust-lang#99544 (Expose `Utf8Lossy` as `Utf8Chunks`)
 - rust-lang#100585 (Fix trailing space showing up in example)
 - rust-lang#100596 (Remove unnecessary stderr files)
 - rust-lang#100642 (Update fortanix-sgx-abi and export some useful SGX usercall traits)
 - rust-lang#100691 (Make `same_type_modulo_infer` a proper `TypeRelation`)
 - rust-lang#100693 (Add LLVM15-specific codegen test for `try`/`?`s that now optimize away)
 - rust-lang#100710 (Windows: Load synch functions together)
 - rust-lang#100807 (Add TaKO8Ki to translation-related mention groups)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2be85b0 into rust-lang:master Aug 20, 2022
@rustbot rustbot added this to the 1.65.0 milestone Aug 20, 2022
@pietroalbini pietroalbini deleted the pa-reuse-initial branch August 24, 2022 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants