Define an order of precedence + provide a means to override it (REUSE.yaml) #779
Comments
Just documenting this here as well. I think having a precedence implies there is some sort of legal precedence as well. This is a problem if a repo supplies licensing information in a funny way, i.e. adds information in one way and then in another way while keeping both. So I think it would be good to keep on aggregating and issue a warning whenever that happens because that means licensing information is unclear and that's not what we want. In later versions, we could perhaps even make the warning a failure. This would improve licensing overall by making it clearer. However, the legitimate use case Carmen describes above is an issue with my suggestion. Something we should figure out is how common static dependencies are and whether they are typically necessary. If the answers are "very common" and "necessary", then perhaps there is a way to declare something a static dependency and in that case be able to override included licensing information? Or perhaps we could ignore something that is declared a dependency? However, there is an underlying question we should also debate: Can a repo be REUSE compliant with static dependencies that aren't? |
@DerMolly This mechanism exists :D See https://github.com/fsfe/reuse-tool/blob/6b81d04dfa198423b3a1adc0483368af7b73fc7c/docs/usage.rst#ignoring-parts-of-a-file Between |
/opt/venv/lib/python3.11/site-packages/reuse/project.py:224: PendingDeprecationWarning: Copyright and licensing information for 'newsfragments/.gitignore' has been found in both 'newsfragments/.gitignore' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See <fsfe/reuse-tool#779>. You need do nothing yet. Run with `--suppress-deprecation` to hide this warning. warnings.warn(
``` /opt/venv/lib/python3.10/site-packages/reuse/project.py:224: PendingDeprecationWarning: Copyright and licensing information for 'tests/openssh_server/Dockerfile' has been found in both 'tests/openssh_server/Dockerfile' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See <fsfe/reuse-tool#779>. You need do nothing yet. Run with `--suppress-deprecation` to hide this warning. ``` Ref: https://github.com/libssh2/libssh2/actions/runs/6789274955/job/18456085964#step:4:4
I added a proposal for REUSE.toml in the section 'An actual for-real-this-time concrete proposal' of the issue. |
I support the TOML choice and also that we avoid having a dotfile. Regarding using The example TOML looks good to me. I would expect that using SPDX annotation within the field would also work fine, e.g.:
[[annotations]]
path = "docs/reuse*.rst"
precedence = "toml"
SPDX-FileCopyrightText = [
"2017 Free Software Foundation Europe e.V. <https://fsfe.org>",
"2023 Jane Doe",
]
SPDX-License-Identifier = [
"(CC-BY-SA-4.0 AND GPL-3.0-or-later) OR MIT"
]
I have some concerns about the |
|
Let's no longer make 'precedence' mandatory -> default to file. |
After the discussion today, do you think differently? I think we need this feature, that was one of the USPs of REUSE.toml/yaml. |
I wrote that just before the meeting. My mind was changed during the meeting :) |
There is also an issue as the deprecation warning is getting printed when there is nothing to aggregate / both
I would expect no warnings as the information is redundant but in sync, no licenses to aggregate and therefore no precedence issues. However, a warning gets printed:
$ reuse --version lint
reuse 2.1.0
$ reuse --root /tmp/issue779/ lint
/tmp/.venv/issue779/lib64/python3.12/site-packages/reuse/project.py:224: PendingDeprecationWarning: Copyright and licensing information for '/tmp/issue779/foobar.html' has been found in both '/tmp/issue779/foobar.html' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See <https://github.com/fsfe/reuse-tool/issues/779>. You need do nothing yet. Run with `--suppress-deprecation` to hide this warning.
warnings.warn(
/tmp/.venv/issue779/lib64/python3.12/site-packages/reuse/project.py:224: PendingDeprecationWarning: Copyright and licensing information for '/tmp/issue779/foobar.yaml' has been found in both '/tmp/issue779/foobar.yaml' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See <https://github.com/fsfe/reuse-tool/issues/779>. You need do nothing yet. Run with `--suppress-deprecation` to hide this warning.
warnings.warn(
# SUMMARY
* Bad licenses: 0
* Deprecated licenses: 0
* Licenses without file extension: 0
* Missing licenses: 0
* Unused licenses: 0
* Used licenses: GPL-3.0-or-later
* Read errors: 0
* files with copyright information: 2 / 2
* files with license information: 2 / 2
Congratulations! Your project is compliant with version 3.0 of the REUSE Specification :-)
Is this expected behavior? If so, sensible wildcards would effectively prevent users from adding |
I agree. The code just concatenates data from both sources without any attempt to avoid duplication, and emits multiple warnings. I logged an issue offering to fix this four months ago but there was no response. So I don't think there's any interest in that approach. It's possible to suppress the warnings, but it's not possible to suppress the duplicates (in SPDX output). Maybe it's messy but harmless. But I wish it was at least possible to instruct the code to not scan the source when there is a .reuse/dep5 file. That seems to me to be a very sensible thing to do, but it is also possible to scan both and cull duplicates (it's just less efficient if there is total agreement - where there is disagreement targeted warnings could help to achieve agreement). I think we have to wait until a decision is made about a configuration file that instructs reuse what to do in this regard. I just hope that the decision supports projects that want to continue to specify all of the licensing information (for REUSE purposes) in a single file separate to the source files (i.e., .reuse/dep5). |
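To make the "scan both and cull duplicates" idea from the comment above concrete, here is a minimal Python sketch. It is not the reuse tool's code: `merge_sources` and its parameters are hypothetical names, and the licensing information is assumed to be plain lists of SPDX expressions. It merges the two sources without duplicates and warns only when they actually disagree.

```python
import warnings


def merge_sources(path, from_file, from_dep5):
    """Merge licence expressions from a file header and from .reuse/dep5.

    Hypothetical helper, not part of the reuse tool's API.
    Duplicates are dropped while the first-seen order is kept.
    """
    merged = list(dict.fromkeys([*from_file, *from_dep5]))
    # Warn only when both sources carry information AND they differ.
    if from_file and from_dep5 and set(from_file) != set(from_dep5):
        warnings.warn(
            f"Licensing information for '{path}' differs between the file "
            "and .reuse/dep5",
            UserWarning,
        )
    return merged


# Redundant but in-sync information: one entry, no warning.
print(merge_sources("foobar.html", ["GPL-3.0-or-later"], ["GPL-3.0-or-later"]))
```

With this shape, a project that keeps identical information in both places would see neither duplicate entries nor warnings, while a genuine mismatch still gets a targeted message.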
@andreashaerter That is an interesting bug/feature/behaviour that probably shouldn't be too difficult to fix. However, #863 should fix this class of issues, so it's a touch low on the priority list. I'll create an issue for it nonetheless. Interestingly, the code does actually avoid duplication, ultimately. But at that exact stage in the code, not quite. |
If there is duplicate removal somewhere, it would be good if it occurred as part of or before the |
I updated the first post with some new details after doing some tinkering work in #863.
Some notes:
|
Practically speaking, this sounds like a good trade-off. But the tool should raise a warning if that happens. If someone is already using (It’s going to be fun explaining it in the spec.) |
Help, I got here because I got a PendingDeprecationWarning
You can get rid of this warning by upgrading to `>=4.0.0` of `reuse`, where the above behaviour is defined in REUSE Specification v3.2.
The reason you're getting this warning is because of the following scenario. You have a file `my-project/foo.py` which contains the following header:
But you also have a `.reuse/dep5` file which contains the following section:
The problem: Under which licence is the file? Who are the copyright holders?
Prior to version 4.0.0, we erred on the side of caution, and just aggregated the results. The answer to both questions was 'both', as far as the tool was concerned.
However, that behaviour was not actually specified in the REUSE Specification v3.0, and there was a consensus among the maintainers of REUSE that this behaviour wasn't great. So we wanted to change it.
In REUSE Specification v3.2, we added a new file format `REUSE.toml` which allows you to specify the order of precedence of licensing information. The method of aggregation described above is now explicitly defined as the order of precedence for `.reuse/dep5`.
Find below the historical contents of this issue.
A naïve proposal + some history
We want to define an order of precedence for copyright and licensing information. Here is a concrete proposal:
In fact, this proposal is so concrete that—for a few hours—it was in REUSE Specification 3.1 and tool version 2.0.0! However, because of quick negative feedback, this update to the specification was promptly reverted, and tool version 2.0.0 was yanked. A little embarrassing on our part, but we're thankful for the constructive feedback.
Copied from the change log:
The legitimate use-case is the following scenario: You copy a project Foo wholesale into your own project as a static dependency. Foo is not REUSE-compliant, but does contain copyright statements in some code headers. You write a section into `.reuse/dep5` broadly declaring that `static/Foo/*` is under its declared licence, and attribute `The Foo Authors` as the copyright holders. However, because the DEP5 file is now no longer applied to the files that contain copyright statements, REUSE will complain that these files do not have a declared licence.
Within the restrictions of the above proposal, there is no good workable solution to this use-case. You could manually edit the headers (not great, especially when Foo is big, or you regularly need to update it), or you could manually add `.license` files, which may be a huge task.
An actual but not-so-concrete proposal
We still want to define an order of precedence. But we must provide a way to force aggregation (current behaviour) or to hard-code precedence (e.g., prefer `.reuse/dep5` over the file contents).
There does not yet exist a concrete way of doing this, but you may think of it like this. Given the example `.reuse/dep5` section at the start of this issue, we could instead write this:
The problem, however, is that DEP5 does not support this field, and we don't want to make it support this field.
So we want to pivot away from DEP5 and adopt a different configuration method. We've been brainstorming this since 2021 (volunteer projects aren't very fast), and we're internally referring to it as `REUSE.yaml` (although the YAML part is a bit up in the air).
An actual for-real-this-time concrete proposal
Find below a real and actual concrete proposal:
Some notes on the implementation:
* `path`, `SPDX-FileCopyrightText`, and `SPDX-License-Identifier` can be either a single string or a list of strings. This (partially) matches DEP5 behaviour, making it easy to port. It's also convenient to not mandate lists; we'll probably convert string values into single-value lists in the under-the-hood implementation (edit: that is exactly what I did).
* `SPDX-[...]` key names. This works better in TOML than in YAML (because the colon in YAML messes with this tool's parsing). It's a bit more annoying to type, but it's also very consistent, and means that the user has to memorise less.
* `version` key doesn't do much of anything presently. I'm not sure if it'll ever become important, but if it does, it'll be good to have.
Some notes about the file itself:
* `REUSE.toml`, and NOT `.REUSE.toml`. People will be peeved by this choice (they don't like random tools littering their clean workspace), but I propose that we stand by this choice. By allowing dotfiles, we would run the risk that the licensing information is hidden on some computers. Licensing information should not be hidden, ergo let's not do dotfiles.
* `closest`/`aggregate`/`override` precedence system.
* `closest` resolves to the file itself OR to the nearest REUSE.toml (can be self).
* `aggregate` just aggregates the REUSE.toml's information always, and then behaves like `closest`.
* `override` is an aggressive "ignore everything underneath me; I am the ultimate authority here" precedence setting. The topmost REUSE.toml with `override` is authoritative.
Related issues
Here are some issues of relevance (in order of relevance, feel free to reference more):
This issue will exist as a sort of meta issue to refer back to and track work in other issues.
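For readers who want a feel for the `closest`/`aggregate`/`override` system described in the concrete proposal, here is a rough Python sketch. It is one possible reading of those notes, not the tool's implementation. `Annotation`, `depth`, `as_list`, and `resolve` are hypothetical names, and only licence expressions are modelled. `as_list` also illustrates the note that single strings are converted into single-value lists.

```python
from __future__ import annotations

from dataclasses import dataclass


def as_list(value):
    """Accept a single string or a list of strings; always return a list."""
    return [value] if isinstance(value, str) else list(value)


@dataclass
class Annotation:
    """A REUSE.toml annotation that matches the file being resolved."""

    precedence: str  # "closest", "aggregate" or "override"
    licenses: list[str] | str
    depth: int  # assumption: 0 = REUSE.toml in the project root, larger = deeper

    def __post_init__(self):
        self.licenses = as_list(self.licenses)


def resolve(file_licenses, annotations):
    """Return the sorted licence expressions that apply to one file."""
    # override: the topmost REUSE.toml with `override` is authoritative.
    overrides = [a for a in annotations if a.precedence == "override"]
    if overrides:
        return sorted(set(min(overrides, key=lambda a: a.depth).licenses))

    result = set()
    # aggregate: always contributes its information.
    for a in annotations:
        if a.precedence == "aggregate":
            result.update(a.licenses)

    # closest: the file's own information, or else the nearest REUSE.toml.
    if file_licenses:
        result.update(file_licenses)
    else:
        closest = [a for a in annotations if a.precedence == "closest"]
        if closest:
            result.update(max(closest, key=lambda a: a.depth).licenses)
    return sorted(result)


root = Annotation("closest", "MIT", depth=0)
nested = Annotation("closest", ["Apache-2.0"], depth=1)
print(resolve([], [root, nested]))
```

In this reading, a file with its own header is unaffected by `closest` annotations, which matches the earlier suggestion in the thread to default to the file.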