Track licenses for each data pointers and records #63

Closed
pombredanne opened this issue Sep 26, 2019 · 15 comments

Comments

@pombredanne
Collaborator

We need to decide what we want to do wrt. licenses for data.
See https://cve.mitre.org/about/termsofuse.html for instance for the CVE/NVD.
There are a few ways to think about this:

  1. we are storing only pointers, so there are no license issues to track as we are not storing third-party data
  2. we are storing only pointers and caching existing data, so we should handle this in a way similar to what search engines do
  3. we are storing data, so we should track licenses either per record or per source

Each of these cases may have an impact on the license of the resulting data, which should be as open as possible (ideally CC0-1.0).

@pombredanne
Collaborator Author

Another take on this topic: we are building open tools to collect, aggregate and redistribute a free and open software vulnerability database. At a high level, we keep pointers/references and relate many vulnerability records to the software package versions they impact.

A pointer/reference is typically a URL and an ID to vulnerability information and to packages such as these below that are all related together:

There are a few areas where we need some help and need to make some decisions soon:

  • we want the data we re-distribute to be as open as possible (ideally
    some kind of CC0),

  • the data we collect is itself under a variety of more or less open
    licenses, but all of it is publicly available, and we need to decide:

      • which data can we aggregate, based on its licenses?

      • what if we keep only pointers/URLs as opposed to the actual
        details of the records?

      • should we track the license of individual records or not? (this
        also depends on what data we keep)

@pombredanne
Collaborator Author

pombredanne commented Nov 5, 2019

I did have an extensive chat with @LeChasseur on that topic and some of the key points are:

  1. while a single vulnerability record may not be copyrightable, databases can be copyrighted
  2. on top of that, Europe has a notion of "sui generis" rights that may apply when data originally comes from Europe. It does not extend to non-European data
  3. the cleanest way to promote reuse of the data we create (and of any right in the aggregate) is to use a public domain CC0-1.0 dedication. We could request, but not demand, some attribution. Anything else makes the data rather problematic to reuse, and we want to promote maximum reuse.
  4. at a minimum, we would need to track the license, if any, of each data source.
  5. some data sources may be out of reach for us to redistribute as aggregates because of their licenses
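
Points 4 and 5 could be sketched as a small per-source license table. This is only an illustration under stated assumptions: the `SourceLicense` type, the `OPEN_LICENSES` policy set, the `can_redistribute` helper and the `source-a`/`source-b`/`source-c` names are all hypothetical placeholders, not verified license terms for any real data source and not existing project code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: licenses we assume allow redistribution in an aggregate.
OPEN_LICENSES = frozenset({"CC0-1.0", "CC-BY-4.0"})


@dataclass(frozen=True)
class SourceLicense:
    """License recorded for one upstream data source (illustrative only)."""

    source: str
    spdx_id: Optional[str]  # None means the source states no license


def can_redistribute(entry: SourceLicense) -> bool:
    """Return True only for a known license that is in the policy set."""
    return entry.spdx_id is not None and entry.spdx_id in OPEN_LICENSES


# Placeholder sources, not real data.
sources = [
    SourceLicense("source-a", "CC0-1.0"),
    SourceLicense("source-b", "CC-BY-NC-4.0"),
    SourceLicense("source-c", None),
]

redistributable = [s.source for s in sources if can_redistribute(s)]
needs_review = [s.source for s in sources if not can_redistribute(s)]
```

Treating a missing license the same as a restrictive one keeps unclear sources out of the aggregate until someone has checked them.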

Some pointers about possibly problematic sources:

@pombredanne
Collaborator Author

This is best handled a tad later, when we decide how to implement license tracking, possibly in #123.
This is important enough to defer to the next milestone.

@msrb

msrb commented Jan 14, 2020

Slightly off-topic comment, but here we go :)

I am no longer involved with https://github.com/fabric8-analytics/cvedb but it should only contain data collected from NVD (there is a dummy bot which scans NVD once a day and opens pull-requests in the database repository for further review/human curation). AFAIK, the team never decided on what license the database should use.

@pombredanne
Collaborator Author

@msrb Thank you ++ for chiming in as this is quite useful

@pombredanne
Collaborator Author

@msrb I'd love to reuse, integrate and further the code of https://github.com/fabric8-analytics/cvejob too... let me enter a ticket there to ask about the license of the code too... or would you know?

@pombredanne
Collaborator Author

@sbs2001
Collaborator

sbs2001 commented Jul 27, 2020

Things we already use without a clarified license. We need to reach out to these sources or dig deeper.

@pombredanne
Collaborator Author

See also aboutcode-org/scancode-toolkit#2143 for the Rubysec data

@pombredanne changed the title from "Track (or not?) licenses for each data pointers and records" to "Track licenses for each data pointers and records" on Sep 10, 2020
@pombredanne
Collaborator Author

See also #277

@pombredanne
Collaborator Author

pombredanne commented May 23, 2022

SUSE CVRF and OVAL data is CC-BY-NC, which makes it completely non-open and impractical to reuse.
See https://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml

SUSE has changed (some? all?) its vulnerability data license from CC-BY-NC-SA to CC-BY

Though there is still some global ambiguity based on the text of https://ftp.suse.com/pub/projects/security/cvrf-cve/LICENSE

The SUSE CVRF data is provided by SUSE under the Creative Commons license,
with Attribution for Non Commercial use:
CC-BY-4.0
https://creativecommons.org/licenses/by/4.0/

This text references CC-BY but still mentions Non Commercial use, as in CC-BY-NC.

And based on https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2015%3A0255-1.xml or https://ftp.suse.com/pub/projects/security/cvrf1.2/cvrf-opensuse-su-2015%3A0225-1.xml we still have some records left with this CC-BY-NC:

The CVRF data is provided by SUSE under the Creative Commons License 4.0 with Attribution for Non-Commercial usage (CC-BY-NC-4.0).

But even in the same data source we have other CC-BY licenses in https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2016%3A1623-1.xml

Copyright SUSE LLC under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)

So this is a bit messy. I am reaching out to SUSE security by email.
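
To cherry-pick records by license, the notices quoted above could be classified mechanically. The following is a rough sketch, assuming the license statement is available as plain text; the `classify_notice` function, its regular expressions and its four labels are hypothetical, and a real check would need review against the full set of SUSE files.

```python
import re

# Creative Commons identifiers such as CC-BY-4.0 or CC-BY-NC-4.0.
CC_ID = re.compile(r"\bCC-BY(?:-NC)?(?:-SA)?(?:-\d+(?:\.\d+)?)?", re.IGNORECASE)
# "Non Commercial", "Non-Commercial" or "NonCommercial" wording.
NON_COMMERCIAL = re.compile(r"\bnon[\s-]?commercial\b", re.IGNORECASE)


def classify_notice(text: str) -> str:
    """Label a license notice (hypothetical labels, illustrative only)."""
    ids = {match.group(0).upper() for match in CC_ID.finditer(text)}
    if not ids:
        return "unknown"
    if any("-NC" in identifier for identifier in ids):
        return "non-commercial"
    if NON_COMMERCIAL.search(text):
        # A plain CC-BY identifier next to non-commercial wording.
        return "ambiguous"
    return "attribution-only"


# The three notices quoted in this comment.
license_file = (
    "The SUSE CVRF data is provided by SUSE under the Creative Commons license,\n"
    "with Attribution for Non Commercial use:\n"
    "CC-BY-4.0\n"
    "https://creativecommons.org/licenses/by/4.0/"
)
record_2015 = (
    "The CVRF data is provided by SUSE under the Creative Commons License 4.0 "
    "with Attribution for Non-Commercial usage (CC-BY-NC-4.0)."
)
record_2016 = (
    "Copyright SUSE LLC under the Creative Commons License 4.0 "
    "with Attribution (CC-BY-4.0)"
)

labels = [classify_notice(t) for t in (license_file, record_2015, record_2016)]
```

The "ambiguous" label is there for notices like the top-level LICENSE text, where the identifier and the wording disagree and a human has to decide.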

@pombredanne
Collaborator Author

I sent this to security@suse.de:

Hi:
Thank you for changing most of your vulnerability data license to CC-BY somewhat recently.
Yet there are still some problems with leftover CC-BY-NC.
Because of this, it makes the data difficult to consume automatically as each record need to be cherry picked based on its licenses allowing or not allowing usage (CC-BY-NC essentially prohibits any usage beyond mere reading)
May I suggest to use the plain CC-BY license consistently everywhere?
Or update your web pages and top level license notices to be consistent to alert that there is a mix of CC-BY and CC-BY-NC?

Thank you for your kind consideration!

For extra details, please see #63 (comment) for reference that I am pasting here:

@pombredanne
Collaborator Author

And we got a super speedy reply from SUSE security team:

Sorry for this oversight that it was not done consistently.
(the ones affected were not being regenerated by the tooling.)
I now did a massive replacement, and all of cvrf files should be fine.
Also adjusted the LICENSE files in the directories.

Thank you ++ SUSE!

@TG1999 TG1999 modified the milestones: Core data collection, v33.0.0 Jan 13, 2023
@pombredanne
Collaborator Author

pombredanne commented Jan 16, 2023

I chatted on the side with Ubuntu folks on their IRC:
on libera.chat #ubuntu-security

@stevebeattie FYI

This is about

pombreda> Philippe Ombredanne Hiya :) What the license for the security data at https://ubuntu.com/security/notices (and the usn-db dump)
6:51 PM And the license for https://ubuntu.com/security/oval reports itself as GPL and I do not know what to do for data with a GPL.
6:51 PM Who to talk to?
6:51 PM FWIW, we aggregate this in our little project at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu.py and https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu_usn.py and we like to have a license for that!
6:52 PM Debian did not have a license ... but that was clarified at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/debian_oval.py

Steve Beattie pombreda: hey, thanks for the question, and apologies that we don't have explicit terms on this stuff. In general, we want people to be able to consume, integrate, aggregare, and use the data presented in tools like nexB (so long as the data is represented accurately).
7:19 PM I'll poke people internally about getting more explicit statements in place.

@TG1999 TG1999 removed this from the v33.0.0 milestone Aug 15, 2023
@pombredanne
Collaborator Author

With #1393 we could add a field at the advisory level to track its license, but on the other hand we are tracking the license consistently for each importer. I am closing this now.
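
For the record, the two options can coexist as a simple fallback: an advisory-level license, when present, overrides the importer-level one. This is a minimal sketch under stated assumptions, not the project's actual data model; the `Importer` and `Advisory` classes, the `license_expression` field and the `effective_license` helper are hypothetical names, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Importer:
    """One data source importer with a single license (hypothetical model)."""

    name: str
    license_expression: str


@dataclass
class Advisory:
    """One advisory; its own license is optional (hypothetical model)."""

    advisory_id: str
    importer: Importer
    license_expression: Optional[str] = None


def effective_license(advisory: Advisory) -> str:
    """Prefer the advisory-level license, else fall back to the importer's."""
    return advisory.license_expression or advisory.importer.license_expression


# Placeholder values, not real data.
importer = Importer(name="example-importer", license_expression="CC-BY-4.0")
plain = Advisory(advisory_id="EXAMPLE-1", importer=importer)
override = Advisory(
    advisory_id="EXAMPLE-2",
    importer=importer,
    license_expression="CC-BY-NC-4.0",
)

licenses = [effective_license(plain), effective_license(override)]
```

With consistent per-importer licenses the override is never set, which matches the reasoning for closing this issue.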
