Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage vulnerability sources in database #152

Merged
merged 20 commits into from
May 23, 2020

Conversation

haikoschol
Copy link
Collaborator

@haikoschol haikoschol commented Mar 1, 2020

This is an implementation of #123.

The change adds a new model Importer that stores metadata as well as the name of a class (a vulnerabilities.data_source.DataSource) that fetches and transforms vulnerability data into a common, DB schema agnostic format. This format is encoded in the two data classes vulnerabilities.data_source.Package and vulnerabilities.data_source.VulnerabilityInfo.

Running an Importer and storing the results in the database is done by ImportRunner.

The unused and redundant model PackageReference has been removed.

Another minor, but overdue change is renaming the scraper module to importers.

@reviewers:

It might be helpful to start with looking at how the import management command uses the above described classes and then jump into the rest of the code from there.

@haikoschol haikoschol added the WIP Work in progress PRs, tickets, etc. label Mar 1, 2020
vulnerabilities/models.py Outdated Show resolved Hide resolved
@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 4 times, most recently from 7a7295c to 6404723 Compare March 15, 2020 14:01
@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 2 times, most recently from 4e76c42 to 08ef849 Compare March 15, 2020 20:14
@haikoschol haikoschol linked an issue Mar 15, 2020 that may be closed by this pull request
if vuln_info.cve_id:
vuln, created = models.Vulnerability.objects.get_or_create(cve_id=vuln_info.cve_id)
if created:
vuln.summary = vuln_info.summary
Copy link
Collaborator

@sbs2001 sbs2001 Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some way to determine, which summary is better and then decide to update the existing summary or leave it as it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not optimistic there is one. But even if there was a way, if all we have is the summary, we can't detect that a new vulnerability is actually an update to an existing vulnerability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of moving summary from Vulnerability model to the ImpactedPackage model . So the context won't be lost.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it makes sense that the same vulnerability could be described in different terms, depending on the affected package. But in practice, do any of the source we ingest so far do this? Or would we end up copying the same description to every ImpactedPackage? That wouldn't really be an issue though.

Fundamentally the questions are:

  • What is the definition of "vulnerability"?
  • What is the minimum number of attributes (and what are those) for identifying a vulnerability?

@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 5 times, most recently from c9e33a2 to 9c54a5d Compare April 4, 2020 15:38
@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 2 times, most recently from 7e322b7 to 63e607d Compare April 10, 2020 21:38
@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 4 times, most recently from 87ea90a to 0eed1b5 Compare April 17, 2020 19:46
@haikoschol haikoschol force-pushed the 123-vulnsrc-in-db branch 3 times, most recently from 16e4afb to 6c4c5a1 Compare April 21, 2020 17:46
@haikoschol haikoschol removed the WIP Work in progress PRs, tickets, etc. label May 23, 2020
@haikoschol
Copy link
Collaborator Author

Don't we need a migration script to have an Importer for safety_db , since you have implemented new importer for it ?

We do and it seems I missed that. Good catch!

Bump the Python 3.6 to Python 3.8 in the README :)

Also good catch! :)

sbs2001 and others added 18 commits May 23, 2020 16:58
Signed-off-by: Shivam Sandbhor <shivam.sandbhor@gmail.com>
This change introduces the model Importer, which contains metadata and a
pointer to the implementation of the importers DataSource subclass. This
subclass knows how to query whatever storage mechanism is used in that
case (e.g. git repository with yaml files, a single OVAL XML file, HTTP
API returning JSON, etc.)

Additionally it defines an API and data model that DataSource subclasses
need to provide to ImportRunner. ImportRunner leverages that API and data
model to efficiently store vulnerability information in the database.

Existing importers have not been migrated yet (only renamed from "scraper").

Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Having separate methods added_files() and updated_files() is not ideal
since the same file can be returned by both methods. When a file has
been added and then later modified within the window of time considered
by these methods, the file should only be reported as an added file.

Therefore this commit combines the methods added_files() and
updated_files() into file_changes(), which returns a tuple of sets for
added and updated files, respectively.

It also fixes a time-math issue; commit timestamps are converted to UTC
before being compared to the cutoff timestamp (that is always in UTC).

Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Reduce the amount of converting between PackageURL and string to the
bare minimum. Add/improve some type annotations and reduce API surface
area.

Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Many derived classes need a Unix timestamp with the cutoff date as well
as a method to split a list of advisories into batches.

Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Signed-off-by: Haiko Schol <hs@haikoschol.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data Ingestion Scenarios
2 participants