Collect vulnerabilities from arch linux #20

Closed
pombredanne opened this issue Jul 22, 2017 · 11 comments · Fixed by #125

Comments

@pombredanne
Collaborator

The data is available at https://security.archlinux.org/

@lohani2280
Contributor

@pombredanne I want to work on this issue. Can I take this up?

@pombredanne
Collaborator Author

@lohani2280 sure thing! that's very kind of you: thank you for considering this.
Though you have to keep in mind that #32 may need to be done first and is higher priority.

@lohani2280
Contributor

The dataset input is https://security.archlinux.org/json
The current format of the data is:

{
  "name": "AVG-396",
  "packages": [
    "bluez"
  ],
  "status": "Fixed",
  "severity": "High",
  "type": "information disclosure",
  "affected": "5.46-1",
  "fixed": "5.46-2",
  "ticket": "55603",
  "issues": [
    "CVE-2017-1000250"
  ],
  "advisories": [
    "ASA-201709-3"
  ]
}

I am trying to make a scraper to collect the full scope of data needed in vulnerablecode.
Among the important pieces that I plan to extract from that dataset are:

  • package name
  • vulnerability id
  • status
  • severity
  • affected version
  • fixed version
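
As a rough illustration of the extraction listed above, the sketch below pulls those fields out of one record of the JSON feed. The function name `extract_fields` and the output key names are hypothetical and are not taken from vulnerablecode; the sample record is the one shown earlier in this comment.

```python
# Hypothetical helper: a sketch only, not code from the vulnerablecode repository.
SAMPLE_AVG = {
    "name": "AVG-396",
    "packages": ["bluez"],
    "status": "Fixed",
    "severity": "High",
    "type": "information disclosure",
    "affected": "5.46-1",
    "fixed": "5.46-2",
    "ticket": "55603",
    "issues": ["CVE-2017-1000250"],
    "advisories": ["ASA-201709-3"],
}


def extract_fields(avg):
    """Return one flat dict per package named in a single feed record."""
    return [
        {
            "package_name": package,
            "vulnerability_ids": list(avg.get("issues", [])),
            "status": avg.get("status"),
            "severity": avg.get("severity"),
            "affected_version": avg.get("affected"),
            "fixed_version": avg.get("fixed"),
        }
        for package in avg.get("packages", [])
    ]


rows = extract_fields(SAMPLE_AVG)
```

Using `.get()` keeps the sketch tolerant of records where a field is absent; whether the real feed ever omits these keys is not something this thread confirms.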

@pombredanne Please review this. Shall I proceed this way?

@pombredanne
Collaborator Author

that sounds about right, but how would you map this to the actual models? and did you see my comment above wrt. package URLs aka purl?
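
For context on the purl question, a package URL for an Arch package could be assembled roughly as below. This is a minimal sketch under assumptions: the `alpm` type and `arch` namespace are taken from the current purl specification and may not be what the project used at the time, and the helper name `arch_purl` is made up for illustration.

```python
from urllib.parse import quote


def arch_purl(name, version, purl_type="alpm", namespace="arch"):
    """Build a package URL string of the form pkg:type/namespace/name@version.

    The default type and namespace are assumptions, not project decisions.
    """
    # Percent-encode name and version so reserved characters cannot break the URL.
    encoded_name = quote(name, safe="")
    encoded_version = quote(version, safe="")
    return f"pkg:{purl_type}/{namespace}/{encoded_name}@{encoded_version}"


purl = arch_purl("bluez", "5.46-2")
```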

@lohani2280
Contributor

lohani2280 commented Feb 27, 2019

@pombredanne Well, to map this to the actual models I'll implement a method archlinux_dump in the vulnerabilities/data_dump.py file.

Yes, I have seen your comment about purl. I was going through the codebase to understand the architecture of the whole project, so I thought I would try this issue first to get more familiar with the existing code.

lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 3, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 10, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 12, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 17, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 17, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 17, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Mar 17, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 1, 2019
Signed-off-by: Ayush Lohani <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 1, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 1, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
@pombredanne
Collaborator Author

the advisories at https://security.archlinux.org/advisory/json would also need to be handled

@lohani2280
Contributor

TODO:
Collect all the vulnerability references that are listed on each page of the form https://security.archlinux.org/vulnerability_id

lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 9, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 9, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 10, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 10, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
lohani2280 added a commit to lohani2280/vulnerablecode that referenced this issue Apr 11, 2019
Signed-off-by: lohani2280 <lohani.ayush01@gmail.com>
@pombredanne
Collaborator Author

pombredanne commented Apr 11, 2019

#33 has been merged, but there are a few extra things to consider to complete this:

  1. we need more tests
  2. I do not like the name "scraping" or "scraper". We are inventorying, collecting, referencing, harvesting but certainly not scraping.
  3. there is some data that I am not sure we have fully collected yet, like the main advisories that I mentioned in the discussions in Collect vulnerabilities from arch linux #33

@haikoschol haikoschol added this to the Kickoff and baseline milestone Sep 17, 2019
@pombredanne
Collaborator Author

See this chat:

<pombreda_> Is there some structured API for https://security.archlinux.org/ (beside the Atom feed for https://security.archlinux.org/advisory/feed.atom )
<phrik> Title: Vulnerable issues - Arch Linux (at security.archlinux.org)
<pombreda_> this is for an open and free aggregated vulnerabilities DB that we are building at https://github.com/nexb/vulnerablecode
<phrik> Title: GitHub - nexB/vulnerablecode: [WIP] A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. (at github.com)
<Foxboron> https://security.archlinux.org/CVE-2019-14318/json
<Foxboron> https://security.archlinux.org/json
<Foxboron> basically
<pombreda_> ah... just pad json :)
<Foxboron> Yep :)
<pombreda_> Foxboron: excellent!
<Foxboron> Our tracker is not complete fyi
<pombreda_> would you know where the code for this app lives?
<Foxboron> as in it's not an exhaustive list of issues and CVEs
<Foxboron> https://github.com/archlinux/arch-security-tracker
<phrik> Title: GitHub - archlinux/arch-security-tracker: Arch Linux Security Tracker (at github.com)

In the end, most URLs are available as JSON too, e.g. https://security.archlinux.org/CVE-2019-17596/json
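
The URL pattern from the chat can be sketched as a small helper. This is only an illustration using the Python standard library: the names `json_url` and `fetch_json` are hypothetical, and the assumption that appending `/json` works for every identifier goes beyond the examples given above.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://security.archlinux.org"


def json_url(identifier=None):
    """Return the JSON URL for one tracker entry, or for the full feed if no identifier is given."""
    if identifier:
        return f"{BASE_URL}/{identifier}/json"
    return f"{BASE_URL}/json"


def fetch_json(identifier=None, timeout=30):
    """Download and decode a JSON document from the tracker (requires network access)."""
    with urlopen(json_url(identifier), timeout=timeout) as response:
        return json.load(response)
```

For example, `json_url("CVE-2019-17596")` yields the URL quoted above, and `fetch_json()` with no argument would request the full feed.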

@haikoschol
Collaborator

After looking at our archlinux code and their various JSON feeds again, I do think we need to make changes to the code. At the moment we consume a feed of AVGs and create one Vulnerability for each entry. But one AVG can reference multiple CVEs, so those should be the vulnerabilities. And ASAs should be VulnerabilityReference. I don't think we need to store AVGs themselves at all, just the information about what packages and versions are affected.
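
The proposed mapping could look roughly like the sketch below: one record per CVE, with ASAs attached as references and the AVG itself reduced to package and version information. Plain dicts stand in for the Vulnerability and VulnerabilityReference models, the function name `avg_to_vulnerabilities` is hypothetical, and the reference URL layout (base URL plus identifier) is an assumption based on the URLs quoted in this thread.

```python
BASE_URL = "https://security.archlinux.org"


def avg_to_vulnerabilities(avg):
    """Expand one AVG record into one vulnerability dict per referenced CVE."""
    # ASAs become references shared by every CVE in the group (assumed URL layout).
    asa_references = [f"{BASE_URL}/{asa}" for asa in avg.get("advisories", [])]

    # The AVG is not stored; only its package and version data is carried over.
    impacted_packages = [
        {
            "name": package,
            "affected_version": avg.get("affected"),
            "fixed_version": avg.get("fixed"),
        }
        for package in avg.get("packages", [])
    ]

    return [
        {
            "cve_id": cve_id,
            # First reference points at the tracker page for the CVE itself.
            "references": [f"{BASE_URL}/{cve_id}"] + asa_references,
            "packages": impacted_packages,
        }
        for cve_id in avg.get("issues", [])
    ]
```

An AVG that lists two CVEs would therefore produce two vulnerability records that share the same package list and ASA references.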

@pombredanne
Collaborator Author

@haikoschol re

one AVG can reference multiple CVEs, so those should be the vulnerabilities. And ASAs should be VulnerabilityReference. I don't think we need to store AVGs themselves at all, just the information about what packages and versions are affected.

this makes 100% sense to me.

@lohani2280 FYI this project is restarting... it could be of interest to you

haikoschol added a commit to haikoschol/vulnerablecode that referenced this issue Nov 4, 2019
This change adds the field cve_id to the Vulnerability model and based
on that, improves the data import for Arch Linux.

The improvements made have been discussed in issue aboutcode-org#20:

- For each CVE in a given AVG, exactly one Vulnerability is stored

- For each CVE, one VulnerabilityReference to its page on
  security.archlinux.org is stored

- Each ASA mentioned in an AVG is stored as a VulnerabilityReference

Since there is no production deployment of vulnerablecode yet, I took
the opportunity of changing the models to remove all migrations and
create a new one that creates the whole schema.

Since the cve_id field on Vulnerability has a unique constraint set, I
needed to make some changes to the import code that belong to issue aboutcode-org#28.
I kept them minimal however so aboutcode-org#28 is still open and needs to be
addressed later.

closes aboutcode-org#20

Signed-off-by: Haiko Schol <hs@haikoschol.com>
haikoschol added a commit to haikoschol/vulnerablecode that referenced this issue Nov 6, 2019