feat: Improving caching, making a full NVD mirror available #2577
Hi @terriko, is this exclusively for GSoC contributors, or is it open to everyone? If it's open, I would love to work on this.
@b31ngd3v If you're able to work on it right now, go ahead and we'll find something else for the GSoC folk. I think we're going to want this sooner rather than later if possible.
I'll start working on it then 👍🏻
Hi @terriko, it looks like we can't push the database to GitHub.
Not entirely surprising, though I was hoping we wouldn't hit that point for a while. We'll have to see if chopping it up makes sense or if we need other storage options.
This is, incidentally, a pro for the "make a bunch of json files that can be re-loaded" theory, since they could and would be chopped into more manageable year/month chunks.
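To make the year/month chunking idea concrete, here is a minimal Python sketch that groups exported records into one JSON file per month. It assumes each record is a dict with an `id` and an ISO 8601 `published` field; those field names and the `YYYY/MM.json` layout are illustrative assumptions, not the actual cve-bin-tool export format.

```python
import json
from collections import defaultdict
from pathlib import Path


def chunk_by_month(records, out_dir):
    """Write records into out_dir/YYYY/MM.json files and return the paths written.

    Assumption: each record is a dict with an "id" and an ISO 8601 "published"
    timestamp (e.g. "2023-03-15T19:34:00"); these field names are hypothetical.
    """
    buckets = defaultdict(list)
    for record in records:
        year, month = record["published"][:7].split("-")
        buckets[(year, month)].append(record)

    written = []
    for (year, month), items in sorted(buckets.items()):
        path = Path(out_dir) / year / f"{month}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Stable ordering keeps an unchanged month byte-identical between runs.
        items.sort(key=lambda r: r["id"])
        path.write_text(json.dumps(items, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written
```

Because records and keys are written in a stable order, re-exporting a month whose data has not changed produces an identical file, so git only records new history for the months that actually changed.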
I was having health issues and was busy with university exams, but I'll continue working on this issue at a faster pace now. Sorry for the delay!
Summarizing some thoughts here so they don't wind up buried in #2807 and #2811. @b31ngd3v has gotten us to the point where we have a json export, so now we need to figure out:
For parts 1 & 2: mirror data has the potential to get big and messy, and is potentially not the greatest fit for git, since every single change winds up in the tree forever (even if you can't see it, the data is in the git history). BUT having the history opens some interesting options for research and lets others examine vulnerability data and do things like verify the validity of the mirror over time, which are advantages that we might want. Plus, GitHub gives us some space to play around and a CI system that I don't have to set up.

So, I've set up https://github.com/sec-data/mirror-sandbox as a repo for us to experiment with scripts without "tainting" the existing cve-bin-tool repo. Since "sec-data" is a personal free org, I can add anyone I want to it, so I'll add @b31ngd3v and @anthonyharrison to it now (you should have emails shortly).

Longer-term: I've approached the micro mirror team about distributing our json mirror on their servers. They currently handle mirroring for a number of Linux distributions and open source projects, they've got machines in data centers across the US, and they're starting to build out more globally. Once we get the mirror scripts working and are able to use the data in cve-bin-tool, we can basically hand that off to them and let them replicate, and they'll be able to watch the traffic and see what's happening. I'll probably add some of those folk to the sec-data org as well.
We should add checksums to the data (is that what "signatures" means?) to provide some integrity checks.
On Wed, 15 Mar 2023, 19:34 Terri Oda wrote:
So then the next question is: how should we use this mirror once it's set up?
What I was envisioning was something like this:
- mirror gets data
- cve-bin-tool defaults to using the mirror (so no one needs to get an NVD key on the first run of the product, which is currently a large barrier to entry)
- cve-bin-tool has options to configure the mirror(s) in use (presumably to choose a more "local" one, but also allowing people to use an internal company one or share a cache across machines in an air-gapped network)
- cve-bin-tool provides an option to skip the mirror and go directly to NVD (i.e. the current default behaviour)
- in future: we figure out how to also deal with mirroring of GAD/Red Hat/etc. and maybe how to configure mirroring of each of those separately/together
So if we wanted to do that, we'd need some config options:
- providing a list of mirrors
- maybe some options about default mirrors
- options for failover if mirrors are broken (inaccessible, content is invalid)
- probably some failover options if mirrors are out of date too (do we use the same 24-hour rule, allow this to be configured separately, something else?)
- maybe a way to pull from multiple mirrors at once?
- an option to revert to the current behaviour (using NVD directly)
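A rough Python sketch of how the mirror list, failover for broken or stale mirrors, and the "use NVD directly" option could fit together. `MirrorConfig`, `choose_source`, `NVD_DIRECT`, and the `fetch_last_updated` callback are hypothetical names, not existing cve-bin-tool APIs, and pulling from multiple mirrors at once is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional

NVD_DIRECT = "nvd-direct"  # hypothetical sentinel: skip mirrors and query NVD itself


@dataclass
class MirrorConfig:
    """Hypothetical config shape; none of these names exist in cve-bin-tool today."""
    mirrors: List[str] = field(default_factory=list)  # tried in order of preference
    max_age: timedelta = timedelta(hours=24)          # reuse the 24-hour freshness rule
    use_nvd_directly: bool = False                    # revert to the current behaviour
    fallback_to_nvd: bool = True                      # used when every mirror is rejected


def choose_source(
    config: MirrorConfig,
    fetch_last_updated: Callable[[str], Optional[datetime]],
    now: Optional[datetime] = None,
) -> str:
    """Return the first usable mirror URL, or NVD_DIRECT.

    fetch_last_updated(url) should return the mirror's last-update time, or None
    when the mirror is unreachable or its content fails validation.
    """
    if config.use_nvd_directly:
        return NVD_DIRECT
    if now is None:
        now = datetime.now(timezone.utc)
    for url in config.mirrors:
        last_updated = fetch_last_updated(url)
        if last_updated is None:
            continue  # broken: inaccessible or invalid content
        if now - last_updated > config.max_age:
            continue  # out of date
        return url
    if config.fallback_to_nvd:
        return NVD_DIRECT
    raise RuntimeError("no usable mirror and NVD fallback is disabled")
```

Passing the freshness check in as a callback keeps the selection logic testable without network access; a separate staleness threshold for mirrors would just be a different `max_age`.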
I'd been thinking about this specifically with NVD since that's our biggest barrier to entry, but we should also consider:
- having mirrors for each data source in separate directories
- allowing configuration options to use the mirror for all/some sources
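One possible way to express per-source directories plus an "all/some sources" switch, sketched in Python. The `<base>/<source>/` layout, the source names, and `source_locations` are assumptions for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical source names; the thread mentions NVD, GAD and Red Hat.
SOURCES = ("nvd", "gad", "redhat")


def source_locations(mirror_base, mirrored):
    """Map each source to <mirror_base>/<source>/, or None to fetch it upstream.

    mirrored is the set of sources the user chose to pull from the mirror
    (all or some of SOURCES); the directory layout is an assumption.
    """
    # urljoin replaces the last path segment unless the base ends with "/".
    base = mirror_base if mirror_base.endswith("/") else mirror_base + "/"
    return {
        name: (urljoin(base, f"{name}/") if name in mirrored else None)
        for name in SOURCES
    }
```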
And we probably need to consider some basic info provided per-mirror with the json files:
1. Original data source
2. License
3. Time of last update
4. Any signatures, etc. for validation?
5. Where to find our mirroring code
6. How to set up your own
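A sketch of a per-mirror metadata record covering the six items above, serialized as JSON next to the data files. All field names are hypothetical, and item 4 is represented by a map of file checksums, which is only one possible reading of "signatures".

```python
import json
from datetime import datetime, timezone


def build_metadata(source, license_name, last_updated: datetime, checksums, code_url, setup_url):
    """Return a per-mirror metadata dict (all field names are hypothetical)."""
    return {
        "original_source": source,                                          # 1
        "license": license_name,                                            # 2
        "last_updated": last_updated.astimezone(timezone.utc).isoformat(),  # 3
        "checksums": dict(sorted(checksums.items())),                       # 4: file -> digest
        "mirror_code": code_url,                                            # 5
        "setup_docs": setup_url,                                            # 6
    }


def to_json(metadata):
    """Serialize with sorted keys so the file is stable between runs."""
    return json.dumps(metadata, indent=2, sort_keys=True) + "\n"
```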
Checksums: probably? In an ideal world, this mirroring system would be 100% automated with no human in the loop unless it can't update for some reason, but a checksum to make sure you downloaded correctly still seems useful. I'm not sure how valuable a signature would be, since we'd likely be blindly signing whatever we download rather than attesting to its reliability, but it could fill the same niche if we wanted.
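A minimal sketch of the "did you download it correctly" check, assuming the mirror publishes a SHA-256 hex digest for each JSON file (the choice of algorithm and how the digest is published are assumptions).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large JSON files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """True if the file matches the (assumed) SHA-256 hex digest published by the mirror."""
    return sha256_of(path) == expected_hex.strip().lower()
```

As noted above, this only proves the bytes arrived intact; it says nothing about whether the upstream data itself is trustworthy.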
Okay, had a bit of a chat with my mirroring expert:
Now, it's a little debatable what the automated signature is really going to mean in terms of data quality and integrity with respect to NVD:
@terriko @anthonyharrison what if we use GPG's clearsign feature?
@b31ngd3v yeah, I think that's likely the one we need to use. Basically, for the mirroring folk it makes their lives easiest if we use whatever the distro folk use, and PGP is it.
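For reference, a Python sketch that builds the GnuPG commands for clearsigning an exported JSON file and verifying it on import. `--clearsign`, `--verify`, `--local-user`, `--output`, `--batch`, and `--yes` are standard `gpg` options; the helper names, the key ID, and naming the output `<file>.asc` are assumptions, and this is not the code that was merged into cve-bin-tool.

```python
import subprocess


def clearsign_cmd(json_path, key_id):
    """gpg command writing a clearsigned copy to <json_path>.asc (naming is an assumption)."""
    return [
        "gpg", "--batch", "--yes",
        "--local-user", key_id,            # which secret key signs the export
        "--output", f"{json_path}.asc",
        "--clearsign", str(json_path),
    ]


def verify_cmd(asc_path):
    """gpg command that checks a clearsigned file; gpg exits non-zero if verification fails."""
    return ["gpg", "--batch", "--verify", str(asc_path)]


def run(cmd):
    """Run a gpg command, raising CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

On the importing side, `run(verify_cmd(path))` only succeeds if the signer's public key is already in the local keyring, so mirror users would need to import that key once.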
* feat: add sign with pgp flag while exporting json data
* feat: verify sign while importing the json data
* feat: update FETCH_JSON_DB to use pgp signing
* fix: update test_fetch_json_db.py
* fix: existing broken tests
* fix: change the file extension to `.asc`
* fix: removed `signed: true`
* fix: update tests
* fix: update tests
I think we're good to close this one alongside #3181.
What we're currently doing as far as cache goes:
What I think might be useful:
Potential problems:
Any thoughts? I'm mentally trying this out as a potential GSoC project, but I'm not sure if it's quite the right size/complexity for that, so I'd welcome thoughts on that as well as on the general technical/social challenges.