Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpm: consult dnf database for repository information #869

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

hdonnay
Copy link
Member

@hdonnay hdonnay commented Mar 13, 2023

These changes make the rhel machinery consult both the rpm and dnf databases when possible.

See-also: #809

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: Patch coverage is 29.52381% with 74 lines in your changes missing coverage. Please review.

Project coverage is 55.23%. Comparing base (c2fc225) to head (57bb054).

Files Patch % Lines
rpm/dnf.go 25.71% 52 Missing ⚠️
rpm/packagescanner.go 14.28% 18 Missing ⚠️
rhel/coalescer.go 57.14% 1 Missing and 2 partials ⚠️
rpm/native_db.go 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #869      +/-   ##
==========================================
- Coverage   55.39%   55.23%   -0.17%     
==========================================
  Files         282      283       +1     
  Lines       17836    17923      +87     
==========================================
+ Hits         9881     9899      +18     
- Misses       6923     6990      +67     
- Partials     1032     1034       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hdonnay hdonnay changed the title wip: rpm: consult dnf database for repository information rpm: consult dnf database for repository information Dec 19, 2024
rpm/dnf.go Outdated
Comment on lines 49 to 52
if fi, err := fs.Stat(sys, `root/buildinfo/content_manifests`); errors.Is(err, nil) && fi.IsDir() {
// This is a RHEL layer, skip.
return nil, nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Update this to check the files for a marker value and continue if found.

Should look for from_dnf_hint key in the yaml files.
See also: containerbuildsystem/atomic-reactor#2140

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Punting on this unless we determine it's needed.

@cgwalters
Copy link

I created rpm-software-management/rpm#3505 related to this

@hdonnay hdonnay force-pushed the hack/dnf branch 3 times, most recently from 10b07ff to e59f165 Compare January 13, 2025 15:57
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Using these should give us a nice "todo" list when the time comes.

Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
This adds two helpers: one to discover repoids, and one to annotate a
stream of Packages.

Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
@hdonnay hdonnay marked this pull request as ready for review January 22, 2025 20:22
@hdonnay hdonnay requested a review from a team as a code owner January 22, 2025 20:22
@hdonnay hdonnay requested review from RTann and removed request for a team January 22, 2025 20:22
@RTann RTann requested a review from crozzy January 22, 2025 22:11
Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it through most of the files, but I'll need to come back to this another time. Would you mind making more commits and then squashing things at the end so it's easier to see the diffs after each round? Thanks

// outside of the Red Hat build system's builder's context. See [CLAIRDEV-45]
// for more information.
//
// [CLAIRDEV-45]: https://issues.redhat.com/browse/CLAIRDEV-45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ticket specifically calls out OSBS. What about Konflux? Should we update the ticket to mention Konflux, or is this not true for Konflux?

Comment on lines 18 to 20
"github.com/quay/claircore/internal/rpm"
"github.com/quay/zlog"
_ "modernc.org/sqlite" // register the sqlite driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just sanity checking here

"github.com/quay/claircore/internal/rpm" is in the same module, so I expected to see this in the last block

Wrap(context.Context, iter.Seq[claircore.Package]) (iter.Seq[claircore.Package], func() error)
}

// Identity is an [Annotator] who's [Annotator.Wrap] method does nothing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whose

Comment on lines +81 to +85
var Identity Annotator

func init() {
Identity = ident{}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why not just var Identity = ident{}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is the variable can't be (accidentally or deliberately) reassigned

return nil
}

// NewAnnotator holds book-keeping for producing multiple independent mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the convention of doing something like NewAnnotator instead of newAnnotator for the godoc confuses me because now we have two of these

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but I don't know the tables well enough to verify this is definitely the right query

continue
default:
final = fmt.Errorf("internal/dnf: database error: %w", err)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still getting used to iterators in Go. It's ok to have a path where we don't yield?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the consumer will just see that at the end of the iteration (in this case there's also a consumable error so the reason for it should be known.

switch n {
case `Packages`:
db.kind = kindBDB
if ok, err := testpath(p, bdb.CheckMagic); !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming we want to ignore true, err != nil?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, testpath could return that if fs.File.Close() fails, I think that case should be accounted for.

blobs, dbErr := db.All(ctx)
seq, parseErr := parseBlob(ctx, blobs)
defer func() {
final = errors.Join(err, parseErr(), dbErr(), db.Close())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just confirming we want to close the DB here?

//
// [yara]: https://github.com/VirusTotal/yara
pat := []string{
`^.*/[^/]+\.jar$`, // Jar files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied over, but I'm wondering if we should extend this to the other file extensions we support like WAR, EAR, etc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but maybe let's address that in #1473

for _, db := range found {
ctx := zlog.ContextWithValues(ctx, "db", db.String())
zlog.Debug(ctx).Msg("examining database")
if _, ok := done[db.Path]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming this is no longer needed because of the slices.CompactFunc in rpm.FindDBs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I read it as too, the slices.SortFunc is putting the found in order (so dbs with the same path are adjacent), then slices.CompactFunc is compacting them together, so by the time FindDBs returns the dupes should be taken care of

Comment on lines +36 to +41
switch {
case a.kind < b.kind:
cmp = -1
case a.kind > b.kind:
cmp = +1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother checking this if we only care about the path later? Also not sure if it's possible for paths to be the same but different kind at the moment anyway


seq, checkPkgs := db.Packages(ctx)
seq, checkDNF := a.Wrap(ctx, seq)
wart.AsPointer(seq)(yield)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was kind of a 🤯 for me, and I had to play with it in Go Playground to understand this: https://go.dev/play/p/j2-vMOOmrGw

Comment on lines +110 to +114
/*
for _, pkg := range got {
t.Logf("%s: %#q", pkg.Name, pkg.RepositoryHint)
}
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still desired or to delete?

func (*Scanner) Name() string { return pkgName }

// Version implements scanner.VersionedScanner.
// Version implements [indexer.VersionedScanner].
func (*Scanner) Version() string { return pkgVersion }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: even though the code is very different, we aren't expecting different results, right?

ctx, done := context.WithTimeout(ctx, r.cfg.Timeout)
defer done()

// !!! This ends up being a weird nonlocal return. !!!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's saying that any error returned from the mapContainerAPI is not evaluated or returned until outside of this function body L#259

}
}

if r.apiFetcher != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's say this is enabled. we seem to go here, no matter what, right? Before, we only got here if we were unable to read the content-sets files or the prod sec data is faulty.

Copy link
Contributor

@crozzy crozzy Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like if len(repoids) == 0 && r.apiFetcher != nil would be preferred, I can't see a situation where we'd want to reach out to the container API if we had content-sets either from DNF or the embedded file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These txter test files are not net-new, I think they should be removed from rpm/testdata

for _, cpe := range cpes.CPEs {
s[cpe] = struct{}{}
}
func (m *mappingFile) GetOne(ctx context.Context, repoid string) (cpes []string, ok bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment

db *sql.DB

// Concurrent maps for memoizing database lookups.
absent sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason for using an absent map here and not saving an empty struct in repo that can be checked to signal if it's absent, readability?

environment.RepositoryIDs[i] = layerArtifacts.Repos[i].ID
v, _ := url.ParseQuery(pkg.RepositoryHint)
if id := v.Get("repoid"); id != "" {
environment.RepositoryIDs = v["repoid"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will leave inconsistencies in how the repo IDs are reported in index report. Packages with no specific repos will report their repo ids as numerical with the ability to be referenced back to repository objects (like current behaviour):

        "repository_ids": [
          "1",
          "2",
...

But packages that have a specific repoid specified by the DNF DB will have the repo slug that can not be directly referenced back to a repository object:

        "repository_ids": [
          "rhel-9-for-x86_64-appstream-rpms"
        ]

We should think of a way to make these consistent to keep the expectation that "repository_ids" will refer to keys in the "repository" section and not a mix of the claircore.Repository.ID and (what will be persisted as) the claircore.Repository.Name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do something like this:

func lookupRepositoryID(repos []*claircore.Repository, slug string) string {
       for _, r := range repos {
               if r.Name == slug {
                       return r.ID
               }
       }
       return ""
}

and then

if slug := v.Get("repoid"); slug != "" {
    if id := lookupRepositoryID(layerArtifacts.Repos, slug); id != "" {
         environment.RepositoryIDs = []string{id}
    }

This should always find a repo as repository scanner should be finding the same repos as the package scanner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is incomplete in that we need to append all repositories who's Name == repoid, not just the first one as the example code does

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants