Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpm: consult dnf database for repository information #869

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
58c5c61
internal/rpm: new iterator-based rpm package
hdonnay Jan 6, 2025
787c3e0
internal/rpm/sqlite: better error return for Validate
hdonnay Jan 8, 2025
78c26b0
wart: add helpers for `iter.Seq[T]` implementations
hdonnay Jan 8, 2025
cc4677a
rpm: move to use new internal/rpm package
hdonnay Jan 8, 2025
79f59a7
internal/rpm: change "RepositoryHint" format
hdonnay Jan 8, 2025
1eb75f1
internal/dnf: add dnf-handling package
hdonnay Jan 7, 2025
77b8b68
rpmtest: allow specifying repository hint
hdonnay Jan 22, 2025
ff75397
rhel: add package scanner
hdonnay Jan 22, 2025
9f75bb4
rhel: fix testcase names
hdonnay Jan 22, 2025
26642a6
rhel: have repository scanner consult dnf
hdonnay Jan 22, 2025
9ea862e
rhel: use repoids in coalescer
hdonnay Jan 22, 2025
7a459ba
containerapi: add deprecation notices
hdonnay Jan 8, 2025
0797287
fixup! internal/dnf: add dnf-handling package
crozzy Feb 14, 2025
e8469e9
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
b6c5b3b
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
7a9803c
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
8a75493
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
f492c89
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
de0886e
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
0123676
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
d9d5f87
fixup! internal/dnf: add dnf-handling package
BradLugo Feb 24, 2025
c1346ba
fixup! internal/rpm: new iterator-based rpm package
BradLugo Feb 24, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
module github.com/quay/claircore

go 1.22.0
go 1.23

require (
github.com/Masterminds/semver v1.5.0
Expand Down
330 changes: 330 additions & 0 deletions internal/dnf/dnf.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,330 @@
// Package dnf interfaces with dnf 4 and 5 history databases to extract repoid
// information.
package dnf

import (
"context"
"database/sql"
"errors"
"fmt"
"io"
"io/fs"
"iter"
"net/url"
"os"
"runtime"
"sync"

"github.com/quay/claircore/internal/rpm"
"github.com/quay/zlog"
_ "modernc.org/sqlite" // register the sqlite driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just sanity checking here

"github.com/quay/claircore/internal/rpm" is in the same module, so I expected to see this in the last block


"github.com/quay/claircore"
)

// BUG(hank) The dnf mapping is less useful than it could be because there's no
// reliable way to turn the RepoID that it reports into something with meaning
// outside of the Red Hat build system's builder's context. See [CLAIRDEV-45]
// for more information.
//
// [CLAIRDEV-45]: https://issues.redhat.com/browse/CLAIRDEV-45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ticket specifically calls out OSBS. What about Konflux? Should we update the ticket to mention Konflux, or is this not true for Konflux?


// BUG(hank) This package needs tests.

// NewAnnotator returns an [Annotator] using any discovered dnf database in the
// provided [fs.FS].
//
// The returned [Annotator] must have its [Close] method called, or the process
// may panic.
//
// If no dnf database is found, the [Identity] [Annotator] will be returned.
func NewAnnotator(ctx context.Context, sys fs.FS) (Annotator, error) {
toOpen, enum, err := findDatabase(sys)
switch {
case errors.Is(err, nil):
case errors.Is(err, errNoDatabase):
return Identity, nil
default:
return nil, err
}

zlog.Debug(ctx).
Str("path", toOpen).
Bool("is-5", enum == rmDNF5).
Msg("found dnf history database")
r, err := sys.Open(toOpen)
if err != nil {
return nil, fmt.Errorf("internal/dnf: unexpected error opening history: %w", err)
}
defer func() {
if err := r.Close(); err != nil {
zlog.Warn(ctx).Err(err).Msg("unable to close fs.FS sqlite db file")
}
}()

return newAnnotator(ctx, r, enum)
}

// Annotator produces iterators that map over packages, adding repoid where
// discovered.
type Annotator interface {
io.Closer
Wrap(context.Context, iter.Seq[claircore.Package]) (iter.Seq[claircore.Package], func() error)
}

// Identity is an [Annotator] who's [Annotator.Wrap] method does nothing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whose

//
// Provided as a variable to allow callers to determine if the returned
// [Annotator] will do anything.
//
// [Annotator.Close] is safe to call multiple times.
var Identity Annotator

func init() {
Identity = ident{}
}
Comment on lines +82 to +86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why not just var Identity = ident{}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is the variable can't be (accidentally or deliberately) reassigned


// Ident backs the [Identity] Annotator.
type ident struct{}

// Wrap implements [Annotator].
func (ident) Wrap(_ context.Context, seq iter.Seq[claircore.Package]) (iter.Seq[claircore.Package], func() error) {
return seq, func() error { return nil }
}

// Close implements [Annotator].
func (ident) Close() error {
return nil
}

// NewAnnotator holds book-keeping for producing multiple independent mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the convention of doing something like NewAnnotator instead of newAnnotator for the godoc confuses me because now we have two of these

// iterators for a given dnf history database.
func newAnnotator(ctx context.Context, r fs.File, enum int) (*annotator, error) {
var err error
a := annotator{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious if it's preference or there's some performance reason behind opting to do this vs a := &annotator{ ... }?

removedEnum: enum,
}
a.db, a.spool, err = openDatabase(ctx, r)
if err != nil {
return nil, err
}

// Only way to get this is via the outer [NewAnnotator], so skip an extra
// frame.
_, file, line, _ := runtime.Caller(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which file/line are we trying to show? The one which called NewAnnotator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so yes, it's skipping newAnnotator and NewAnnotator and reporting the caller or NewAnnotator which is what we're after.

runtime.SetFinalizer(&a, func(_ *annotator) {
panic(fmt.Sprintf("%s:%d: Annotator not closed", file, line))
})

return &a, nil
}

// Annotator holds the state for a mapping iterators that use a given dnf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Annotator holds the state for a mapping iterators that use a given dnf
// Annotator holds the state for mapping iterators that use a given dnf

// database.
type annotator struct {
spool *os.File
db *sql.DB

// Concurrent maps for memoizing database lookups.
absent sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason for using an absent map here and not saving an empty struct in repo that can be checked to signal if it's absent, readability?

repo sync.Map

// "Removed" action enum.
removedEnum int
}

// Wrap implements [Annotator].
func (a *annotator) Wrap(ctx context.Context, seq iter.Seq[claircore.Package]) (iter.Seq[claircore.Package], func() error) {
var final error

mapFunc := func(yield func(claircore.Package) bool) {
for pkg := range seq {
key := rpm.NERVA(&pkg)
// If a previous iteration found that a Name is definitely not
// present, do nothing:
if _, ok := a.absent.Load(key); ok {
if !yield(pkg) {
return
}
continue
}

// If this is an unknown Package, look up the repoid.
// Otherwise, use the known repoid.
var id string
if idAny, ok := a.repo.Load(key); !ok {
err := a.db.
QueryRowContext(ctx, repoidForPackage, a.removedEnum, key).
Scan(&id)
switch {
case errors.Is(err, nil):
a.repo.Store(key, id)
case errors.Is(err, sql.ErrNoRows):
a.absent.Store(key, struct{}{})
if !yield(pkg) {
return
}
continue
default:
final = fmt.Errorf("internal/dnf: database error: %w", err)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still getting used to iterators in Go. It's ok to have a path where we don't yield?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the consumer will just see that at the end of the iteration (in this case there's also a consumable error so the reason for it should be known.

}
} else {
id = idAny.(string)
}
// Re-parse and edit the RepositoryHint.
//
// It's annoying to do this, a [claircore.Package] redesign should
// make sure to fix this wart where we need structured information
// in a string.
v, err := url.ParseQuery(pkg.RepositoryHint)
if err != nil {
final = fmt.Errorf("internal/dnf: malformed RepositoryHint (%s: %#q): %w",
pkg.Name, pkg.RepositoryHint, err)
return
}
v.Add("repoid", id)
pkg.RepositoryHint = v.Encode()

if !yield(pkg) {
return
}

}
}

return mapFunc, func() error { return final }
}

// Close implements [Annotator].
func (a *annotator) Close() error {
runtime.SetFinalizer(a, nil)
return errors.Join(a.db.Close(), os.Remove(a.spool.Name()))
}

// FindRepoids reports all the repoids discovered in the dnf history database in
// the provided [fs.FS].
func FindRepoids(ctx context.Context, sys fs.FS) ([]string, error) {
toOpen, enum, err := findDatabase(sys)
switch {
case errors.Is(err, nil):
case errors.Is(err, errNoDatabase):
return nil, nil
default:
return nil, err
}

r, err := sys.Open(toOpen)
if err != nil {
return nil, fmt.Errorf("internal/dnf: unexpected error opening history: %w", err)
}
defer func() {
if err := r.Close(); err != nil {
zlog.Warn(ctx).Err(err).Msg("unable to close fs.FS sqlite db file")
}
}()
db, spool, err := openDatabase(ctx, r)
if err != nil {
return nil, err
}
defer func() {
if err := db.Close(); err != nil {
zlog.Warn(ctx).Err(err).Msg("unable to close sqlite db")
}
if err := os.Remove(spool.Name()); err != nil {
zlog.Warn(ctx).Err(err).Msg("unable to remove spool file")
}
}()

var ret []string
rows, err := db.QueryContext(ctx, allRepoids, enum)
if err != nil {
return nil, err
}
defer func() {
if err := rows.Close(); err != nil {
zlog.Warn(ctx).Err(err).Msg("error closing returned rows")
}
}()

for rows.Next() {
var id string
if err := rows.Scan(&id); err != nil {
return nil, fmt.Errorf("internal/dnf: error scanning repoid: %w", err)
}
ret = append(ret, id)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("internal/dnf: error reading rows: %w", err)
}

return ret, nil
}

func findDatabase(sys fs.FS) (toOpen string, rm int, err error) {
for i, p := range []string{
`usr/lib/sysimage/libdnf5/transaction_history.sqlite`, // dnf5 location
`var/lib/dnf/history.sqlite`, // dnf3/4 location
Comment on lines +267 to +268
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried looking at registry.access.redhat.com/ubi9/ubi:latest to see if I could find the dnf database, but I didn't find either of these files:

[root@abb84c2eafaa /]# ls /usr/lib/sysimage/
[root@abb84c2eafaa /]# ls /var/lib/dnf/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registry.access.redhat.com/ubi9/httpd-24 if anyone wants an example c/image with a DNF DB

} {
switch _, err := fs.Stat(sys, p); {
case errors.Is(err, nil):
return p, removedEnum(i == 0), nil
case errors.Is(err, fs.ErrNotExist): // OK
default:
return "", 0, fmt.Errorf("internal/dnf: unexpected error handling fs.FS: %w", err)
}
}
return "", 0, errNoDatabase
}

var errNoDatabase = errors.New("no database found")

// RemovedEnum reports the enum for a "removed" action for the indicated
// database version.
func removedEnum(is5 bool) int {
// Defined here:
// https://github.com/rpm-software-management/dnf5/blob/13886935418e28482de7b675169482b85303845d/include/libdnf/transaction/transaction_item_action.hpp#L35
if is5 {
return rmDNF5
}
// Defined here:
// https://github.com/rpm-software-management/libdnf/blob/93759bc5cac262906e52b6a173d7b157914ec29e/libdnf/transaction/Types.hpp#L45
return rmDNF4
}

const (
rmDNF5 = 5
rmDNF4 = 8
)

// OpenDatabase contains all the logic for opening the provided [fs.File] as a
// [sql.DB].
//
// The returned [os.File] is already closed.
func openDatabase(ctx context.Context, r fs.File) (*sql.DB, *os.File, error) {
// Currently needs to be linked into the filesystem.
// See also: quay/claircore#720
f, err := os.CreateTemp(os.TempDir(), `dnf.sqlite.*`)
if err != nil {
return nil, nil, fmt.Errorf("internal/dnf: error reading sqlite db: %w", err)
}
defer func() {
if err := f.Close(); err != nil {
zlog.Warn(ctx).Err(err).Msg("unable to close sqlite db file")
}
}()

zlog.Debug(ctx).Str("file", f.Name()).Msg("copying sqlite db out of tar")
if _, err := io.Copy(f, r); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: do we have to copy it, or is it ok to just use the file we find in the layer? As in, why do we need this openDatabase + findDatabase instead of just openDatabase?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this copy needs to happen in order to make the file that gets passed to the sql.Open seekable (fulfill the io.Seeker interface). This is mentioned in the comment above.

return nil, nil, fmt.Errorf("internal/dnf: error spooling sqlite db: %w", errors.Join(err, os.Remove(f.Name())))
}
if err := f.Sync(); err != nil {
return nil, nil, fmt.Errorf("internal/dnf: error spooling sqlite db: %w", errors.Join(err, os.Remove(f.Name())))
}

db, err := sql.Open("sqlite", f.Name())
if err != nil {
return nil, nil, fmt.Errorf("internal/dnf: error reading sqlite db: %w", errors.Join(err, os.Remove(f.Name())))
}
return db, f, nil
}
18 changes: 18 additions & 0 deletions internal/dnf/sql.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
package dnf

import _ "embed" // embed sql

// Return one column containing the repoid.
// Takes two arguments, in order:
// - the "removed" enum
// - package name.
//
//go:embed sql/repoid_for_package.sql
var repoidForPackage string

// Return one column containing the repoids.
// Takes one argument:
// - the "removed" enum
//
//go:embed sql/all_repoids.sql
var allRepoids string
16 changes: 16 additions & 0 deletions internal/dnf/sql/all_repoids.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but I don't know the tables well enough to verify this is definitely the right query

Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
SELECT
repoid
FROM
trans_item
JOIN (
SELECT
max(id) AS uniq
FROM
trans_item
WHERE
action <> ?
GROUP BY
item_id
) ON (uniq = trans_item.id)
JOIN
repo ON (repo.id = repo_id);
Loading
Loading