build system dependency management / lockfile support #833
This is an interesting idea. The rpm.deps and rpm.lock feature could, in a crude form, be implemented as a third-party converter from those formats to arguments for the "dnf install" command. More advanced features, like comparing package hashes, could be implemented as a DNF plugin and exposed as a DNF subcommand (e.g. "dnf install-rpm-deps rpm.deps"). This again could be a third-party plugin, though I'm not sure whether the DNF library exposes all the necessary functions. What I worry more about is the versatility of the format. Once it is implemented by multiple distributions, people will naturally want it to be portable across distributions. And there is the problem: different distributors package software under different names. What Fedora calls curl-devel, Debian calls libcurl4-openssl-dev, and Gentoo net-misc/curl. Who's going to map the names? The formats you gave as examples completely ignore this and instead pin to a specific package repository of a specific distribution release, or even hard-code URLs of the packages. With a format specific to a distribution, I actually cannot see the benefit of implementing a new format if the user needs to tailor it to that specific distribution. Then they can simply write "dnf install curl-devel = 1.2.3" without bothering with any intermediate format.
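A minimal sketch of that "crude converter" idea, assuming a hypothetical rpm.deps file with one "name = version" entry per line (the file name and syntax are assumptions, not part of any existing format):

```sh
# Turn lines like "curl-devel = 1.2.3" into "name-version" arguments for dnf.
# Note: this only pins versions; it does not verify package hashes.
awk -F' = ' 'NF == 2 { printf "%s-%s\n", $1, $2 }' rpm.deps | xargs dnf install -y
```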
I don't think it's a good idea to even attempt to have portable dependency naming. Users do not expect this today and it would always break in practice. Distributions carry custom patches, do custom versioning, and enable or disable features. I think doing this per package manager is the right approach:
fedora~38.deps
ubuntu~jammy.deps
Pinning RPMs to expected hashes is still very much an improvement for reproducible builds. While it is unlikely, there is no real guarantee that
Hashes and URLs to mirrors are useless in the case of Fedora because Fedora repositories do not preserve historical packages. One would have to maintain their own repository, or link to the Fedora build system, which preserves history but is too slow. I believe that a package hash is not stored anywhere now. DNF has to handle packages installed without DNF, directly with rpm. As far as I know there is only a hash of the RPM header (SHA256HEADER) and a hash of the payload contents (PAYLOADDIGEST). Maybe one could use a signature of the package (SIGPGP) instead, but packages can in general be unsigned. But you are right that "curl-devel = 1.2.3" is not unique. A full NEVRA is not unique either. I remember @j-mracek was pondering how to give a user a means of selecting an exact package for installation (at least from a syntax point of view) but I never heard of any outcome.
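For reference, the tags mentioned above can be inspected with rpm's query format; a small sketch ("curl" is just an example package name, and tag availability depends on the rpm version and on whether the package is signed):

```sh
# Print the header hash, the payload digest(s), and the signature (if any)
# for an installed package.
rpm -q --qf '%{SHA256HEADER}\n[%{PAYLOADDIGEST}\n]%{SIGPGP:pgpsig}\n' curl
```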
This is where a custom mirror, or the vendor command from above, might come into play, as it provides an option to download everything referenced in a lockfile for archival storage.
I think many things are achievable with current DNF, and some of the features might get added. What we cannot do is specify a direct dependency for a particular package. We can favor, disfavor, exclude, or even lock a package to modify the solver's decisions for dependencies in the whole transaction, but we cannot modify the decision for a particular package. In some rare (but valid) cases, favor and disfavor switch off obsoletes.
I wonder if something like this would fit better outside DNF. One of the challenges I've faced with DNF is its memory usage, as it will attempt to solve package needs regardless of how finely you specify what you want. This can make it really difficult to run DNF in resource-restricted environments like tiny cloud VMs. Rather than making this something dnf can consume, could we make it something dnf can generate and that can be consumed by a much simpler tool? Unless I'm totally off, a consumer of a lockfile just needs to compare what is installed locally to what is requested in the lockfile, then download and install those packages from the specified locations. Is this what is meant by the "vendoring" command?
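A rough sketch of that "much simpler tool" idea, assuming a hypothetical lockfile that lists one NEVRA per line (this format is an assumption for illustration only):

```sh
# Compare the locked set against what is installed; whatever is missing could
# then be downloaded from the locations given in the lockfile and handed to rpm.
rpm -qa --qf '%{NEVRA}\n' | sort > installed.list
sort rpm.lock | comm -13 installed.list - > missing.list
```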
"dnf --assumeno install ..." displays what packages would be installed, removed etc. The format is not ideal for parsing by machines, but I think a JSON output is considered in issue #867. Maybe DNF could output the transaction in a form of an rpm command. rpm tool does not solve a dependency tree. It only verifies that dependencies are satisfied. And that is much faster and needs fewer memory. Another approach would be doing a real "dnf install" and exporting the transaction with "dnf history store" on a beefy machine. Then consuming the "dnf history store" output with "dnf history replay ..." command on the target machine.
That would be true if the lockfile were exhaustive, i.e. listing the desired packages including all their dependencies. And even then it would require resolving a dependency tree, because a package installation is not a purely additive operation. E.g. if a package to be installed conflicts with an already installed package, DNF will need to uninstall the installed package first, for example when replacing libcurl-minimal with libcurl.
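The libcurl-minimal/libcurl case can be seen on the command line today; a small illustration of why such an operation is more than "download and install":

```sh
# Replace the conflicting package as part of one transaction...
dnf swap libcurl-minimal libcurl
# ...or let dnf erase whatever conflicts with the requested package.
dnf install --allowerasing libcurl
```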
To cross-reference existing implementations, rpm-ostree has support for a lockfile format: https://coreos.github.io/rpm-ostree/treefile/#experimental-options (examples in https://github.com/coreos/fedora-coreos-config/blob/testing-devel/manifest-lock.x86_64.json and https://github.com/coreos/fedora-coreos-config/blob/472c23d6a0656867f5bf12e8f169e33617990e59/manifests/fedora-coreos.yaml#L72).
I worked briefly on a PoC of a dnf plugin to support something like this before discovering this issue tracker:
I like the format; a few comments. It would be great if the format could also include information about SRPMs (sources) so they can be listed together with the binaries; ideally the format would distinguish between source and binary RPMs explicitly, so sources can be downloaded if needed or ignored if not. What I'm missing here is information about architectures; the current world is more heterogeneous than before (x86_64 is still around, ARM is trending, RISC-V is a potential big thing). What I would even like is if the file could store info for multiple arches, so artifacts that target multiple architectures can be built from a single lockfile. Multiple files (one per arch) would also be acceptable, but then consistency issues are more likely (what if one lockfile is regenerated/pushed to git with new versions while the other is not?). Another missing piece is modules, as just a list of RPMs is not enough for them. For example, a record for modulemd, defaults, and obsoletes would be needed so the consumer of the lockfile knows where to download these files. Btw. what is
@Tojaj can you give me an example of where multiple arches per lockfile would be useful, other than producing source containers? Other than that I agree with the suggestions. One thing that hasn't been discussed much (spoiler alert: probably goes too deep too soon), but @malt3 have you considered grouping packages by repos rather than having a flattened list of package objects carrying redundant information about
@eskultety it's common for almost all build tasks. Content producers desire consistent content in their composed artifacts across different arches.
It is questionable though whether such an exploded lockfile is the right thing for the task at hand. IOW I don't really perceive such an exploded lockfile to be something that anyone will want to maintain directly; instead, I would treat this as something that is generated from a more generic template that could spit out these lockfiles per arch. To support my argument: it is still pretty common for projects to rely on autotools mechanisms, where you don't maintain the actual Makefiles and configure scripts and instead only maintain parametrized templates in the form of
@eskultety Not sure I follow; why manually maintained? No one in this thread or in my comment implied manual maintenance, or did they? I would suggest taking the discussion about the
Btw. some time back I shared with you a draft of an
@eskultety Not sure your proposal can satisfy the following user stories:
@lmilbaum it would satisfy both just fine; those use cases would not be harmed by anything I mentioned, since the pointers to the RPMs (e.g. URLs) would remain static and unchanged, so from that perspective there would be no change for maintainers.
Right, so... I've been thinking about this more, and in chatting with some other folks, I think it would be really valuable for dnf-using operating systems to support image -> package version locking like Amazon Linux is doing. This isn't the same thing as a lockfile, but (making up numbers) I think 80% of lockfile use cases would be satisfied by it. I suspect that for e.g. the FCOS and rpmoci cases, having a single version number and digested pull spec would just be a lot nicer than a giant lockfile. (With support for overrides of course; like "pin to
An important thing in implementing this is very much fixing how Fedora manages RPM repositories to retain history. The best way to implement that would be moving to OCI artifacts for RPMs (as mentioned above; see also this issue). This would be a nontrivial change in code and also in how distributions ship and how users consume things. (The biggest downside I think is "I typed dnf update and it didn't get that kernel update, why?".)
Side note on this bit: that repo actually does exist nowadays: https://src.fedoraproject.org/rpms/fedora-repos/blob/rawhide/f/fedora-updates-archive.repo It's enabled by default on Silverblue and FCOS.
Indeed. This is what we do in FCOS; we have a process to bump lockfiles by just doing a full depsolve from the toplevel requests and getting the updated lockfiles from the solved set. When doing production builds from lockfiles, we exclude every package not mentioned in the lockfile to force the solver to exactly match our set. It's not ideal, but it works. Working at the compose level would be a nicer UX indeed in many cases. At least for FCOS though, the explicitness of listing out every single RPM is really nice. One thing is that a compose ID limits your choices to a single version of the repos, but you still want some rigor in which packages within that repo you want to install. Ideally, dnf would have a mode where given a lockfile it would skip over the libsolv step entirely and just let librpm verify that the provided set is consistent.
I implied this above but I think nothing stops input to the system like:
Or in an imperative form,
to cherry pick things forward or backwards. I personally would way prefer operating on that versus the FCOS-style lockfiles today and the giant YAML/JSON in git.
Right yeah, but that's still top-level requests that need to get depsolved. Which yeah, it should give the same result, since dnf and libsolv themselves are locked. So I agree that'd be enough for most cases. But in the case of FCOS, where we do promotion and have overrides, we want more certainty in what the resulting package set will be.
The JSON files are not really user-facing (just like you don't edit
From the image builder team, we also have quite a bit of interest in this. I'll describe our current approach, and I hope this is the right place to do so. I've clicked through some of the internal issues and proposal documents, but I'd prefer to do this in public. We do repeatable builds of operating system images (qcows, live ISOs, AMIs, ostree commits, that sort of stuff), and for that we pin all of our inputs, including RPMs, as we also build in a network-constrained environment. We do this at our lowest level, called osbuild. From a higher level in our stack (generally osbuild-composer or our console API) we get a set of package specifications such as
Currently we have our own osbuild-depsolve-dnf and osbuild-depsolve-dnf5, which receive some JSON (package specifications, repositories) and output a JSON blurb that contains all the RPMs as resolved. We have to deal with multiple transactions and with being unable to delete from certain transactions, but I'll skip that bit. Perhaps those things can be used as inspiration. However, a long-standing issue in our approach has been package markings, and when I saw these proposals I immediately recalled that pinning all the inputs must lead to the same package markings as during the dependency solving. Thus I'm hoping that this proposal could include some functionality in dnf to allow us to extract the reasons for package installations and to serialize them in the format. Currently the reasons only contain
As for the RPM pinning and RPMs being removed from repositories, perhaps it is time we offer Fedora an archive mirror?
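As a point of reference for the package-marking ("reasons") discussion above, dnf already exposes a coarse version of this today (dnf4 commands shown; dnf5 equivalents may differ slightly):

```sh
# List packages that were explicitly requested by the user, as opposed to
# packages pulled in as dependencies.
dnf repoquery --userinstalled
dnf history userinstalled
```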
To flesh out how this would work, the way I'd implement this is (using OCI artifacts to hold RPMs):
It's not required to host RPM content as OCI artifacts to make this work... we could also just inject
The obvious implementation of that is literally just pointing it at the composes.
I linked to this elsewhere and someone asked me: how precisely would we store RPMs as OCI artifacts, and how would dnf/librepo fetch them? One giant impedance mismatch here is python/c++ vs Go. On that topic, we have been using containers/skopeo#1476 for a long time (now wrapped up with a nice Rust API) in bootc. I think if we did this, we'd probably work to clean up and officially stabilize that "out of process API for registry fetches" that today works by forking skopeo. So basically
As far as how it'd work on the registry side, it'd be most obvious to store an RPM physically as a blob. (Again, with OCI artifacts it's not an image; we're not converting RPMs to tar.) How the metadata works... well, it would make a ton of sense to move away from XML to JSON at the same time IMO, but it'd be easiest to just stick the metadata XML in as another blob, and then finally the OCI artifact JSON just references that collection of stuff.
We (the mock utility upstream) are interested in having something like this natively supported by DNF. For the time being, we'll generate custom lockfiles per this schema for "hermetic builds" purposes, but it would be nice to have something native in the future so we can switch to it 1:1. I thought I'd open a separate RFE for this, but at this point in time it seems wiser to just let you know here. Here comes the request; these major things need to be done in such a DNF feature:
I hope I'm in the right place with this request; if not, please trampoline me.
Hi Pavel, we've already started work related to your request. Check out the libpkgmanifest project. Multiple parties are involved with different use cases, so further development is expected. The library is already usable with some minimal functionality, including Python bindings. The dnf4 plugin is a work in progress (see rpm-software-management/dnf-plugins-core@03e32be), but it is also in a usable state. For testing purposes, you can use this Copr: https://copr.fedorainfracloud.org/coprs/rpmsoftwaremanagement/manifest-plugin-testing/.
Wait, you invented another lockfile format? When there are multiple that already exist? Especially https://github.com/konflux-ci/rpm-lockfile-prototype, which at least I'm forced to use, even though we already had one in rpm-ostree that the people inventing that one were unaware of.
For Mock, the file format is mostly internal, not a "standard" that we or anyone else would have to conform to. No format promises, no intentions to make this "an invention" itself. ATM we just need something to start with so the rest of the pipeline development can continue, and the rpm-lockfile-prototype project is "transitively" mentioned through this issue in the Mock code. I tried to experiment there and spent some time on packaging it as an RPM, but we need something "built into" DNF (the format itself is not so important; see the requirements in the previous comment).
I think I identified a missing feature that many users of dnf would benefit from.
Let's say I have a build system that uses my source code to bundle an rpm.
I also want to build further artifacts based on my rpm. This could include:
- a container image (`docker build`, `podman`, `Buildah`, ...)
- an OS image (`mkosi`, CoreOS builder, or a raw `dnf5 --installroot`)

For supply chain security and general repeatability of build steps, I think the rpm/dnf ecosystem should support the following:
A dependency file format

If you are familiar with go, this would be the equivalent of a `go.mod` file. You specify your direct dependencies here.
As an example, if I have a program written in C that depends on libcurl and xz, the dependency file might look like this:

rpm.deps
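The author's original example is not reproduced in this excerpt; a purely hypothetical sketch of what such a file could contain (package names follow Fedora naming, and the syntax is illustrative rather than an existing format):

```sh
# Create a minimal rpm.deps for a C program using libcurl and xz.
cat > rpm.deps <<'EOF'
libcurl-devel >= 8.0
xz-devel
EOF
```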
A dependency lock file format

If you are familiar with go, this would be the equivalent of a `go.sum` file.
DNF could take this as an optional input, and, if provided, any rpm installed has to be included in the lockfile.
The lockfile would include a specifier for an rpm file, including the name of the rpm, the version and architecture (`nevra`), and one or more hashes of the expected rpm file itself.
Here is an example of a lockfile format. The actual format could be completely different.

rpm.lock
https://github.com/microsoft/rpmoci/blob/5b3d17345e9b012d36c05b3c2de06426d6db9922/tests/fixtures/update_from_lockfile/rpmoci.lock#L1-L67
A vendoring command

This would read a dependency file and a lock file and download all rpms requested by the dependency file (with their dependencies) into a local folder.
This could be a part of the `dnf5 download` command.
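What exists today comes close for the download half of this: the dnf download command (from dnf-plugins-core in dnf4; option names may differ in dnf5) can fetch packages and their dependencies into a folder. Package names and the destination directory below are placeholders:

```sh
# Fetch the requested packages plus all of their dependencies into ./vendor.
dnf download --resolve --alldeps --destdir=./vendor libcurl-devel xz-devel
```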
Benefits to the ecosystem
Many projects are implementing their own version of this today and could standardize instead. Examples include Fedora CoreOS, bazeldnf, rpmoci, and repro-get.
Standardizing also allows dependency management systems (like renovate) to notify developers that dependencies are outdated (and could also warn about known CVEs by parsing the lockfile).
Incremental build systems
Incremental build systems (including Bazel, mkosi, Dockerfile / Containerfile) could read a dependency lockfile to decide if an image (or layer) needs to be rebuilt. This can lead to better caching and incremental builds.
Correct cache invalidation

This proposal eliminates the well-known issue of having unknown state in a container image layer:
If I do a `docker build` once, docker will give me an old (and probably vulnerable) version of openssl.
Let's instead say I could do this:
Now if I update my lockfile and rerun `docker build`, I could get a newer version of openssl (instead of the old vulnerable one that is cached).

Reproducible / repeatable builds
This is a basic requirement for reproducible builds. If I build an OS or container image based on RPMs, I basically need to vendor / mirror / pin every RPM myself. Otherwise, when rebuilding the image, dnf could install other versions of rpms from a repository.
Next steps
I would encourage feedback from different stakeholders and would like to know whether this rough idea would be welcomed by the maintainers of dnf.
If this is the case, I would be happy to create a more detailed proposal.