Cargo.lock considered harmful #327063
Comments
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/cargo-lock-considered-harmful/49047/1
To add to the evidence:
I don’t think this should be prohibited entirely because I think it’s the only option if upstream doesn’t provide a
I'm currently including a
When doing the wrong thing is made too easy, only the wrong thing gets done.
I don’t think blocking Nixpkgs on upstream packaging issues is a viable approach, especially issues that primarily affect only us. We wouldn’t have a package set at all. In this case, many upstreams are actively unwilling to maintain a
The same goes for composer.lock files for PHP packages. There are soooo many upstreams that will never accept a lockfile, no matter how much you talk with them. Perhaps we can still allow lockfiles, with the recommendation that you first talk to upstream to see if they are willing to accept it there, and reduce the tarball size slowly that way?
@emilazy I don't think @superherointj suggested doing that. They suggested prioritising upstream collaboration and only falling back to in-tree workarounds when collaboration fails but have a proper process for that in place. I don't think we need to eradicate Cargo.lock entirely myself. There are likely edge-cases where anything else is simply impossible. It needs to be the exception rather than the norm though; something that 30 packages might do, not 300. @patka-123 w.r.t. composer.lock and friends, also see #327064. I have two further thoughts on possible solutions:
I’m all for upstream collaboration – I’ve opened like 7 upstream PRs during Nixpkgs work in the past month or so – but from my experience of upstream responsivity, I don’t think we can viably have a workflow like “work with upstream, then work around when that fails”. A lot of the time upstreams just take ages to even acknowledge an issue or PR, and even when they do it can take several rounds to achieve mutual understanding and consensus. “Work with upstream, apply a workaround in the meantime, then remove it if upstream collaboration succeeds” is much better for Nixpkgs maintainers and users. I think that making sure that workarounds get removed in a timely fashion when no longer necessary, and aren’t introduced unnecessarily in the first place, are more effective sites of intervention.
A simple way to avoid carrying a lockfile in Nixpkgs is to ask upstream to add it. My proposal:
Can we agree on this?
I think that is a reasonable enough expectation for packages with non‐dormant upstreams that haven’t previously expressed an opinion on lock files, when there is no other obstacle to removing the lock file (e.g. Git dependencies or us needing to extensively patch it for our own purposes), yeah. Note that Cargo upstream used to explicitly recommend committing lock files for applications but not libraries, but they have since changed their guidance and it is now considerably more equivocal. So we don’t really have anything slam‐dunk that we can point people to here.
I've written this out more thoroughly in an amendment in #327064
I think that the Rust stuff can’t process a fetcher‐produced
What if Nix could import files from zip files? Flakes and the like could just download zip files instead of the extracted tree, and Nix could load the files from inside the zip file without unpacking it. It wouldn't solve the problem, but it could be a little easier on IOPS and therefore more usable on slower storage.
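The property this idea relies on is that a zip archive has a central directory and compresses each member separately, so a single file can be read without unpacking the rest (unlike a .tar.gz, which is one sequential compressed stream). The following Python sketch only illustrates that archive-format property; it says nothing about how Nix would implement it, and the helper names and paths are hypothetical.

```python
import io
import zipfile


def make_archive(files: dict[str, bytes]) -> bytes:
    """Build a small deflate-compressed zip in memory (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Read one member via the central directory, without extracting the others."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)
```

For example, read_member(make_archive({"pkgs/foo/Cargo.lock": b"version = 3\n"}), "pkgs/foo/Cargo.lock") returns just that file's bytes.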
See the perpetual‐work‐in‐progress lazy trees work. It’s a hard problem, unfortunately, so we shouldn’t hold our breaths for it.
I haven't studied the problem of git dependencies carefully, but could this problem be solved by parsing Cargo.lock with scripts (instead of cargo) at build time (instead of eval time)?
#217084 The original plan was to replace every cargoHash with Cargo.lock, but this was not implemented. This PR also lists some benefits of migrating to Cargo.lock.
I think I remember people recommending parsing the For example, with But with Is this something we want to abandon, and either have ecosystem-specific tools, or wait for something like recursive Nix?
We don't need all the data in the Cargo.lock file. (We could drop the
The other big reason we vendor Cargo.lock is git dependencies. Is there maybe a way to make them work without vendoring dependencies?
Eventually, the way I'd like this to work is that we also have deduplication: some big file that defines every version of a Cargo package required for Nixpkgs, which we'd only have to process once. We could even eventually deduplicate semver-compatible packages, since the Rust ecosystem is very good about this. This would mitigate the problem of relying on upstream to update dependency crates for security fixes. But this would require some tooling to keep that file up to date when adding or updating a package. The quicker fix would be to remove the dependency information as suggested above, which would just require doing the deletion and modifying the code that checks that the Cargo.lock file matches so that it allows this.
This would help with filesystem size but does it help with RAM usage?
I like the idea of One Gigantic Lock, but a directory with one package per file would probably be better than one file, even if less efficient on disk, because we won’t be fighting Git merges constantly. And of course we’d probably still want per‐package local additions/overrides for things like random Git fork dependencies.
Btw, recent Nix versions also have a feature called
Is this actually true? My experience has been that we always need to specify individual hashes for Git dependencies.
Ah yeah, you are right. Wrong memory. We can only support stuff that we can reliably download with cargo-vendor.
Could we ask upstream to add an option that doesn't update Git dependencies? Isn't this an issue with cargo-vendor and not Nix/Nixpkgs? Edit: Doesn't
This is not the problem. The problem is that cargo vendor does not promise a stable output format at all, for git dependencies or crate tarballs. The reason it is not allowed for Git dependencies specifically in Nixpkgs is that, since we became aware of this problem, only the representation of Git dependencies has changed in practice. When that happened, we decided to disallow fetching git dependencies with fetchCargoTarball, as a compromise solution that solved the immediate problem, but at any time a new version of cargo vendor could also change the representation for tarball crates, and then we'd have a big problem.
I recently created a PR adding a possible alternative to cargo's built-in non-stable vendoring logic: Linking it here for discoverability.
I found rust-lang/cargo#13988, which is closed. Possibly git dependencies will now be vendored deterministically by cargo.
Non-determinism was never the main problem. It was that new versions of cargo could change the format.
Couldn't we employ the same pattern as pnpm and version the vendor hook?
We could, but we'd have to package every version of Cargo we wanted to support. I don't think there's any advantage to doing that over an approach like #349360.
With #349360 merged, we should now have a sustainable solution which covers all use cases that previously needed to resort to While it'd be great to see, we don't necessarily need mass-migration of packages to this pattern to solve the core of this issue. If even just the newly added packages were to use Thank you to @TomaSajt and everyone else involved in making this happen. If you're interested in this topic, I'd also like to direct your attention at #327064 which concerns itself with the 1/3 of lockfile bloat that is not caused by
FWIW, we will still have to vendor
We should ask the upstreams to provide it but if they don't cooperate, I think that's mostly unavoidable. There are some solutions to this which I've also discussed in #327064 as other lockfiles frequently have the same issue.
Well, #333702 would make each one of those packages add less to the tarball on the margin, though of course by amortizing the cost of the information across other packages (in a way that would provide value to maintainers and users). I agree that with the current scheme there’s not much we can do.
Introduction
I've been doing a little investigation into the impact of Cargo.lock files because, if you run ncdu against a Nixpkgs checkout, they're usually the largest individual files you come across, and Rust packages are frequently at the top of any given subdirectory.
AFAICT the functionality to import Cargo.lock has existed since May 2021. Usage has exploded since:
Measurements
Next I measured the total disk usage of all Cargo.lock files combined: 24MiB!
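The issue doesn't say which tools produced these numbers, so the following is only a hypothetical Python reproduction. It assumes gzip at level 9, counts only files named exactly Cargo.lock (matching the limitation noted at the end), and also computes the "compress everything as one stream" variant proposed under Limitations/Future work. Note that DEFLATE only looks back 32 KiB, so gzip can share redundancy only between neighbouring lockfiles. The function names are illustrative.

```python
import gzip
import os


def find_lockfiles(root):
    """Yield paths of files named exactly Cargo.lock below root, skipping .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune in place
        if "Cargo.lock" in filenames:
            yield os.path.join(dirpath, "Cargo.lock")


def summarize(blobs):
    """Return (count, raw, gzip_individual, gzip_solid) sizes in bytes."""
    blobs = list(blobs)
    raw = sum(len(b) for b in blobs)
    # Each file compressed on its own: pays gzip header/dictionary cost per file.
    individual = sum(len(gzip.compress(b, compresslevel=9)) for b in blobs)
    # All files as one stream, like compressing a tarball of the lockfiles.
    solid = len(gzip.compress(b"".join(blobs), compresslevel=9))
    return len(blobs), raw, individual, solid


def measure(root):
    def read(path):
        with open(path, "rb") as f:
            return f.read()

    return summarize(read(p) for p in find_lockfiles(root))
```

Running measure("path/to/nixpkgs") and dividing the byte counts by 2**20 gives MiB figures comparable to the ones quoted here.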
Realistically though, anyone who cares about space efficiency in any way will use compression, so I measured again with each Cargo.lock compressed individually:
Further, evidence in #320528 (comment) suggests that handling Cargo.lock adds significant eval overhead: eval time for Nixpkgs via nix-env is ~28% lower if parsing/handling of Cargo.lock files is stubbed.
Analysis
Just ~300/116231 packages (~0.25%) make up ~6MiB of our ~41MiB compressed Nixpkgs tarball, which is about 15% in relative terms (18.5KiB per package).
For comparison, our hackage-packages.nix containing the entire Hackage package set (18191 packages) is ~2.3MiB compressed (133 Bytes per package).
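Putting the two per-package figures side by side makes the gap explicit. This small Python calculation uses only the numbers quoted above and assumes 1 KiB = 1024 bytes.

```python
KIB = 1024
MIB = 1024 * KIB

# Figures quoted in the analysis above.
lock_pkgs, all_pkgs = 300, 116_231
lock_bytes_per_pkg = 18.5 * KIB  # compressed Cargo.lock data per package
hackage_bytes = 2.3 * MIB        # compressed hackage-packages.nix
hackage_pkgs = 18_191

share_of_packages = lock_pkgs / all_pkgs
hackage_bytes_per_pkg = hackage_bytes / hackage_pkgs
ratio = lock_bytes_per_pkg / hackage_bytes_per_pkg

print(f"{share_of_packages:.2%} of packages")                   # 0.26% of packages
print(f"{hackage_bytes_per_pkg:.0f} bytes per Hackage package")  # 133 bytes per Hackage package
print(f"{ratio:.0f}x more per Cargo.lock package")               # 143x more per Cargo.lock package
```

In other words, each package with a vendored Cargo.lock costs roughly 140 times as much compressed tarball space as a Hackage package.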
Breaking down eval time by package reveals that each Cargo.lock takes on average about 76.67 ms to handle/parse.
Discussion
I do not believe that this trend is sustainable, especially given the likely increasing importance of Rust in the coming years. If we had one order of magnitude more Rust packages in Nixpkgs, with the same amount of lockfile data per package that we currently observe, the Rust packages alone would take up ~54 MiB compressed.
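The ~54 MiB figure follows directly from the per-package average. A one-function Python sketch, assuming the 18.5 KiB average stays constant as the package count grows:

```python
def projected_lock_mib(packages: int, kib_per_package: float = 18.5) -> float:
    """Compressed Cargo.lock data in MiB, assuming the current per-package average."""
    return packages * kib_per_package / 1024


print(f"{projected_lock_mib(10 * 300):.1f} MiB")  # 54.2 MiB
```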
If nothing is done, I could very well see the compressed Nixpkgs tarball bloat beyond 100MiB in just a few years.
Extrapolating eval time does not paint a bright picture either: if we assume one order of magnitude more Cargo.lock packages again, evaluating just those packages would take ~4x as long as evaluating the entire rest of Nixpkgs currently does.
This does not scale.
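The ~4x estimate can be reconstructed from the 28% figure: if stubbing removes 28% of total eval time, lockfile handling is 0.28 of the total and everything else is 0.72. The Python sketch below assumes that the whole 28% is lockfile handling and that it scales linearly with the number of lockfiles. It also cross-checks against the separate 76.67 ms per-lockfile average, with ~300 lockfiles today.

```python
def lock_eval_ratio(lock_share: float = 0.28, growth: float = 10) -> float:
    """Eval time of Cargo.lock handling relative to the rest of Nixpkgs.

    lock_share: fraction of total eval time saved by stubbing Cargo.lock handling.
    growth: factor by which the number of Cargo.lock packages grows.
    """
    rest = 1 - lock_share  # eval time not spent on Cargo.lock handling
    return growth * lock_share / rest


def lock_eval_seconds(packages: int, ms_per_lock: float = 76.67) -> float:
    """Total time spent handling Cargo.lock files, from the per-file average."""
    return packages * ms_per_lock / 1000


print(f"{lock_eval_ratio(growth=1):.2f}x today")  # 0.39x today
print(f"{lock_eval_ratio():.2f}x at 10x growth")  # 3.89x at 10x growth
print(f"{lock_eval_seconds(300):.0f} s today, {lock_eval_seconds(3000):.0f} s at 10x")  # 23 s today, 230 s at 10x
```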
Solutions
I'm not deep into Rust packaging, but I remember the cargoHash pattern (the Rust counterpart of Go's vendorHash) being predominant a few years ago. It has none of these issues, as it's just a single hash string (encoding a 32-byte SHA-256 digest) per package.
Would it be possible to go back to using cargoHash? (At least for packages in Nixpkgs; having Cargo.lock support available for external use is fine.)
What else could be done to mitigate this situation?
Limitations/Future work
Files were compressed individually, adding gzip overhead for each lockfile. You could create a tarball out of all Cargo.lock files and compress it as a whole to mitigate this effect.
I found some Cargo.lock files that have a different name or a prefix/suffix; these were not considered.
CC @NixOS/rust