Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC 0109] Nixpkgs Generated Code Policy #109
[RFC 0109] Nixpkgs Generated Code Policy #109
Changes from 19 commits
1391dac
71c4f28
9b343df
a43045b
183218c
a311978
a8e2f35
35df663
b93b2a9
da89a76
e38002d
d7a6f92
d38062a
e724981
d3b8f6f
f4d1e9b
1ab91b6
24078f4
47cb1a9
3876e77
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the previous suggestion, where I mistakenly missed out making sternenseemann the author.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I thought @Ericson2314 wrote this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#109 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs an explanation as to why reproducibility affects simplicity of updating the packages.
I don't see a clear relationship between
reproducibility
and thedifficulty updating a package
.We can make update mechanisms that are very easy to use without guaranteeing reproducibility, or vice versa have reproducible mechanisms which are hard to use.
For example, an update interface like
nix run .#my-packages.update
would be very simple to use, but its result not necessarily reproducible.Vice verse, an update mechanism can be perfectly reproducible, but if it lacks a good UI, and instead requires the user to read the source code of nixpkgs to find the right commands or attributes to trigger it, it is difficult to use despite being reproducible.
Currently the RFC seems to focus on reproducibility only. I think the statement about the UX improvement should either be dropped, or an explanation added as to how this is achieved.
I think it makes sense to clarify which users are targeted by the RFC. Is it
nixpkgs maintainers
ordownstream users
or both?What
downstream users
need is a way to create, update and modify package sets to their needs. For them, these interactions need to be simple, more than reproducible. The modifications become manifested in the code anyways and reproducibility is ensured from there on.Increasing the complexity of tools in order to comply with the RFC can introduce a cost for these users, with no added benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sense that when it is easy to reproduce the state of the generated code as present in the repository, it is easier to also effectuate desired changes. If it is, for example, not possible to update generated code without pulling in all available updates, it may be harder to just change a specific, little thing.
This would be my account of this, but perhaps the paragraph can be improved.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This part is not talking about "reproducibility" as a Nix concept: it talks about extending, updating or modifying existing code. The wording might require some polish here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly recommend against Fixed-ouput drvs. they lead to a situation where it’s trivial to run into a reproducibility problem. The prefetching step as mentioned a few lines later is the way to go, since that way only one maintainer generates the file and we are not at the mercy of $version resolver being reproducible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version solvers etc. should be executed inside the normal derivation. Fixed output derivations would be package repository indices etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Profpatsch I am a bit confused what you mean? I mean fixed output derivations to just fetch things, not fixed output derivations that to a lot of arbitrary "vendoring" work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but then I’d just call them “fixed output fetchers” here, to make the distinction.
Cause you can also used FODs for doing arbitrary stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we that anyway no matter how we implement it? Vendored FOD or not? We somehow need to pin dependency versions and that part must be reproducible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is necessary, in most cases the input addressed approach will be ok: We have fixed-output data files as an input (which is updated via an "impure step" i.e. an update script) and run a solver in a regular derivation (which is hopefully deterministic!).
For taming quicklisp I don't think impure derivations are useful, since they are still sandboxed, aren't they?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the quicklisp lockfile fully self-contained? The ideal is when the lockfile can be converted to Nix because it has all the information we need, and the hashes are compatible with Nix. In most cases, there are often missing pieces that require querying the Internet.
For example the yarn.lock files are fully self-contained unless one of the dependencies points to a git repo. Then you have to go fetch that repo and calculate the Nix hash out of it.
That's where you would typically reach out for FOD, __noChroot or the new "impure" attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Quicklisp we look at the packages themselves to check the dependency information (index contains some claims about dependencies that are usually correct but we have seen some exceptions so we don't trust that blindly)
I don't think Quicklisp has lockfiles: normal mode of operations works within a single globally coordinated (and tested) version set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ericson2314 I think your suggested change from above is good and should be applied.
The approach of running a version resolver inside a derivation is not a practical approach for some ecosystems.
The reason usually being that there is no package index readily available to feed into the
solver derivation
.As mentioned above, some ecosystems expose dependency info only inside package sources, not offering a central index.
We can build our own package index by crawling all packages regularly, but the upstream resolver will most likely be incompatible to our custom package index.
-> we now have to invent our own dependency resolver as well.
I already went through all of that when building mach-nix which is a nix code generator for python using a custom python package index pypi-deps-db.
My personal experience is that this approach introduces significant maintenance overhead. Not only must the
crawler
/db
/resolver
be compatible to the complete history of standardization of the ecosystem but also adopt all upcoming standards to not fall out of sync.Therefore I think it is a good idea for the RFC to allow a minimal amount of
impure operations
. This simply allows to use upstream tooling to lock down the inputs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to clarify how the dependency tree should look like. Is my intuition here correct?
EDIT: I think this page from dream2nix documentation describes exactly what we want to achieve, does it not? If so, would be nice to reference it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can give a specific example of Gradle lockfiles. It specifies the package and its version, but it doesn't specify its hash. There is a separate mechanism for storing dependencies' hashes, but due to the way Gradle works the same version may be fetched from a different repository in a different part of the build graph, which may have a different hash, but Gradle only stores one hash per dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear what issue is being pointed out. Is it about hashes not being cryptographically strong, or that they were forced out of Nix ecosystem, or that it's a mix of the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most generally, any hash we cannot cajoul into a store path or fixed output derivation.
It needs to be a hash algo we support, and (assuming no one else supports NAR) a hash of a flat file (such as a tarball).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just support the ones we don't right now, but I do think it's mainly cryptographic strength. We don't want to support non-cryptographic hashes, yet we support SHA1 ironically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true. Given a commit, you can fetch the tree corresponding to that commit without fetching any of the parent commits (or their data).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just to make it explicit: indeed the downloaded content can be verified (the previous history is mediated in hashing by just the hash of the parent commit, which is small)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@L-as
Err i mean the commit hash doesn't work as a fixed output hash. Yes, it does identify what to download, but isn't sufficient to very the downloaded tree in isolation.
@7c6f434c Yes, I mentioned such Merkle inclusion proofs as future work in a comment in the IPFS RFC: https://github.com/NixOS/rfcs/pull/133/files#r957445525. But I think they are well out of scope for this one.
Any idea how I might make this clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the part about downloading the whole history and saying that git commit hashes are not supported by Nix would be an improvement. If you want to give a justification why they probably won't be supported really soon, say they are more complicated to verify than normal content hashes. I just agreed that whole-history justification doesn't justify things because it's not completely true; I agree that extra complexity (and git-specificity) justifies lack of Nix support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sternenseemann, fetching the crates.io index is pure (it's an FOD). So if that's an example of "prefetching", why does the text describe "prefetching" as being "run impurely"?
This terminology is a landmine that we will regret. If you prefer something besides "map-building" then great, go with that. Just please don't overload an already-heavily-used term to mean the opposite (i.e. impure) of what it currently means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am convinced in getting rid of "prefetch". @sternenseemann among other things the "pre" makes a terminology based on statefulness. This is sort of confusing and generally not how we want people to think about Nix things, so an alternative non-state-oriented term seems like an upgrade to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is... not really helping much.
We can't just say "
lang2nix
tool authors and language maintainers I dunno" will do it. This is not an actionable point. I think the point of RFCs is to provide a clear path forward; shifting the burden of complying to an "idea" that is not even standardized onto the community will get us nowhere.Let's start talking with @DavHau , a lot. He's the biggest player out there. He has spent quite a while developing the solution. He has spent no less time working on formal definitions that are of utmost importance here: he documented architecture, its components in detail, extension details, and he has quite detailed API details in the works. This is outstanding groundwork that exists and is working already; let's go ahead and discuss the critical details while we're in a discussion stage. Going through with the PR that states "the community will figure it out somehow" will at best leave @DavHau to do all the work himself without upstream support; at worst it will render his work obsolete because we have decided to do something else long after the discussion phase has ended.
I propose @DavHau to become a shepherd. He might not have the time for it, or might decline for any other reason. This is fine. But we NEED his input on this matter; he's the most competent person in the room on the topic.
We should also consider the lengths at which we are willing to go to. At the moment of me writing this, 495/500 npm packages are built using
dream2nix
correctly. That sounds good enough to deserve some serious evaluation regarding upstreaming the efforts. I propose to start discussion with maintainers ofnodePackages
about it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using standard IFD might be worth including.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is below, no? Because hydra will stall out with no good GUI feedback with IFD I don't think it is a realistic proposal yet :/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I proposed above - I think we should begin with detailed discussion with
dream2nix
maintainer. After that, the question resolves itself - the "grace period" is as long as he and relevantnixpkgs
maintainers have agreed on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds awful, I hate how manual
bootstrap-tools
are, doing it to the whole Nixpkgs feels even worse.The question, though: Why can't only the generated parts of Nixpkgs be bootstrapped using the rest? That would better than either alternative, IMHO.
In fact, I wanted to do this for
all-packages.nix
in NixOS/nixpkgs#50643 (4 years before #140 and NixOS/nixpkgs#237439).The ultimate idea (which was abandoned after the initial pushback) there was to eventually generate most of
all-packages.nix
by walking the Nixpkgs tree via a simple shell script.So,
stdenv
and the knot of packages supporting it would have been left in a tiny hand-written equivalent ofall-packages.nix
, and the rest ofall-packages.nix
would have been auto-generated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this RFC should pose an incremental step, not a grand new design. In particular, the goal is to integrate all current codegen into this scheme while keeping changes necessitated by this to a minimum. Codegen tools in nixpkgs are packaged using code generated by them already today, so the bootstrap problem already exists and can not be alleviated easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes, but there would be quite a difference between bootstrapping from an autogenerated
.nix
file that depends on current version of Nixpkgs (and which you can still read and understand, even if it is technically programmatically generated) and bootstrapping from an autogenerated.nix
file that depends on an outdated version of Nixpkgs that bootstraps from an autogenerated.nix
file that depends on an even more outdated version of Nixpkgs, which bootstraps....There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why this is listed as an alternative to the actual proposal which does not require that!