Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that a conda-lock file "agrees" with its sources without triggering a solve #479

Open
2 tasks done
MattF-NSIDC opened this issue Aug 4, 2023 · 10 comments
Open
2 tasks done

Comments

@MattF-NSIDC
Copy link

MattF-NSIDC commented Aug 4, 2023

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

I'd like to be able to validate that a lock file has been created from the current source specifiction's contents. So for example, if someone changes environment.yml and opens a PR, my CI could very quickly catch that the lockfile wasn't updated by re-calculating a hash and comparing it to content_hash in the lockfile.

conda-lock lock --check-input-hash is very close to what I want to do, with the downside that instead of failing immediately if the hash validation fails, it will solve the environment and update the lockfile. What about a --check-input-hash-without-solve or similar flag for this behavior?

Why is this needed?

This helps projects automatically catch and quickly prompt (by e.g. failing CI job or bot comment) contributors who have forgotten to lock the environment after changing it. This repo2docker-based project, for example, expects users to manually update their lockfile or use /condalock in a PR comment to trigger a bot to update the lock file, but they have to remember to do that!

Detecting that environment.yml has changed isn't enough. Sure, if conda-lock.yml also hasn't changed, we can be relatively assured that the user forgot to update the lock-file. But if conda-lock.yml has changed, how do we know the version of environment.yml that's being committed is the same content that generated conda-lock.yml? The contributor could have updated environment.yml, locked it, then changed environment.yml again before committing.

Using pre-commit with --check-input-hash is a great solution for people like me who are going to clone the repo and work on their local machines, but changing environment.yml in the GitHub interface is also a common workflow for some projects/users.

What should happen?

On running some command like conda-lock ensure-sources or conda-lock lock --check-input-hash-without-solve:

  1. Calculate a new content_hash from sources in conda-lock.yml
  2. Does it match content_hash in conda-lock.yml?
  3. If yes, exit 0, if no exit non-zero

No solve or write to a file should occur at any time.

This should also work for "explicit" lockfiles which contain an input_hash in the header.

Additional Context

I apologize if this has been asked already! Tried to find it and came up empty.

@MattF-NSIDC
Copy link
Author

This feels like a "good first issue". I'd like to take a stab at it this weekend :)

@MattF-NSIDC MattF-NSIDC changed the title Validate that an input hash matches current sources Validate that an input hash matches current sources without also triggering a solve Aug 4, 2023
@maresb
Copy link
Contributor

maresb commented Aug 4, 2023

Hi @MattF-NSIDC, great to hear from you again! This seems fairly straightforward, and I have no objection, but be warned that content_hash may disappear soon. Please see the discussion in #432.

@MattF-NSIDC
Copy link
Author

Hey Ben, likewise :)

Do you expect that change will come with a version increment of the unified lock file format? I also think it would be a valuable to retain support for this validation functionality by enabling conda-lock to continue to output and operate on version 1 unified lockfiles!

@maresb
Copy link
Contributor

maresb commented Aug 4, 2023

Yes, we will definitely have a version increment and a transition period.

@MattF-NSIDC
Copy link
Author

MattF-NSIDC commented Aug 4, 2023

Wonderful :)

Do you think this suggestion (your pseudocode) are in-scope for a new conda-lock feature? This would address the same problem, but better IMO. Your point about equivalent but unequal specifications was a good one.

@MattF-NSIDC MattF-NSIDC changed the title Validate that an input hash matches current sources without also triggering a solve Validate that a conda-lock file "agrees" with its sources without triggering a solve Aug 4, 2023
@maresb
Copy link
Contributor

maresb commented Aug 4, 2023

That would absolutely be in-scope for a new conda-lock feature! Would you be interested in tackling this? (I'd really like to do it myself, but I'm really time-constrained at the moment, so I'd be really grateful for any help!)

@MattF-NSIDC
Copy link
Author

I'm going to try and give it a shot this weekend, will let you know whether I'm able to work it in or not! You may see a PR from my personal account @mfisher87 (I continue to regret having two GitHub accounts!)

@maresb
Copy link
Contributor

maresb commented Aug 4, 2023

That's wonderful!!!

If along the way you find tangential things that you want to improve / document, please be bold and don't hold back.

For code review, I greatly prefer lots of smaller commits rather than one monolith, if you can manage it.

@MattF-NSIDC
Copy link
Author

For code review, I greatly prefer lots of smaller commits rather than one monolith, if you can manage it.

No problem :)

@mfisher87
Copy link
Contributor

I've been using this feature as a learning opportunity for some friends who are new to Python and open-source (and for myself to learn about the conda-lock code). We started with implementing the "check input hashes and exit" approach because it was a quicker path to victory, and didn't quite nail it down in our limited session this weekend. We'll probably get it next weekend and move on to the "check for agreement" approach. I was thinking we could submit them each as separate pull requests, and reject one of them if needed.

If someone else wants to implement any of this, we won't be offended if you don't wait for us. We're just in it for learning :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants