-
-
Notifications
You must be signed in to change notification settings - Fork 634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lockfile invalidation -- add metadata to generated lockfiles #12427
Lockfile invalidation -- add metadata to generated lockfiles #12427
Conversation
@Eric-Arellano Happy to take pointers here, otherwise, fyi! |
Notably, this implementation should make it easy to add the usage instructions automagically, by amending this line: https://github.com/pantsbuild/pants/pull/12427/files#diff-e09a7e81d00d6394fbb4336f41dde1ba1aea4b0b1af07d173124ecb65b9e9308R155 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay!
m = hashlib.sha256() | ||
for requirement in self.requirements: | ||
m.update(requirement.encode("utf-8")) | ||
for constraint in self.interpreter_constraints: | ||
m.update(str(constraint).encode("utf-8")) | ||
return m.hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud here..my first thought was "This seems inefficient to iterate over every element! You could hash the whole collection to avoid that".
But I suspect this is intentional to iterate over every element, and that it makes the algorithm more robust to hash collisions? Is that right? If so, maybe add a comment explaining this nuance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secure hashes from hashlib
work on bytes
objects -- you can't hash an iterable, but you can hash a bytes
representation of every item of an iterable. I don't think there's any nuance here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, @stuhood thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning to this, under the hood, m.update(a + b)
is equivalent to m.update(a); m.update(b)
(see https://docs.python.org/3/library/hashlib.html) so I presume that hashlib
is doing something with stringio
or similar under the hood to build up the input to the digest function, which is about as efficient as what we'd end up doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I'm wondering if we could do m.update(repr(self.requiremenets).encode("utf-8"))
, so that would not quite be the same thing. I'm pretty sure that's more likely to have a hash collision, only not sure it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the repr
of the collection would be equivalent to adding a bit of salt to the hashes; I don't think either approach meaningfully changes the probability of a collision. The chance of a collision over a 256-bit cryptographic hash is vanishingly small either way, so the presence/otherwise of a few delimiters isn't going to meaningfully change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider in this is sort order. Are the requirements/constraints sorted when going into this, otherwise a change of order would yield different hashes, which I suppose is undesirable, as they would still resolve to the same thing once used, iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding tests!
m = hashlib.sha256() | ||
for requirement in self.requirements: | ||
m.update(requirement.encode("utf-8")) | ||
for constraint in self.interpreter_constraints: | ||
m.update(str(constraint).encode("utf-8")) | ||
return m.hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, @stuhood thoughts?
de8da6e
to
00b1d91
Compare
src/python/pants/backend/experimental/python/lockfile_metadata.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks.
The lockfiles should maybe be regenerated? But it's a tedious process..you have to run on a machine w/ Py36-Py39 (not M1) and manually restore the headers for how to regen. I'm personally fine with punting on it for now and one of us implementing #12382 first.
m = hashlib.sha256() | ||
for requirement in self.requirements: | ||
m.update(requirement.encode("utf-8")) | ||
for constraint in self.interpreter_constraints: | ||
m.update(str(constraint).encode("utf-8")) | ||
return m.hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I'm wondering if we could do m.update(repr(self.requiremenets).encode("utf-8"))
, so that would not quite be the same thing. I'm pretty sure that's more likely to have a hash collision, only not sure it matters.
src/python/pants/backend/experimental/python/lockfile_metadata.py
Outdated
Show resolved
Hide resolved
@Eric-Arellano Once the lockfile digests are being consumed, we can potentially add the regen information to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only small nits. Sounds like you're headed out for today, so I'll plan to merge this and then they can be fixed in your upcoming followup PR.
Thanks!
"""Returns an invalidation digest for the given requirements and interpreter constraints.""" | ||
m = hashlib.sha256() | ||
pres = { | ||
"requirements": [str(i) for i in requirements], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're already strings
"requirements": [str(i) for i in requirements], | |
"requirements": list(requirements), |
print( | ||
invalidation_digest( | ||
FrozenOrderedSet(requirements), InterpreterConstraints(interpreter_constraints) | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale
) | ||
|
||
|
||
def test_hash_depends_on_requirement_source(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_hash_depends_on_requirement_source(): | |
def test_hash_depends_on_requirement_source() -> None: |
@Eric-Arellano I've fixed those |
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
… be simpler, I promise. # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
2ab468d
to
3694ae3
Compare
This WIP adds the lockfile invalidation header, but does not yet consume it.
Partially addresses #12415.