-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic checking of hashes for transferred files #21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, pretty much just a docstring comment.
|
||
with open(filepath, "rb") as f: | ||
file_hash = hashlib.blake2b() | ||
while chunk := f.read(8192): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah baby, python 3.8 walrus operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks StackOverflow!
return file_hash.hexdigest() | ||
|
||
|
||
def compare_directories(dir_1, dir_2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick q here. is the intention to compare hashes between directories on the CAC and those on an external drive, or...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. We copy a couple TBs off the CAC, want to ensure it went smoothly
creed/file_hash.py
Outdated
"""Computes the hash of the specified file to verify file integrity | ||
|
||
Args: | ||
filepath: full path to file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arg type, str | PathLike
here (and the other function).
dir_1_files = io_utils.list_files(dir_1) | ||
dir_2_files = io_utils.list_files(dir_2) | ||
|
||
misc_utils.verify_same_elements(directory_1=dir_1_files, directory_2=dir_2_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, the error messages of verify_same_elements
(and, by extension, verify_in_list
) can be a bit cryptic and, at the very least, inflexible. Is it necessary to add some flexibility to the error message displayed or is the default one fine for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default error message is okay, but the wording right now is still a little bit confusing. You can open an issue about it, and we can address it in future PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one thing about verify_same_elements
. Also, will we be adding test functions to these (same with the auto-sweep and slanted-TMA PRs)?
Yes, "we" will definitely need to add test functions before the PRs get merged in. I opened up a bunch this weekend, some of them to hotfix stuff I needed right away. If I get to it later this week/next I'll write them, otherwise I may have one of you guys do it later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one comment about testing the ==
case too in test_get_hash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. only one mega optional comment
Given the large volumes of data being moved around, an easy way to ensure nothing went wrong during the file transfer process would make verifying transfers much easier.
This PR adds functionality to ensure that two directories share the exact same files, and that those files have the same hashes.
Closes #9