Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic checking of hashes for transferred files #21

Merged
merged 4 commits into from
Jan 26, 2022
Merged

Conversation

ngreenwald
Copy link
Member

Given the large volumes of data being moved around, an easy way to ensure nothing went wrong during the file transfer process would make verifying transfers much easier.

This PR adds functionality to ensure that two directories share the exact same files, and that those files have the same hashes.

Closes #9

Copy link
Contributor

@ackagel ackagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, pretty much just a docstring comment.


with open(filepath, "rb") as f:
file_hash = hashlib.blake2b()
while chunk := f.read(8192):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah baby, python 3.8 walrus operator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks StackOverflow!

return file_hash.hexdigest()


def compare_directories(dir_1, dir_2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick q here. is the intention to compare hashes between directories on the CAC and those on an external drive, or...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. We copy a couple TBs off the CAC, want to ensure it went smoothly

"""Computes the hash of the specified file to verify file integrity

Args:
filepath: full path to file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg type, str | PathLike here (and the other function).

dir_1_files = io_utils.list_files(dir_1)
dir_2_files = io_utils.list_files(dir_2)

misc_utils.verify_same_elements(directory_1=dir_1_files, directory_2=dir_2_files)
Copy link
Contributor

@alex-l-kong alex-l-kong Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, the error messages of verify_same_elements (and, by extension, verify_in_list) can be a bit cryptic and, at the very least, inflexible. Is it necessary to add some flexibility to the error message displayed or is the default one fine for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default error message is okay, but the wording right now is still a little bit confusing. You can open an issue about it, and we can address it in future PR

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one thing about verify_same_elements. Also, will we be adding test functions to these (same with the auto-sweep and slanted-TMA PRs)?

@ngreenwald
Copy link
Member Author

Yes, "we" will definitely need to add test functions before the PRs get merged in. I opened up a bunch this weekend, some of them to hotfix stuff I needed right away. If I get to it later this week/next I'll write them, otherwise I may have one of you guys do it later

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one comment about testing the == case too in test_get_hash.

creed/file_hash_test.py Show resolved Hide resolved
Copy link
Contributor

@ackagel ackagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. only one mega optional comment

creed/file_hash_test.py Outdated Show resolved Hide resolved
@ngreenwald ngreenwald merged commit b41624c into main Jan 26, 2022
@ngreenwald ngreenwald deleted the file_hash branch January 26, 2022 00:16
@camisowers camisowers added the enhancement New feature or request label Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MD5 check
4 participants