-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support to anyopen for taking filelike objects; remove broken .zip
support
#197
Conversation
Use in this way requires specifying the compression, if any, that the stream features in order to decompress it.
Codecov Report
@@ Coverage Diff @@
## master #197 +/- ##
==========================================
- Coverage 98.20% 98.15% -0.06%
==========================================
Files 24 24
Lines 1340 1352 +12
Branches 290 295 +5
==========================================
+ Hits 1316 1327 +11
Misses 5 5
- Partials 19 20 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It looks like this will largely work for OpenFE's needs. I've tested it out with our in-progress storage, and we'll just have to be sure we handle encodings on our end when that is needed.
A couple minor comments/suggestions from looking over the code.
src/alchemlyb/parsing/util.py
Outdated
'zip': zipfile.ZipFile} | ||
|
||
# if `datafile` is a stream | ||
if hasattr(datafile, 'read'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also explicitly consider the mode. (At least need to check if it has write
for writing modes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far anyopen
hasn't been used with writing in mind in alchemlyb
, but there's no reason it couldn't be I suppose. Added explicit checks for certain modes, and tests for roundtripping through a file and a stream.
src/alchemlyb/parsing/util.py
Outdated
compression : str | ||
Use to specify compression. | ||
Must be one of 'bz2', 'gz', 'zip'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, if pathlike, the filename's extension overrides the explicitly given compression here, which seems odd (although a conflict between compression algo and filename is a sign of something having gone seriously wrong).
Whether that stays as-is or not, the docstring here should clarify what happens if compression
is given with a pathlike as datafile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I've changed the behavior such that compression
, if specified, always forces the compression assumed by anyopen
.
src/alchemlyb/parsing/util.py
Outdated
# compression selections available | ||
compressions = {'bz2': bz2_open, | ||
'gz': gzip_open, | ||
'zip': zipfile.ZipFile} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we safely assume that the options here will always correspond to the extensions, just without the .
? If so, maybe it would be better to just do something like compression = "." + compression
and use the extensions
list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted compression dict to make things more explicit, in which we specify either bzip2
or gzip
for compression.
@@ -12,3 +14,42 @@ def test_gzip(): | |||
for filename in dataset['data'][leg]: | |||
with anyopen(filename, 'r') as f: | |||
assert type(f.readline()) is str | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to another comment, but maybe add a test for what happens when compression
is given with a pathlike, just to ensure no API breaks on whatever behavior is decided on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added explicit test forcing compression despite wrong extension.
.zip
support
We never had working zip archive support from the looks of it. This is because zipfiles are not just a compression scheme, they are an archive format that features compression.
Thank you @dwhswenson for the review! I've implemented changes in response to your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the upgrade. Small comments on docs. Coverage indicates that there's still quite a bit uncovered code, please increase testing. Please add an entry to CHANGES.
@dotsdl any chance you can come back to the PR and address remaining comments? Would be nice to get it in. |
@orbeckst yes, apologies, was on vacation through middle of last week and looping back to this shortly. Thank you for your review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for the case where the wrong compression
is set on a stream. Otherwise I am very happy with everything — thank you @dotsdl !
I'll get #200 fixed first and then re-run CI here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my remaining comment. All good! Thanks!
@orbeckst danke! 🙏 |
(The checks had passed before I added everything to CHANGES and I because I need to get some supper, I took a short cut...) |
Use in this way requires specifying the compression, if any, that the stream features in order to decompress it.