Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support to anyopen for taking filelike objects; remove broken .zip support #197

Merged
merged 10 commits into from
Jun 30, 2022

Conversation

dotsdl
Copy link
Member

@dotsdl dotsdl commented Jun 8, 2022

Use in this way requires specifying the compression, if any, that the stream features in order to decompress it.

Use in this way requires specifying the compression, if any,
that the stream features in order to decompress it.
@codecov
Copy link

codecov bot commented Jun 8, 2022

Codecov Report

Merging #197 (a036582) into master (e2f8c39) will decrease coverage by 0.05%.
The diff coverage is 94.73%.

❗ Current head a036582 differs from pull request most recent head 8240e64. Consider uploading reports for the commit 8240e64 to get more accurate results

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   98.20%   98.15%   -0.06%     
==========================================
  Files          24       24              
  Lines        1340     1352      +12     
  Branches      290      295       +5     
==========================================
+ Hits         1316     1327      +11     
  Misses          5        5              
- Partials       19       20       +1     
Impacted Files Coverage Δ
src/alchemlyb/parsing/util.py 96.55% <94.73%> (-3.45%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2f8c39...8240e64. Read the comment docs.

Copy link

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It looks like this will largely work for OpenFE's needs. I've tested it out with our in-progress storage, and we'll just have to be sure we handle encodings on our end when that is needed.

A couple minor comments/suggestions from looking over the code.

'zip': zipfile.ZipFile}

# if `datafile` is a stream
if hasattr(datafile, 'read'):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also explicitly consider the mode. (At least need to check if it has write for writing modes.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far anyopen hasn't been used with writing in mind in alchemlyb, but there's no reason it couldn't be I suppose. Added explicit checks for certain modes, and tests for roundtripping through a file and a stream.

Comment on lines 36 to 38
compression : str
Use to specify compression.
Must be one of 'bz2', 'gz', 'zip'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if pathlike, the filename's extension overrides the explicitly given compression here, which seems odd (although a conflict between compression algo and filename is a sign of something having gone seriously wrong).

Whether that stays as-is or not, the docstring here should clarify what happens if compression is given with a pathlike as datafile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. I've changed the behavior such that compression, if specified, always forces the compression assumed by anyopen.

Comment on lines 51 to 54
# compression selections available
compressions = {'bz2': bz2_open,
'gz': gzip_open,
'zip': zipfile.ZipFile}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we safely assume that the options here will always correspond to the extensions, just without the .? If so, maybe it would be better to just do something like compression = "." + compression and use the extensions list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted compression dict to make things more explicit, in which we specify either bzip2 or gzip for compression.

@@ -12,3 +14,42 @@ def test_gzip():
for filename in dataset['data'][leg]:
with anyopen(filename, 'r') as f:
assert type(f.readline()) is str

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to another comment, but maybe add a test for what happens when compression is given with a pathlike, just to ensure no API breaks on whatever behavior is decided on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added explicit test forcing compression despite wrong extension.

@dotsdl dotsdl changed the title Added support to anyopen for taking filelike objects Added support to anyopen for taking filelike objects; remove broken .zip support Jun 10, 2022
dotsdl added 3 commits June 9, 2022 22:16
We never had working zip archive support from the looks of it. This is
because zipfiles are not just a compression scheme, they are an archive
format that features compression.
@dotsdl
Copy link
Member Author

dotsdl commented Jun 10, 2022

Thank you @dwhswenson for the review! I've implemented changes in response to your feedback.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the upgrade. Small comments on docs. Coverage indicates that there's still quite a bit uncovered code, please increase testing. Please add an entry to CHANGES.

src/alchemlyb/parsing/util.py Show resolved Hide resolved
src/alchemlyb/parsing/util.py Show resolved Hide resolved
src/alchemlyb/parsing/util.py Show resolved Hide resolved
src/alchemlyb/parsing/util.py Show resolved Hide resolved
@orbeckst
Copy link
Member

@dotsdl any chance you can come back to the PR and address remaining comments? Would be nice to get it in.

@dotsdl
Copy link
Member Author

dotsdl commented Jun 28, 2022

@orbeckst yes, apologies, was on vacation through middle of last week and looping back to this shortly. Thank you for your review!

@dotsdl dotsdl added this to the 0.7.0 milestone Jun 29, 2022
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for the case where the wrong compression is set on a stream. Otherwise I am very happy with everything — thank you @dotsdl !

src/alchemlyb/parsing/util.py Show resolved Hide resolved
@orbeckst
Copy link
Member

I'll get #200 fixed first and then re-run CI here.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my remaining comment. All good! Thanks!

@dotsdl
Copy link
Member Author

dotsdl commented Jun 30, 2022

@orbeckst danke! 🙏

@orbeckst orbeckst merged commit 72622c9 into master Jun 30, 2022
@orbeckst orbeckst deleted the anyopen-stream branch June 30, 2022 02:05
@orbeckst
Copy link
Member

(The checks had passed before I added everything to CHANGES and I because I need to get some supper, I took a short cut...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants