-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AuxReader for EDR Files #3749
AuxReader for EDR Files #3749
Conversation
Hello @BFedder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-09-12 10:12:49 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far. Just some small comments and nitpicks. Would be good to add type hints also when you have decided on final design. :)
Codecov ReportBase: 94.31% // Head: 94.33% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## develop #3749 +/- ##
===========================================
+ Coverage 94.31% 94.33% +0.02%
===========================================
Files 192 193 +1
Lines 24781 24975 +194
Branches 3341 3369 +28
===========================================
+ Hits 23371 23561 +190
- Misses 1362 1365 +3
- Partials 48 49 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@@ -76,6 +76,8 @@ inputs: | |||
default: 'pytest-cov<=2.10.1' | |||
pytest-xdist: | |||
default: 'pytest-xdist' | |||
pyedr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we making pyedr an optional dependency? I'm somewhat ok either way, if we could drop the 14MB test files in the next release it's small enough that it could be core if we wanted to. Thoughts @MDAnalysis/coredevs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, pyedr needs to go into the setup.py
either as an extra_requires (if not a core dependency), or just as an install_requires otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can decrease the size to <1 MB then it's just more convenient to have it as a core dependency — it's light weight and we maintain it.
@@ -76,6 +76,8 @@ inputs: | |||
default: 'pytest-cov<=2.10.1' | |||
pytest-xdist: | |||
default: 'pytest-xdist' | |||
pyedr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs adding to the azure pipelines file:
mdanalysis/azure-pipelines.yml
Lines 84 to 98 in 2c32ed1
python -m pip install --only-binary=h5py | |
cython | |
hypothesis | |
matplotlib | |
numpy | |
packaging | |
pytest | |
pytest-cov | |
pytest-xdist | |
scikit-learn | |
scipy | |
h5py>=2.10 | |
tqdm | |
threadpoolctl | |
fasteners |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not at this point making pyedr a core dependency - going by the fact that we aren't touching setup.py etc... (I would like to get more coredev oks on this anyways)
As a result, I would ask that this be moved to the pip optional dependencies section.
package/MDAnalysis/auxiliary/EDR.py
Outdated
:class:`~MDAnalysis.auxiliary.base.AuxStep` | ||
""" | ||
|
||
def __init__(self, time_selector="Time", data_selector=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should go ahead and add type hints here as you go along
I have now adapted the tests for the trajectory-independent functionality. Next step will be the association of energy data to trajectory frames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor drive-by comments
The remaining PEP8 complaints are URLs, a table, and the import done to match what's already present. |
@IAlibay you happy with moving forward with the PEP8 the way it is? |
I'll try to give it a read through tomorrow, but technically I'm on holiday so no promises (I'm on "pressing matters or stuff that I want to play with" only atm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First review set - base maintenance stuff.
Pyedr should remain an optional dependency for now.
@@ -76,6 +76,8 @@ inputs: | |||
default: 'pytest-cov<=2.10.1' | |||
pytest-xdist: | |||
default: 'pytest-xdist' | |||
pyedr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not at this point making pyedr a core dependency - going by the fact that we aren't touching setup.py etc... (I would like to get more coredev oks on this anyways)
As a result, I would ask that this be moved to the pip optional dependencies section.
- name: pip_install | ||
shell: bash -l {0} | ||
env: | ||
PIP_MIN_DEPS: | | ||
${{ inputs.coverage }} | ||
${{ inputs.pytest-cov }} | ||
${{ inputs.pytest-xdist }} | ||
${{ inputs.pyedr }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, optional dep please.
azure-pipelines.yml
Outdated
@@ -96,6 +96,7 @@ jobs: | |||
tqdm | |||
threadpoolctl | |||
fasteners | |||
pyedr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not common practice to have optional deps in azure pipeline runners, I'd remove it for now.
package/CHANGELOG
Outdated
@@ -35,7 +47,6 @@ Deprecations | |||
|
|||
08/29/22 IAlibay, PicoCentauri, orbeckst, hmacdope, rmeli, miss77jun, rzhao271, | |||
yuxuanzhuang, hsadia538, lilyminium | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't remove this.
I see that it is being added to setup.py... honestly I don't know, here @orbeckst @hmacdope can you please run this through the business channel? I've become increasingly reluctant to add new dependencies to core (indeed my plan is to remove half the ones we already have by 3.0). I'll agree to whatever the majority agreement is. edit: I should note, if we make it a core requirement, that means we need to ensure that we either ship it with very small test files, or we don't ship the test files at all. The ~ 20 MB tarball with the tests is not appropriate for the core library. That also needs to be done before 2.4.0 is released. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unfortunately a quick review, but there are a few things that need addressing.
with pytest.warns(UserWarning, match="AuxReader: memory usage " | ||
"warning! Auxiliary data takes up 3.328e-06 GB of " | ||
r"memory \(Warning limit: 1e-08 GB\)"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String match on a float seems like it might be a bit of a flaky test. I'd be happy with just matching through to warning!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use regular expressions for something vaguely float-like. Btw, the .
is already matched as "any character".
Maybe r"warning! Auxiliary data takes up 3[0-9.]*e-06 GB of "
will roughly capture the message and will fail if the size changes dramatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the regular expression as suggested, thanks! The reason I match through to the warning limit here is to also test if changing the warning limit actually works as expected.
package/MDAnalysis/auxiliary/EDR.py
Outdated
|
||
.. note:: | ||
The following will change with the release of MDAnalysis 3.0, which is | ||
scheduled for September 2023. From then on, the order of these two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add specific dates for releases in the docs. We never know what might happen and I prefer knowing exactly where our public facing schedule is (which right now is a work-in progress blog post).
package/MDAnalysis/auxiliary/EDR.py
Outdated
Note | ||
---- | ||
The file is assumed to be of a size such that reading and storing the full | ||
contents is practical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate based on the memory warning stuff
package/MDAnalysis/auxiliary/XVG.py
Outdated
raise ValueError('Step {0} has {1} columns instead of ' | ||
'{2}'.format(self.step, len(auxstep._data), | ||
'{2}'.format(self.step, | ||
len(auxstep._data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to change, use f-strings
package/MDAnalysis/auxiliary/base.py
Outdated
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One line between class methods?
package/setup.py
Outdated
@@ -603,6 +603,7 @@ def long_description(readme): | |||
'packaging', | |||
'fasteners', | |||
'gsd>=1.9.3', | |||
'pyedr', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go ahead with pyedr as a core dep, please pin to >=0.7.0 everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BFedder , sorry to come back with changes. After a quick discussion I became convinced that we should start out with pyedr being an optional dependency. We want to reduce the initial dependency footprint of MDAnalysis anyway and we especially cannot require people to install 20 MB of test files for pyedr. Until PR MDAnalysis/panedr#55 (or something similar that reduces the test set) is merged, we definitely want to keep pyedr optional.
Please change the PR accordingly. For some inspiration, see @hmacdope 's TNG PR #3765 to see how he set up the whole "optional reader" thing. For example, the dependency then goes under the "extra_formats" in setup.py and you typically guard your imports with a documented HAS_PYEDR module variable.
I'm not sure what to do about the Cirrus CI failing, the error message seems like it's not due to any change I made... Could I ask for advice, please? |
Ping @tylerjereddy? RE cirrus? I'm not so sure either. |
I've restarted the task, as Tyler previous detailed somewhere else, the issue seems to be purely down to Cirrus getting hammered pretty heavily by a bunch of folks that suddenly realised they had free M1 runner available. Hopefully it should complete now. |
Looks good now, feel free to deactivate Cirrus if it becomes annoying of course. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more things from a quick review.
with pytest.raises(ImportError, match="please install pyedr"): | ||
aux = mda.auxiliary.EDR.EDRReader(AUX_EDR) | ||
|
||
# The EDRReader behaves differently from the XVGReader, creating dummy test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this comment needs to be here, could it possibly be docstring instead?
return_time_diff=True) | ||
|
||
assert frame == idx | ||
np.testing.assert_allclose(time_diff, idx * 0.002) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why call the full signature here when you import assert_allclose
specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a bunch of comments, mostly in response to @IAlibay 's review. Please see inline.
**kwargs): | ||
# allow auxname to be optional for when using reader separate from | ||
# trajectory. | ||
self.auxname = auxname | ||
self.represent_ts_as = represent_ts_as | ||
self.cutoff = cutoff | ||
# no cutoff was previously passed as -1. This check is to maintain this | ||
# behaviour: negative number is appropriately turned to None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not quite the old behavior: we now use any negative number as None
. Previously only -1 would be seen as default. Arguably, that was a bug and we would have gotten weird behavior for -2.
Therefore, I support the change as is.
I think you can add a CHANGELOG entry for fixes that you removed undefined behavior for negative numbers <-1 for cutoff.
package/MDAnalysis/auxiliary/base.py
Outdated
if term not in value: | ||
value[term] = cutoff_data[dataset][term] | ||
else: | ||
value[term] += cutoff_data[dataset][term] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a defaultdict
avoids the if/else. Or initialize a dict with all the terms and zeros. But avoid branches insides loops if you can.
(It might not be a big performance penalty here but I think it's always good to strive for cleaner code.)
I have now used a defaultdict in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few final couple of things and I think we're good.
I'll approve now so that I'm not blocking once this get addressed.
package/CHANGELOG
Outdated
|
||
Enhancements | ||
* AuxReaders not have a memory_limit parameter to control when memory usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* AuxReaders not have a memory_limit parameter to control when memory usage | |
* AuxReaders now have a memory_limit parameter to control when memory usage |
|
||
* 2.4.0 | ||
|
||
Fixes | ||
* Fixes distances.between not always returning AtomGroup (Issue #3794) | ||
* Upgrade to chemfiles 0.10.3 (Issue #3798) | ||
* Auxiliary; determination of representative frames: Removed undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda lost in the comment sea, I'd have thought it'd be fine to only have one of these in "changes", but if someone requested it here too then that's fine. Although maybe newest entry first ;)
aux_spec = {term: term for term in self.terms} | ||
|
||
elif isinstance(aux_spec, str): | ||
# This is to keep XVGReader functioning as-is, working with strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not review thing - The fact that we have a special case for XVG kinda breaks the idea that this should be auxreader independent (I know we've discussed this before). We should keep an eye on this when we add new readers, if this ever expands we'll need to revisit it.
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
This latest batch of reviews is now also addressed, and the checks pass. I believe the EDRReader is now ready to be merged, then 🥳 Thanks everyone for your mentorship, reviews, and help over these past few months! I have learnt a lot and am looking forward to continue contributing to MDAnalysis. |
Congratulations @BFedder! |
Fixes #3629
With
panedrlite now closer topyedr now being a thing, and the actual code itself less likely to still change, I have now finally started work on the EDR auxiliary reader that I want to implement as part of my GSoC project.Currently, this is just a skeleton, but it already allows EDR data to be read and assigned to AuxStep instances.
To Do:
PR Checklist