Add pV and U to reduced potential when they are given #62

harlor · 2018-10-21T13:22:40Z

As explained in #59, the current parser assumes that U and pV are given in the second and the last column, respectively, which is not very robust.

With these changes, the parser adds them to the potential differences only when they are present.

I am also interested in whether these changes pass the tests.
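
A minimal sketch of the idea described above: the optional terms are looked up by column label rather than by position and are added only when present. The function name, the labels, and the toy data are assumptions for illustration and are not the code in src/alchemlyb/parsing/gmx.py.

```python
import pandas as pd

# Hypothetical labels; real Gromacs legends may differ.
U_LABEL = "Total Energy"
PV_LABEL = "pV"


def reduced_potential(df, dh_cols, beta):
    """Return beta * (dH [+ U] [+ pV]) for every foreign-state column."""
    u_k = df[dh_cols].copy()
    # Find the optional terms by label rather than by column position.
    for label in (U_LABEL, PV_LABEL):
        matches = [c for c in df.columns if label in c]
        if matches:
            # Add the per-sample term to every dH column (row-wise alignment).
            u_k = u_k.add(df[matches[0]], axis=0)
    return beta * u_k


dh_cols = ["dH to 0.0", "dH to 1.0"]
frame = pd.DataFrame({
    "Total Energy (kJ/mol)": [10.0, 20.0],
    "dH to 0.0": [0.0, 0.0],
    "dH to 1.0": [1.0, 2.0],
    "pV (kJ/mol)": [0.5, 0.5],
})
with_terms = reduced_potential(frame, dh_cols, beta=1.0)
without_terms = reduced_potential(frame[dh_cols], dh_cols, beta=1.0)
```

With the toy data, `with_terms` includes the energy and pV contributions, while `without_terms` contains only the scaled potential differences.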

codecov-io · 2018-10-21T14:06:51Z

Codecov Report

Merging #62 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
+ Coverage   98.12%   98.17%   +0.04%     
==========================================
  Files           9        9              
  Lines         481      493      +12     
  Branches       94      100       +6     
==========================================
+ Hits          472      484      +12     
  Misses          4        4              
  Partials        5        5

Impacted Files	Coverage Δ
src/alchemlyb/parsing/gmx.py	`98.49% <100%> (+0.14%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eba1f9a...bfbe0b0.

orbeckst

This looks sensible. However, there should also be a test case without either/or pV and U columns. You might be able to edit the benzene files or add new datasets. Either way, open a PR in https://github.com/alchemistry/alchemtest/ to add the data and then add tests here, please.

src/alchemlyb/tests/test_preprocessing.py

harlor · 2018-10-26T12:11:38Z

This looks sensible. However, there should also be a test case without either/or pV and U columns.
You might be able to edit the benzene files or add new datasets. Either way, open a PR in https://github.com/alchemistry/alchemtest/ to add the data and then add tests here, please.

Agree - when I find time to work my way into the testing system I will do that!

orbeckst · 2018-10-26T17:05:21Z

I think you are right that our tests were wrong with regard to parsing (or lack thereof) of the correct columns. However, why does only the statistical inefficiency change and not the actual free energies? (Sorry, I'm typing from mobile and it is difficult to look at code to answer my question myself).

@dldotson @ianmkenney can you also have a quick look and confirm?

orbeckst · 2018-10-26T17:08:02Z

Er, Ping @dotsdl --- sorry, mistyped your handle in previous comment.

dotsdl

See my comment on the possible appearance of "Potential Energy". We'll need to at least address this before proceeding. Thanks for this!

dotsdl · 2018-11-01T05:10:30Z

src/alchemlyb/parsing/gmx.py

-    col_match = r"\xD\f{}H \xl\f{}"
+    h_col_match = r"\xD\f{}H \xl\f{}"
+    pv_col_match = 'pV'
+    u_col_match = 'Total Energy'


Checking the files you provided @harlor (https://userpage.fu-berlin.de/dominikwille/nvt300K.tgz), I see "Potential Energy (kJ/mol)" in the ti_0.xvg file. Should we also try and match that?

@mrshirts explained that in #59 (comment). So I think we need the total energy.

I uploaded a dataset with total energy here: http://dominikwille.userpage.fu-berlin.de/nvt300K_u.tgz

Okay, but I think depending on the MDP options given to Gromacs it's possible to have a dataset without "Total Energy" but with "Potential Energy". I was thinking that we could handle this in another PR with kwargs, but I don't believe this should be neglected.

Could you add a check for the existence of a "Potential Energy" column as you did for the "Total Energy" column, and if present, use this instead of the "Total Energy" column? That would satisfy me for the purposes of this PR.
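
The requested precedence could be sketched as follows. The helper name `pick_energy_column` and the substring matching are assumptions for illustration; only the two column labels come from the discussion above.

```python
def pick_energy_column(columns):
    """Return the label of the energy column to use, or None if absent."""
    # Preference order requested in the review: potential energy first.
    for label in ("Potential Energy", "Total Energy"):
        for name in columns:
            if label in name:
                return name
    return None


both = ["Time (ps)", "Potential Energy (kJ/mol)", "Total Energy (kJ/mol)"]
only_total = ["Time (ps)", "Total Energy (kJ/mol)"]
neither = ["Time (ps)", "pV (kJ/mol)"]

choice_both = pick_energy_column(both)
choice_total = pick_energy_column(only_total)
choice_none = pick_energy_column(neither)
```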

src/alchemlyb/tests/test_preprocessing.py

dotsdl · 2018-11-01T05:29:25Z

@harlor, thanks for putting this together. I see the issue as well. The Gromacs benzene dataset is indeed missing the potential energy (U_i) assumed by the u_nk parser, so currently it's giving really wrong results as in this test.

I am very much in favor of these changes. I'm trying to think of how we should alert users to cases where the parser can still give meaningful results but the results may not be in absolute terms (such as when the potential energy is missing).

Perhaps the best approach is to throw an exception when either the pV or Potential Energy column is missing? We can then have keyword arguments such as use_pV=False (default True) and use_U=False (default True) to make the parser proceed with ignoring these values, giving "reduced potentials" without them. We can do the same with Total Energy with a use_H (default False).

Exceptions will be raised if one of these keyword args is given but can't be honored with the given dataset, or if two are mutually exclusive (such as use_U=True and use_H=True).
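
A rough sketch of how such keyword arguments might behave, assuming a hypothetical helper `select_terms` that only decides which columns to add. The keyword names follow the proposal above; everything else, including the use of `ValueError`, is an assumption and not part of alchemlyb.

```python
def select_terms(columns, use_pV=True, use_U=True, use_H=False):
    """Return the labels of the energy terms to add, or raise ValueError."""
    if use_U and use_H:
        raise ValueError("use_U and use_H are mutually exclusive")
    requested = {"pV": use_pV, "Potential Energy": use_U, "Total Energy": use_H}
    selected = []
    for label, wanted in requested.items():
        if not wanted:
            continue
        matches = [c for c in columns if label in c]
        if not matches:
            # The request cannot be honored with this dataset.
            raise ValueError(f"column {label!r} requested but not found")
        selected.append(matches[0])
    return selected


full = ["pV (kJ/mol)", "Potential Energy (kJ/mol)"]
selected_full = select_terms(full)
selected_none = select_terms([], use_pV=False, use_U=False)

try:
    select_terms([])  # defaults ask for pV and U, which are missing
    missing_raises = False
except ValueError:
    missing_raises = True
```

Raising by default and requiring an explicit opt-out makes a missing column visible to the user instead of silently producing shifted reduced potentials.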

Thoughts? This doesn't have to be in the scope of this PR, but I think it's time we add some knobs to these parsers.

harlor · 2018-11-01T14:26:24Z

@dotsdl I like this idea! And I think in principle we also know what we need when we parse Gromacs datasets.

U is needed when the temperature is not the same for all states (the temperature is given in the header).
pV is needed when the volume is not constant (and as far as I know, Gromacs always writes out pV in this case).

This doesn't have to be in the scope of this PR

Strongly agree! This PR is a bugfix. Anything that goes beyond that I suggest doing in a new PR.
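
For reference, the reduced potential discussed in this thread is commonly written as below. The notation (β_j, U_j, p_j, V, ΔH_ij, with i the simulated state and j the evaluated state) is an assumption for illustration and is not taken from the alchemlyb source.

```latex
u_j(x) = \beta_j \left[ U_j(x) + p_j V(x) \right]
       = \beta_j \left[ \Delta H_{ij}(x) + U_i(x) + p_j V(x) \right],
\qquad
\Delta H_{ij}(x) = U_j(x) - U_i(x)
```

If β_j = β for all states, the term β U_i(x) is the same in every column j for a given sample x, so it drops out of any estimator that depends only on differences u_j(x) - u_k(x). It still changes the time series of each individual column, which would be consistent with the observation above that only the statistical inefficiency test changed.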

dotsdl · 2018-11-03T19:44:36Z

I think if we address the presence of "Potential Energy", we'll be good to go on a merge for this.

@orbeckst, is there anything else you see? The benzene dataset doesn't have either "Total Energy" or "Potential Energy", so I think it satisfies your earlier comment already.

orbeckst · 2018-11-04T19:21:18Z

I agree with the above – do this one as a bare fix. However, I'd like to have tests with pV and U as part of this PR. I don't like adding code that is untested and deferring tests "to later".

Could you please raise an issue/PR in https://github.com/alchemistry/alchemtest/issues, reference this issue, and then we can try to get the test data quickly into MDAnalysisTests. I know, this is all a bit of extra work, but one of the basic ideas of alchemlyb was to make heavy use of best software engineering practices to keep the library well-tested at all times. Issues still slip through, as you found, but at least by adding tests for the bug right away we will make sure that this bug does not show up again.

dotsdl · 2018-11-04T21:22:30Z

Agreed with @orbeckst. I look forward to accepting this contribution, so we'll need to make sure we test against examples that exhibit the problem it is addressing. @harlor, can you trim down the dataset you've linked to here and add to alchemtest as a PR? This will then allow us to build tests against it here in alchemlyb. That dataset will then be part of the infrastructure stabilizing alchemlyb going forward.

Check out the structure of some of the existing Gromacs datasets for examples.

harlor · 2018-11-15T18:12:49Z

I opened a PR in alchemtest with datasets that make it possible to write a test for this PR.

@orbeckst Do you consider it sufficient to check a certain value of the reduced potential?

orbeckst · 2018-11-15T19:44:36Z

I reviewed alchemistry/alchemtest#30.

For tests I would generally use aggregate calculations such as a mean or a sum, in addition to specific values.

harlor · 2018-11-16T20:02:26Z

Ok I added tests for specific elements and for the sum of elements on the diagonal.

Btw: Is it possible/a good idea to use hash values of dataframes to write tests?

orbeckst · 2018-11-16T20:38:58Z

Btw: Is it possible/a good idea to use hash values of dataframes to write tests?

Not a good idea if it contains floating point numbers.

For those you can never test for equality. Instead you do

from numpy.testing import assert_almost_equal

assert_almost_equal(calculated, reference)
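
A small, self-contained illustration of the point above, using toy values rather than alchemlyb data: two arrays that differ only by floating point rounding fail both an exact comparison and a hash comparison, while a tolerance-based comparison passes.

```python
import hashlib

import numpy as np
from numpy.testing import assert_almost_equal

a = np.array([0.1 + 0.2])
b = np.array([0.3])

# The values differ only in the last bits, so exact checks fail.
exact_equal = bool((a == b).all())
hash_a = hashlib.sha256(a.tobytes()).hexdigest()
hash_b = hashlib.sha256(b.tobytes()).hexdigest()
same_hash = hash_a == hash_b

# A tolerance-based comparison accepts the rounding difference.
assert_almost_equal(a, b)
```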

orbeckst

be careful when checking floats!!

orbeckst · 2018-11-16T20:40:03Z

src/alchemlyb/tests/parsing/test_gmx.py

+    assert _diag_sum(dataset) == 16674041445589.646
+
+    # Check one specific value in the dataframe
+    assert float(extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0]) == -15659.655560881085


needs to use assert_almost_equal or similar. You cannot compare floats for equality because this depends on hardware etc.

orbeckst · 2018-11-16T20:40:16Z

src/alchemlyb/tests/parsing/test_gmx.py

+    assert _diag_sum(dataset) == 20572988148877.555
+
+    # Check one specific value in the dataframe
+    assert float(extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0]) == 18.134225023007403


float comparison!

orbeckst · 2018-11-16T20:40:32Z

src/alchemlyb/tests/parsing/test_gmx.py

+    dataset = load_water_particle_without_energy()
+
+    # Check if the sum of values on the diagonal has the correct value
+    assert _diag_sum(dataset) == 20572988148877.555


float comparison!

orbeckst · 2018-11-16T20:40:39Z

src/alchemlyb/tests/parsing/test_gmx.py

+    dataset = load_water_particle_with_potential_energy()
+
+    # Check if the sum of values on the diagonal has the correct value
+    assert _diag_sum(dataset) == 16674041445589.646


float comparison!

harlor · 2018-11-16T20:56:20Z

As you suggested I use assert_almost_equal now.

Btw: Is it possible/a good idea to use hash values of dataframes to write tests?

Not a good idea if it contains floating point numbers.

Good to know!

orbeckst · 2018-11-17T01:15:51Z

The tests fail because the calculated values do not agree with the references to the default precision. Add

assert_almost_equal(..., decimal=6)

to loosen the required precision.

When you work with continuous integration, check the output of the tests and, if a run failed (red X), fix it or ask for advice on how to fix it.

You should also be able to run the tests locally with

pytest

to catch anything obvious so that you don't have to wait for Travis.
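
As a hedged aside on the tolerance: the keyword in `numpy.testing.assert_almost_equal` is `decimal`, and it sets an absolute tolerance of 1.5 * 10**(-decimal). For values as large as the diagonal sums in these tests, a relative tolerance via `numpy.testing.assert_allclose` may be easier to reason about. The perturbation of 0.01 below is an invented stand-in for platform-dependent rounding.

```python
from numpy.testing import assert_allclose, assert_almost_equal

reference = 16674041445589.646
calculated = reference + 0.01  # stand-in for platform-dependent rounding

# decimal=6 demands |calculated - reference| < 1.5e-6, which fails here.
try:
    assert_almost_equal(calculated, reference, decimal=6)
    strict_passed = True
except AssertionError:
    strict_passed = False

# A relative tolerance scales with the size of the reference value.
assert_allclose(calculated, reference, rtol=1e-12)
```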

orbeckst · 2018-11-17T01:16:12Z

P.S.: I think it was decimal, could also be decimals.

orbeckst · 2018-11-17T01:16:33Z

P.P.S.: Also merge master into your branch.

harlor · 2018-11-17T11:57:33Z

I will adopt local testing into my development workflow!

It turned out that I get the datasets from alchemtest in a different order on my machine - which causes not only these tests but also TestStatisticalInefficiency to give different results.

Is using something like sorted(glob(...)) instead of glob(...) in alchemtest, which sorts the datasets by filename, a good idea?

See: https://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered
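
A minimal sketch of the suggestion, using temporary files with invented names rather than the alchemtest datasets:

```python
import tempfile
from glob import glob
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    for name in ("dhdl_10.xvg", "dhdl_02.xvg", "dhdl_01.xvg"):
        Path(tmp, name).touch()
    # glob() returns matches in an arbitrary, filesystem-dependent order;
    # sorted() makes the order deterministic (lexicographic by path).
    files = sorted(glob(str(Path(tmp, "*.xvg"))))
    names = [Path(f).name for f in files]
```

Note that the sort is lexicographic, so numeric suffixes need zero padding (or a custom sort key) to come out in numeric order.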

orbeckst · 2018-11-19T17:53:29Z

yes!

(I just merged your PR alchemistry/alchemtest#31)

harlor · 2018-11-19T18:31:33Z

Thanks - is this PR also ready then?

orbeckst

Looks great.

Minor comments – please address as necessary and ping me or @dotsdl when you're done.

orbeckst · 2018-11-19T18:33:17Z

src/alchemlyb/tests/parsing/test_gmx.py

+
+    # Check one specific value in the dataframe
+    assert_almost_equal(
+        float(extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0]),


Why is float needed?

Should probably work without.
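
A short sketch of why the conversion is not needed, using a toy frame in place of the real `extract_u_nk` output: positional indexing on a DataFrame returns a NumPy scalar that `assert_almost_equal` accepts directly. The single-step form `.iloc[0, 0]` is used here instead of the chained `.iloc[0][0]`.

```python
import pandas as pd
from numpy.testing import assert_almost_equal

# Toy stand-in for a u_nk frame; the column labels are lambda values.
u_nk = pd.DataFrame({0.0: [-15659.655560881085, 1.0], 1.0: [2.0, 3.0]})

# .iloc[0, 0] selects row 0, column 0 by position in a single step.
value = u_nk.iloc[0, 0]

# The NumPy scalar can be passed directly; no float() conversion is needed.
assert_almost_equal(value, -15659.655561, decimal=6)
```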

orbeckst · 2018-11-19T18:33:35Z

src/alchemlyb/tests/parsing/test_gmx.py

+
+    # Check one specific value in the dataframe
+    assert_almost_equal(
+        float(extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0]),


Why is float needed?

orbeckst · 2018-11-19T18:34:11Z

src/alchemlyb/tests/parsing/test_gmx.py

+
+    # Check one specific value in the dataframe
+    assert_almost_equal(
+        float(extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0]),


Why is float needed?

orbeckst · 2018-11-19T18:34:29Z

src/alchemlyb/tests/parsing/test_gmx.py

+            for i in range(len(dataset['data'][leg])):
+                ds += u_nk.iloc[i][i]
+
+    return ds


add newline

orbeckst · 2018-11-19T18:36:34Z

@dotsdl you will also need to check and approve. Apparently, we have strict merge rules here... not enough to only satisfy one reviewer ;-).

orbeckst · 2018-11-21T23:01:14Z

@dotsdl do you want to review or should I go ahead and merge? If I don't hear anything by 6pm AZ I'll merge using admin powers.

dotsdl · 2018-11-22T21:10:03Z

@orbeckst I should be able to review this during the weekend. If I don't manage to by Monday 11/26, please proceed with the merge.

harlor · 2018-11-23T10:47:59Z

I don't know why the build of the tests failed - it worked after a restart.

dotsdl · 2018-11-23T16:03:00Z

It appeared to be a problem with building scikit-learn wheels, so something upstream was broken. Looks resolved now. I'll be reviewing this later today. Thanks @harlor!

dotsdl · 2018-11-26T05:47:36Z

Looks good to me @harlor! Thanks for all the work you put into this. And thanks @orbeckst for moving this along! Merging!

harlor added 2 commits October 21, 2018 15:13

Add pV and U to reduced potential when they are given

dcad2b8

Update TestStatisticalInefficiency after u_nk parser update

037bee4

orbeckst requested changes Oct 25, 2018

src/alchemlyb/tests/test_preprocessing.py

dotsdl requested changes Nov 1, 2018

If Potential energy is given add it to reduced potential

65159f7

harlor mentioned this pull request Nov 15, 2018

Add water particle free energy datasets alchemistry/alchemtest#30

Merged

Add tests for the calculation of the reduced potential

27cb260

orbeckst requested changes Nov 16, 2018

Use assert_almost_equal instead of assert

0af3093

harlor added 4 commits November 17, 2018 09:55

Merge branch 'master' into reduced_potential_pr

837c266

Use decimal precision 6 for float comparison in tests

fe014b3

Use reasonable decimal precision for float comparison in tests

886b0ef

Compare with ordered datasets

f8ad375

harlor mentioned this pull request Nov 17, 2018

Avoid UNIX paths in accessor functions and sort datasets alchemistry/alchemtest#31

Merged

orbeckst approved these changes Nov 19, 2018

Remove unnecessary type conversion to float

74333c5

harlor mentioned this pull request Nov 22, 2018

Switching Travis CI to Xenial (Ubuntu 16.04) gives different statistical inefficiency test results #66

Closed

Merge branch 'master' into reduced_potential_pr

bfbe0b0

dotsdl approved these changes Nov 26, 2018

dotsdl merged commit 82ce951 into alchemistry:master Nov 26, 2018

orbeckst mentioned this pull request May 3, 2019

GOMC parser #77

Closed
