
BAR Estimator #60

Merged · 13 commits · Jan 11, 2019
Conversation

@harlor (Contributor) commented Oct 17, 2018

This version seems to reproduce the results I get using alchemical-analysis.

@codecov-io commented Oct 17, 2018

Codecov Report

Merging #60 into master will increase coverage by 0.14%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
+ Coverage   98.18%   98.33%   +0.14%     
==========================================
  Files           9       10       +1     
  Lines         497      539      +42     
  Branches      100      105       +5     
==========================================
+ Hits          488      530      +42     
  Misses          4        4              
  Partials        5        5
| Impacted Files | Coverage Δ |
| --- | --- |
| src/alchemlyb/estimators/__init__.py | 100% <100%> (ø) ⬆️ |
| src/alchemlyb/estimators/bar_.py | 100% <100%> (ø) |

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd70f41...d572e9e.

@mrshirts commented:

Can you explain a little more what is different? Also, be specific about what tests you did to compare alchemical-analysis and alchemlyb.

Note that the error estimates generated by BAR ARE correlated, so it's not a great idea to compute the overall uncertainty by adding them in quadrature. Correlations are very hard to compute between neighboring BAR calculations. Bootstrapping the entire data set is almost certainly the best way to do this.

@harlor (Contributor, author) commented Oct 17, 2018

I used different input files to test this (as explained in #61 (comment)).

Indeed, unfortunately I am currently just calculating the sum of the squares. I guess bootstrapping just requires having a sufficient number of uncorrelated samples in u_kln? As suggested in choderalab/pymbar#304 (comment), do you think 10 samples are sufficient to do an error estimation? Of course this is something that should be adjustable, but it would still be nice to have a default value for this.
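
For concreteness, here is a minimal sketch (not the PR's actual code) of the naive sum-of-squares propagation described above; the per-interval values are made-up numbers in units of kT:

```python
import numpy as np

# Hypothetical per-interval BAR results between adjacent lambda states
# (illustrative numbers only, in kT).
delta_f = np.array([1.2, 0.8, 0.5])       # free energy difference per interval
d_delta_f = np.array([0.05, 0.04, 0.06])  # BAR error estimate per interval

# Naive propagation: treat the interval errors as independent and add them
# in quadrature. As noted above, neighboring BAR estimates share samples and
# are correlated, so this tends to misestimate the overall uncertainty.
total_delta_f = delta_f.sum()
total_error = np.sqrt((d_delta_f ** 2).sum())
print(total_delta_f, total_error)
```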

@mrshirts commented:

> I used different input files to test this (as explained in #61 (comment)).

Again, document which input files were used so that it's reproducible.

> Indeed, unfortunately I am currently just calculating the sum of the squares.

There's no other way to do it with BAR currently that I know of.

> As suggested in choderalab/pymbar#304 (comment), do you think 10 samples are sufficient to do an error estimation? Of course this is something that should be adjustable, but it would still be nice to have a default value for this.

It probably makes sense to do bootstrapping as a general utility (perhaps inside alchemlyb, as a wrapper for all methods?), since it's useful for ALL calculations, not just BAR.

For the bootstrap, 200 samples is much better than 10; at least 50 are needed.

Note that bootstrapping is NOT running 10 independent runs. See http://www.alchemistry.org/wiki/Analyzing_Simulation_Results#Bootstrap_Sampling

It's less accurate than running 200 independent samples, and includes more bias, but is statistically well behaved.
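
To illustrate the kind of bootstrap procedure meant here, a minimal sketch (not part of this PR); `estimate_free_energy` is a hypothetical callable standing in for BAR/MBAR/TI, and the input is assumed to be uncorrelated samples:

```python
import numpy as np

def bootstrap_error(samples, estimate_free_energy, n_bootstrap=200, seed=0):
    """Bootstrap the standard error of a free energy estimate.

    samples: array of uncorrelated samples (first axis = sample index);
    real alchemical data would typically be resampled per lambda state,
    which is omitted here for brevity.
    estimate_free_energy: hypothetical callable returning a scalar estimate
    for a given set of samples.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    replicates = []
    for _ in range(n_bootstrap):
        # Resample with replacement and re-run the full estimator.
        idx = rng.integers(0, n, size=n)
        replicates.append(estimate_free_energy(samples[idx]))
    # The spread over the bootstrap replicates approximates the standard
    # error of the estimate obtained from the original data.
    return np.std(replicates, ddof=1)
```

The default of 200 replicates here follows the rule of thumb given above.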

@dotsdl (Member) commented Oct 19, 2018

@harlor, thanks for this contribution! We've been wanting a BAR implementation for some time (#28), so this will be great to finally land. I'll have time to review sometime this weekend.

@harlor (Contributor, author) commented Oct 19, 2018

Thank you @dotsdl!

@mrshirts wrote:

> It probably makes sense to do bootstrapping as a general utility (perhaps inside alchemlyb, as a wrapper for all methods?), since it's useful for ALL calculations, not just BAR.

OK, maybe we can then just return NaN values for unknown uncertainties in the BAR estimator here? Returning uncertainties that are known to be underestimated is not a good idea, I guess.

@dotsdl (Member) commented Nov 3, 2018

@harlor to move this one forward, we'll need some tests added to test_fep_estimators.py. You can create a class TestBAR similar to TestMBAR that tests the results from this estimator, which at the moment is an almost one-liner. Feel free to add additional tests as needed to help define the correct functionality of the BAR estimator.
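
For orientation, a rough sketch of what such a TestBAR class could look like, using the alchemtest GROMACS benzene dataset; the fixture layout and the EXPECTED reference value are placeholders, not the test that was actually added:

```python
import pandas as pd
import pytest

from alchemtest.gmx import load_benzene
from alchemlyb.parsing.gmx import extract_u_nk
from alchemlyb.estimators import BAR


class TestBAR:
    """Sketch of a result test for the BAR estimator."""

    def test_benzene_coulomb(self):
        # Build the u_nk DataFrame for the Coulomb leg of the benzene dataset.
        dataset = load_benzene()
        u_nk = pd.concat([extract_u_nk(xvg, T=300)
                          for xvg in dataset.data['Coulomb']])

        bar = BAR().fit(u_nk)

        # Total free energy difference between the end states, in kT.
        delta_f = bar.delta_f_.iloc[0, -1]

        # EXPECTED is a placeholder; the real test compares against the
        # value obtained from alchemical-analysis.
        EXPECTED = 3.04  # hypothetical reference value
        assert delta_f == pytest.approx(EXPECTED, abs=1e-2)
```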

@harlor (Contributor, author) commented Nov 26, 2018

@dotsdl I added tests for the BAR estimator's value and its uncertainties. I also added tests for the uncertainty of the MBAR estimator and compared the results with alchemical-analysis, where I find exactly the same values using:
alchemical_analysis.py -g -p lambda_ -t 300 -i 0 -u kBT -r 6

@mrshirts Instead of calculating underestimated uncertainties, the estimator now just gives NaN for unknown uncertainties.
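
A hedged usage sketch of such a comparison on the alchemlyb side (not harlor's actual script): the attribute names follow the existing MBAR estimator's API, the dataset choice is illustrative, and T=300 mirrors the alchemical-analysis command above:

```python
import numpy as np
import pandas as pd

from alchemtest.gmx import load_benzene
from alchemlyb.parsing.gmx import extract_u_nk
from alchemlyb.estimators import BAR, MBAR

# Reduced potentials for the benzene Coulomb leg at T=300 K.
u_nk = pd.concat([extract_u_nk(xvg, T=300)
                  for xvg in load_benzene().data['Coulomb']])

bar = BAR().fit(u_nk)
mbar = MBAR().fit(u_nk)

# End-to-end free energy differences (in kT) from both estimators.
print(bar.delta_f_.iloc[0, -1], mbar.delta_f_.iloc[0, -1])

# Uncertainties that the BAR estimator does not compute are reported as NaN
# rather than as underestimated sums of per-interval errors.
print(bar.d_delta_f_)
print(np.isnan(bar.d_delta_f_.values).any())
```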

@orbeckst (Member) commented:

The predict() method only contains pass and is not tested. Can't we just drop it? Or does it have to be there, @dotsdl?

@orbeckst (Member) left a review:

I am covering the code/interface aspects.

I only have a minor issue related to predict() and coverage.

@mrshirts do you sign off on the science?


return self

def predict(self, u_ln):
Member:

Can we drop it?

(It's neither documented nor tested.)

Member:

If you keep it, then there should be a test that shows that it does nothing.

Contributor (author):

This comes from the implementation of MBAR, which I used as a starting point. I can't find any usages of this method in the project, so let's drop it...

Member:

If it's in the MBAR class then we should also remove it from there (in a different PR).

dout.append(d_deltas[i:i + j + 1].sum())
# Other uncertainties are unknown at this point
else:
dout.append(float('NaN'))
Member:

Or use np.NaN directly, given that we already use numpy. Not sure which is more Pythonic, although float("nan") seems to be the way to go without numpy.

Bogus performance "argument" in favor of numpy:

In [17]: %time float("NaN")
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 9.06 µs
Out[17]: nan

In [18]: %time np.NaN
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 7.87 µs
Out[18]: nan

Contributor (author):

OK, I changed it to the numpy version, mainly because it avoids the explicit data type conversion.

orbeckst requested a review from mrshirts on December 14, 2018.
@orbeckst (Member) commented:

Looks good to me from the pure code/api side of things.

Quick approval from @mrshirts (and/or @davidlmobley) regarding the veracity of the implementation would be highly appreciated! Essentially, are you satisfied with the evidence that @harlor has shown to demonstrate that this implementation produces correct answers (i.e., the same answers as alchemical-analysis)?

@orbeckst (Member) commented Jan 7, 2019

Ping @mrshirts @davidlmobley .

@orbeckst (Member) commented:

I haven't heard anything against merging this PR, so I am now merging it. If anything else comes up later, please open a new issue.

Thanks @harlor !

orbeckst merged commit 4f1509d into alchemistry:master on Jan 11, 2019.
@davidlmobley commented:

Yes, I think this is fine. Sorry for the delay; huge series of deadlines.

@hbaumann, do you have a set of data from equilibrium free energy calculations that you can run through this to double-check that the results seem correct?
