-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BAR Estimator #60
BAR Estimator #60
Conversation
Codecov Report
@@ Coverage Diff @@
## master #60 +/- ##
==========================================
+ Coverage 98.18% 98.33% +0.14%
==========================================
Files 9 10 +1
Lines 497 539 +42
Branches 100 105 +5
==========================================
+ Hits 488 530 +42
Misses 4 4
Partials 5 5
Continue to review full report at Codecov.
|
Can you explain a little more what is different? Also, be specific as to what tests you did to compare the alchemical-analysis and alchemlyb Note that the error estimate generated by BAR ARE correlated and so it's not a great idea to compute the overall uncertainty by adding them in the square. Correlations are very hard to compute between neighboring BAR calculations. Bootstrapping the entire data set is almost certainly the best way to do this. |
I used different input files to test this (As explained in #61 (comment)) Indeed - unfortunately I am currently just calculating the sum of the squares. I guess bootstrapping just requires to have a sufficient amount of uncorrelated samples in u_kln? As suggested in choderalab/pymbar#304 (comment) do you think 10 samples are sufficient to do an error estimation? Of course this is something that should be adjustable but to have a default value for this would still be nice. |
Again, document which input files used so that it's reproducible.
There's no other way to do it with BAR currently that I know if.
It probably makes sense to do bootstrapping as a general utility (perhaps inside alchemlyb, as perhaps a wrapper for all methods?), since it's useful for ALL calculations, not just BAR. 200 is much better for bootstrap than 10. At least 50 is needed. Note that bootstrapping is NOT running 10 independent runs. See http://www.alchemistry.org/wiki/Analyzing_Simulation_Results#Bootstrap_Sampling It's less accurate than running 200 independent samples, and includes more bias, but is statistically well behaved. |
Thank you @dotsdl!
ok, maybe we can then just return nan-values for unknown uncertainties in the bar esimator here? Returning uncertainties that are known to be underestimated is not a good idea I guess. |
@harlor to move this one forward, we'll need some tests added to test_fep_estimators.py. You can create a class |
@dotsdl I added tests for the BAR estimator's value and its uncertainties - I also added tests for the uncertainty of the MBAR estimator and compared the results with alchemical-analysis where I find exact same values by using: @mrshirts Instead of calculating underestimated uncertainties, the estimator now just gives NaN for unknown uncertainties. |
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am covering the code/interface aspects.
I only have a minor issue related to predict()
and coverage.
@mrshirts do you sign off on the science?
src/alchemlyb/estimators/bar_.py
Outdated
|
||
return self | ||
|
||
def predict(self, u_ln): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop it?
(It's neither documented nor tested.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you keep then there should be a test that shows that it does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is comes from the implementation of MBAR which I used as a starting point. I don't find any usages of this method in the project so let's drop it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's in the MBAR class then we should also remove it from there (in a different PR).
src/alchemlyb/estimators/bar_.py
Outdated
dout.append(d_deltas[i:i + j + 1].sum()) | ||
# Other uncertainties are unknown at this point | ||
else: | ||
dout.append(float('NaN')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use np.NaN
directly, given that we already use numpy. Not sure what's more Pythonic although float("nan") seems to be the way to go without numpy.
Bogus performance "argument" in favor of numpy:
In [17]: %time float("NaN")
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 9.06 µs
Out[17]: nan
In [18]: %time np.NaN
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 7.87 µs
Out[18]: nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok i changed it to the numpy version. Mainly because it avoids the explicit data type conversion.
Looks good to me from the pure code/api side of things. Quick approval from @mrshirts (and/or @davidlmobley ) regarding the veracity of the implementation would be highly appreciated! – Essentially, are you satisfied with the evidence that @harlor has shown to demonstrate that this implementation produces correct answers (i.e., same answer as |
Ping @mrshirts @davidlmobley . |
I haven't heard anything against merging this PR so I am now merging it. If anything else comes up later, please open a new issue. Thanks @harlor ! |
Yes, I think this is fine. Sorry for the delay; huge series of deadlines. @hbaumann do you have a set of data from equilibrium free energy calculations you can run through this to doublecheck results seem correct? |
This version seems to reproduce results i get using alchemical-analysis.