Time convergence #168

Merged: 21 commits, Oct 22, 2021
3 changes: 3 additions & 0 deletions CHANGES
@@ -26,6 +26,9 @@ Enhancements
Various malformations of input files are detected, where lambdas do not
monotonically increase or decrease, or are inconsistent or missing.
- Convenience function for decorrelating u_nk and dHdl (PR #163).
- Time convergence function forward_backward_convergence
introduced and plot_convergence now takes dataframe from
forward_backward_convergence as input (PR #168).

Fixes
- Subsampling now works with bounds and step (PR #167, issue #166).
5 changes: 2 additions & 3 deletions docs/api_principles.rst
@@ -57,7 +57,7 @@ The library is structured as follows, following a similar style to
│   └── ...
├── postprocessors
│   ├── ...
│   └── units.py
│   └── units.py
├── visualisation
│   ├── convergence.py
│   ├── dF_state.py
@@ -83,8 +83,7 @@ The :mod:`~alchemlyb.estimators` module features classes *a la* **scikit-learn**
MBAR, BAR, and thermodynamic integration (TI) as the major methods are all implemented.
Correct error estimates require the use of time series with independent samples.

The :mod:`~alchemlyb.convergence` submodule will feature convenience functions/classes for doing convergence analysis using a given dataset and a chosen estimator, though the form of this is not yet thought-out.
However, the `gist a41e5756a58e1775e3e3a915f07bfd37`_ shows an example for how this can be done already in practice.
The :mod:`~alchemlyb.convergence` submodule will feature convenience functions/classes for doing convergence analysis using a given dataset and a chosen estimator.

The :mod:`~alchemlyb.postprocessing` submodule contains functions to calculate new quantities or express data in different units.

34 changes: 34 additions & 0 deletions docs/convergence.rst
@@ -0,0 +1,34 @@
Using functions to estimate Convergence
=======================================

For a result to be valid, we need to ensure that a longer simulation would
not change it. The functions in this module estimate the convergence of the
free energy estimate and help the user determine the simulation end point.

Time Convergence
----------------
One way of determining the simulation end point is to compute and plot the
forward and backward convergence of the estimate using
:func:`~alchemlyb.convergence.forward_backward_convergence` and
:func:`~alchemlyb.visualisation.plot_convergence`. ::

    >>> from alchemtest.gmx import load_benzene
    >>> from alchemlyb.parsing.gmx import extract_u_nk
    >>> from alchemlyb.visualisation import plot_convergence
    >>> from alchemlyb.convergence import forward_backward_convergence

    >>> bz = load_benzene().data
    >>> data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
    >>> df = forward_backward_convergence(data_list, 'mbar')
    >>> ax = plot_convergence(df)
    >>> ax.figure.savefig('dF_t.pdf')

This gives a plot that looks like this:

.. figure:: images/dF_t.png

   A convergence plot showing that the forward and backward estimates have
   fully converged.

.. autofunction:: alchemlyb.convergence.forward_backward_convergence
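
Reviewer note: as a rough illustration of what the forward and backward analyses compute, the sketch below slices a toy time series into growing fractions taken from the start and from the end, and applies a stand-in estimator (a plain mean) to each slice. The names `toy_convergence` and `series` and the use of the mean are assumptions made for illustration only; the real function fits an alchemlyb estimator on the concatenated slices.

```python
import numpy as np
import pandas as pd


def toy_convergence(series, num=10):
    """Mean of the first/last i/num of the data for i = 1..num (illustrative only)."""
    block = len(series) // num  # number of frames added per time point
    rows = []
    for i in range(1, num + 1):
        forward = series.iloc[:block * i]       # first i/num of the data
        backward = series.iloc[-(block * i):]   # last i/num of the data
        rows.append({'Forward': forward.mean(),
                     'Backward': backward.mean(),
                     'data_fraction': i / num})
    return pd.DataFrame(rows)


# Hypothetical data: 100 frames drifting linearly from 0 to 1.
series = pd.Series(np.linspace(0.0, 1.0, 100))
toy = toy_convergence(series, num=4)
print(toy)
```

For a drifting series like this one, the forward and backward columns only agree once the full data set is used, which is the kind of disagreement the convergence plot is meant to expose.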
1 change: 1 addition & 0 deletions docs/index.rst
@@ -70,6 +70,7 @@ Contributions are very welcome. If you have bug reports or feature requests or q
parsing
preprocessing
estimators
convergence
postprocessing
visualisation

10 changes: 7 additions & 3 deletions docs/visualisation/alchemlyb.visualisation.plot_convergence.rst
@@ -5,9 +5,13 @@ Plot the Forward and Backward Convergence

The function :func:`~alchemlyb.visualisation.plot_convergence` allows
the user to visualise the convergence by plotting the free energy change
computed using the equilibrated snapshots between the proper target time frames
in both forward (data points are stored in `forward` and `forward_error`) and
reverse (data points are stored in `backward` and `backward_error`) directions.
computed using the equilibrated snapshots between the proper target time
frames. The data can be provided as a DataFrame from
:func:`alchemlyb.convergence.forward_backward_convergence` or
explicitly as four array-likes: the forward direction (`forward` and
`forward_error`) and the backward direction (`backward` and
`backward_error`).

The unit in the y axis could be labelled to other units by setting *units*,
which by default is :math:`kT`. The user can pass :class:`matplotlib.axes.Axes` into
the function to have the convergence drawn on a specific axes.
1 change: 1 addition & 0 deletions src/alchemlyb/convergence/__init__.py
@@ -0,0 +1 @@
from .convergence import forward_backward_convergence
115 changes: 115 additions & 0 deletions src/alchemlyb/convergence/convergence.py
@@ -0,0 +1,115 @@
import pandas as pd
import logging
import numpy as np

from ..estimators import MBAR, BAR, TI
from .. import concat


def forward_backward_convergence(df_list, estimator='mbar', num=10):
    ''' The forward and backward convergence of the free energy estimate.

    Generate the free energy estimate as a function of time in both
    directions, at the specified number of time points. For example,
    setting `num` to 10 gives the forward convergence, which is the free
    energy estimate from the first 10%, 20%, 30% .. of the data. The
    backward convergence is the estimate from the last 10%, 20%, 30% .. of
    the data.

    Parameters
    ----------
    df_list : list
        List of DataFrame of either dHdl or u_nk.
    estimator : {'mbar', 'bar', 'ti'}
        Name of the estimators.
    num : int
        The number of time points.

    Returns
    -------
    DataFrame
        The DataFrame with convergence data. ::

                Forward  Forward_Error  Backward  Backward_Error  data_fraction
            0  3.016442       0.052748  3.065176        0.051036            0.1
            1  3.078106       0.037170  3.078567        0.036640            0.2
            2  3.072561       0.030186  3.047357        0.029775            0.3
            3  3.048325       0.026070  3.057527        0.025743            0.4
            4  3.049769       0.023359  3.037454        0.023001            0.5
            5  3.034078       0.021260  3.040484        0.021075            0.6
            6  3.043274       0.019642  3.032495        0.019517            0.7
            7  3.035460       0.018340  3.036670        0.018261            0.8
            8  3.042032       0.017319  3.046597        0.017233            0.9
            9  3.044149       0.016405  3.044385        0.016402            1.0

    .. versionadded:: 0.6.0

    '''
    logger = logging.getLogger('alchemlyb.convergence.'
                               'forward_backward_convergence')
    logger.info('Start convergence analysis.')
    logger.info('Check data availability.')

    if estimator.lower() == 'mbar':
        logger.info('Use MBAR estimator for convergence analysis.')
        estimator_fit = MBAR().fit
    elif estimator.lower() == 'bar':
        logger.info('Use BAR estimator for convergence analysis.')
        estimator_fit = BAR().fit
    elif estimator.lower() == 'ti':
        logger.info('Use TI estimator for convergence analysis.')
        estimator_fit = TI().fit
    else:
        raise ValueError(
            '{} is not a valid estimator.'.format(estimator))

    logger.info('Begin forward analysis')
    forward_list = []
    forward_error_list = []
    for i in range(1, num + 1):
        logger.info('Forward analysis: {:.2f}%'.format(100 * i / num))
        sample = []
        for data in df_list:
            # First i/num of each time series.
            sample.append(data[:len(data) // num * i])
        sample = concat(sample)
        result = estimator_fit(sample)
        forward_list.append(result.delta_f_.iloc[0, -1])
        if estimator.lower() == 'bar':
            # Combine the errors between adjacent states in quadrature.
            error = np.sqrt(sum(
                [result.d_delta_f_.iloc[j, j + 1] ** 2
                 for j in range(len(result.d_delta_f_) - 1)]))
            forward_error_list.append(error)
        else:
            forward_error_list.append(result.d_delta_f_.iloc[0, -1])
        logger.info('{:.2f} +/- {:.2f} kT'.format(forward_list[-1],
                                                  forward_error_list[-1]))

    logger.info('Begin backward analysis')
    backward_list = []
    backward_error_list = []
    for i in range(1, num + 1):
        logger.info('Backward analysis: {:.2f}%'.format(100 * i / num))
        sample = []
        for data in df_list:
            # Last i/num of each time series. Note that unary minus binds
            # tighter than //, so this is ((-len(data)) // num) * i.
            sample.append(data[-len(data) // num * i:])
        sample = concat(sample)
        result = estimator_fit(sample)
        backward_list.append(result.delta_f_.iloc[0, -1])
        if estimator.lower() == 'bar':
            # Combine the errors between adjacent states in quadrature.
            error = np.sqrt(sum(
                [result.d_delta_f_.iloc[j, j + 1] ** 2
                 for j in range(len(result.d_delta_f_) - 1)]))
            backward_error_list.append(error)
        else:
            backward_error_list.append(result.d_delta_f_.iloc[0, -1])
        logger.info('{:.2f} +/- {:.2f} kT'.format(backward_list[-1],
                                                  backward_error_list[-1]))

    convergence = pd.DataFrame(
        {'Forward': forward_list,
         'Forward_Error': forward_error_list,
         'Backward': backward_list,
         'Backward_Error': backward_error_list,
         'data_fraction': [i / num for i in range(1, num + 1)]})
    convergence.attrs = df_list[0].attrs
    return convergence
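
Reviewer note: for BAR, the function above does not read the end-to-end entry of `d_delta_f_` but sums the errors between adjacent states in quadrature. A minimal sketch of that arithmetic follows, using a hypothetical 4-state error matrix with made-up values (not output from a real estimator) and assuming that only the adjacent-state entries are populated.

```python
import numpy as np
import pandas as pd

# Hypothetical error matrix for 4 lambda states; only the entries between
# adjacent states (first off-diagonal) are filled in, the rest are NaN.
d_delta_f = pd.DataFrame(np.full((4, 4), np.nan))
for j, err in enumerate([0.03, 0.04, 0.12]):
    d_delta_f.iloc[j, j + 1] = err
    d_delta_f.iloc[j + 1, j] = err

# Sum the adjacent-state errors in quadrature, as in the BAR branch.
adjacent = [d_delta_f.iloc[j, j + 1] for j in range(len(d_delta_f) - 1)]
total_error = np.sqrt(sum(e ** 2 for e in adjacent))
print(total_error)
```

Summing in quadrature treats the adjacent-state errors as independent, so the result is best read as an approximation of the overall uncertainty.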
43 changes: 43 additions & 0 deletions src/alchemlyb/tests/test_convergence.py
@@ -0,0 +1,43 @@
import pytest

from alchemtest.gmx import load_benzene
from alchemlyb.parsing import gmx
from alchemlyb.convergence import forward_backward_convergence

@pytest.fixture()
def gmx_benzene():
    dataset = load_benzene()
    return [gmx.extract_dHdl(dhdl, T=300) for dhdl in dataset['data']['Coulomb']], \
           [gmx.extract_u_nk(dhdl, T=300) for dhdl in dataset['data']['Coulomb']]

def test_convergence_ti(gmx_benzene):
    dHdl, u_nk = gmx_benzene
    convergence = forward_backward_convergence(dHdl, 'TI')
    assert convergence.shape == (10, 5)
    assert convergence.iloc[0, 0] == pytest.approx(3.07, 0.01)
    assert convergence.iloc[0, 2] == pytest.approx(3.11, 0.01)
    assert convergence.iloc[-1, 0] == pytest.approx(3.09, 0.01)
    assert convergence.iloc[-1, 2] == pytest.approx(3.09, 0.01)

def test_convergence_mbar(gmx_benzene):
    dHdl, u_nk = gmx_benzene
    convergence = forward_backward_convergence(u_nk, 'MBAR')
    assert convergence.shape == (10, 5)
    assert convergence.iloc[0, 0] == pytest.approx(3.02, 0.01)
    assert convergence.iloc[0, 2] == pytest.approx(3.06, 0.01)
    assert convergence.iloc[-1, 0] == pytest.approx(3.05, 0.01)
    assert convergence.iloc[-1, 2] == pytest.approx(3.04, 0.01)

def test_convergence_bar(gmx_benzene):
    dHdl, u_nk = gmx_benzene
    convergence = forward_backward_convergence(u_nk, 'BAR')
    assert convergence.shape == (10, 5)
    assert convergence.iloc[0, 0] == pytest.approx(3.02, 0.01)
    assert convergence.iloc[0, 2] == pytest.approx(3.06, 0.01)
    assert convergence.iloc[-1, 0] == pytest.approx(3.05, 0.01)
    assert convergence.iloc[-1, 2] == pytest.approx(3.04, 0.01)

def test_convergence_wrong_estimator(gmx_benzene):
    dHdl, u_nk = gmx_benzene
    with pytest.raises(ValueError, match="{} is not a valid estimator".format("www")):
        convergence = forward_backward_convergence(u_nk, 'www')
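
Reviewer note: the second positional argument of `pytest.approx` is the relative tolerance, so `pytest.approx(3.07, 0.01)` accepts values within about 1% of 3.07 (roughly ±0.03 kT), not ±0.01 kT. The helper below is a hypothetical stand-in that mimics only that relative check and ignores the small default absolute tolerance that `pytest.approx` also applies.

```python
def within_rel(actual, expected, rel=0.01):
    """True if actual lies within rel * |expected| of expected."""
    return abs(actual - expected) <= rel * abs(expected)


# 1% of 3.07 is about 0.0307, so 3.09 passes but 3.11 does not.
close_enough = within_rel(3.09, 3.07)
too_far = within_rel(3.11, 3.07)
print(close_enough, too_far)
```

If tighter regression checks are wanted, an explicit `rel=` or `abs=` keyword in the tests would make the intended tolerance easier to see.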
9 changes: 9 additions & 0 deletions src/alchemlyb/tests/test_visualisation.py
@@ -12,6 +12,7 @@
from alchemlyb.visualisation.ti_dhdl import plot_ti_dhdl
from alchemlyb.visualisation.dF_state import plot_dF_state
from alchemlyb.visualisation import plot_convergence
from alchemlyb.convergence import forward_backward_convergence

def test_plot_mbar_omatrix():
'''Just test if the plot runs'''
@@ -126,6 +127,14 @@ def test_plot_dF_state():
assert isinstance(fig, matplotlib.figure.Figure)
plt.close(fig)

def test_plot_convergence_dataframe():
bz = load_benzene().data
data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
df = forward_backward_convergence(data_list, 'mbar')
ax = plot_convergence(df)
assert isinstance(ax, matplotlib.axes.Axes)
plt.close(ax.figure)

def test_plot_convergence():
bz = load_benzene().data
data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
57 changes: 45 additions & 12 deletions src/alchemlyb/visualisation/convergence.py
@@ -1,21 +1,35 @@
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.font_manager import FontProperties as FP
import numpy as np

def plot_convergence(forward, forward_error, backward, backward_error,
units='kT', ax=None):
from ..postprocessors.units import get_unit_converter

def plot_convergence(*data, units='kT', ax=None):
"""Plot the forward and backward convergence.
The input could be the result from
:func:`~alchemlyb.convergence.forward_backward_convergence` or it could
be given explicitly as `forward`, `forward_error`, `backward`,
`backward_error`.
`forward`: A list of free energy estimates from the first X% of data,
where `forward_error` is the corresponding error.
`backward`: A list of free energy estimates from the last X% of data,
where `backward_error` is the corresponding error.
These four array_like objects should have the same
shape and can be used as input for
:func:`matplotlib.pyplot.errorbar`.
Parameters
----------
forward : List
A list of free energy estimate from the first X% of data.
forward_error : List
A list of error from the first X% of data.
backward : List
A list of free energy estimate from the last X% of data.
backward_error : List
A list of error from the last X% of data.
data : DataFrame or 4 array_like objects
Output DataFrame from
:func:`~alchemlyb.convergence.forward_backward_convergence`,
or given explicitly as `forward`, `forward_error`, `backward`,
`backward_error`; see :ref:`plot_convergence <plot_convergence>`.
units : str
The label for the unit of the estimate. Default: "kT"
ax : matplotlib.axes.Axes
@@ -32,12 +46,31 @@ def plot_convergence(forward, forward_error, backward, backward_error,
The code is taken and modified from
`Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_.
The units variable is for labelling only. Changing it doesn't change the
unit of the underlying variable.
If `data` is a :class:`pandas.DataFrame` produced by
:func:`~alchemlyb.convergence.forward_backward_convergence`,
the values are converted to the unit given by the units
variable. Otherwise, the units variable is for labelling only.
Changing it doesn't change the unit of the underlying variable.
.. versionchanged:: 0.6.0
`data` now also accepts a DataFrame.
.. versionadded:: 0.4.0
"""
if len(data) == 1 and isinstance(data[0], pd.DataFrame):
dataframe = get_unit_converter(units)(data[0])
forward = dataframe['Forward'].to_numpy()
forward_error = dataframe['Forward_Error'].to_numpy()
backward = dataframe['Backward'].to_numpy()
backward_error = dataframe['Backward_Error'].to_numpy()
else:
try:
forward, forward_error, backward, backward_error = data
except ValueError: # pragma: no cover
raise ValueError('Ensure all four of forward, forward_error, '
'backward, backward_error are supplied.')

if ax is None: # pragma: no cover
fig, ax = plt.subplots(figsize=(8, 6))
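
Reviewer note: the `*data` signature lets one function accept either a single DataFrame or four array-likes. The sketch below isolates just that dispatch so it can be tried without matplotlib. The helper name `unpack_convergence` is hypothetical, and the unit conversion step of the real function is deliberately left out.

```python
import pandas as pd


def unpack_convergence(*data):
    """Return (forward, forward_error, backward, backward_error)."""
    if len(data) == 1 and isinstance(data[0], pd.DataFrame):
        # Single DataFrame: pull the four columns by name.
        df = data[0]
        return tuple(df[col].to_numpy() for col in
                     ('Forward', 'Forward_Error', 'Backward', 'Backward_Error'))
    try:
        # Otherwise expect exactly four positional array-likes.
        forward, forward_error, backward, backward_error = data
    except ValueError:
        raise ValueError('Ensure all four of forward, forward_error, '
                         'backward, backward_error are supplied.') from None
    return forward, forward_error, backward, backward_error


# Made-up values, for illustration only.
df = pd.DataFrame({'Forward': [3.0, 3.1], 'Forward_Error': [0.05, 0.03],
                   'Backward': [3.2, 3.1], 'Backward_Error': [0.05, 0.03]})
from_frame = unpack_convergence(df)
from_lists = unpack_convergence([3.0, 3.1], [0.05, 0.03], [3.2, 3.1], [0.05, 0.03])
```

Passing anything other than one DataFrame or exactly four positional arguments ends up in the `ValueError` branch, which matches the error message used in the diff.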
