Time convergence (#168)

Implement the time convergence function. Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
alchemistry · Oct 22, 2021 · 592f38b
1 parent 5f81d70
commit 592f38b

Showing 10 changed files with 260 additions and 18 deletions.
diff --git a/CHANGES b/CHANGES
@@ -26,6 +26,9 @@ Enhancements
     Various malformations of input files are detected, where lambdas do not
     monotonically increase or decrease, or are inconsistent or missing.
   - Convenience function for decorrelating u_nk and dHdl (PR #163).
+  - Time convergence function forward_backward_convergence
+    introduced and plot_convergence now takes dataframe from
+    forward_backward_convergence as input (PR #168).
 
 Fixes
   - Subsampling now works with bounds and step (PR #167, issue #166).

diff --git a/docs/api_principles.rst b/docs/api_principles.rst
@@ -57,7 +57,7 @@ The library is structured as follows, following a similar style to
     │   └── ...
     ├── postprocessors
     │   ├── ...
-    │   └── units.py    
+    │   └── units.py
     ├── visualisation
     │   ├── convergence.py
     │   ├── dF_state.py
@@ -83,8 +83,7 @@ The :mod:`~alchemlyb.estimators` module features classes *a la* **scikit-learn**
 MBAR, BAR, and thermodynamic integration (TI) as the major methods are all implemented.
 Correct error estimates require the use of time series with independent samples.
 
-The :mod:`~alchemlyb.convergence` submodule will feature convenience functions/classes for doing convergence analysis using a given dataset and a chosen estimator, though the form of this is not yet thought-out.
-However, the `gist a41e5756a58e1775e3e3a915f07bfd37`_ shows an example for how this can be done already in practice.
+The :mod:`~alchemlyb.convergence` submodule will feature convenience functions/classes for doing convergence analysis using a given dataset and a chosen estimator.
 
 The :mod:`~alchemlyb.postprocessing` submodule contains functions to calculate new quantities or express data in different units.
 

diff --git a/docs/convergence.rst b/docs/convergence.rst
@@ -0,0 +1,34 @@
+Using functions to estimate Convergence
+=======================================
+
+For a result to be valid, we need to ensure that a longer simulation time
+would not change the result. The functions in this module estimate the
+convergence of the free energy estimate and help the user determine the
+simulation end point.
+
+Time Convergence
+----------------
+One way of determining the simulation end point is to compute and plot the
+forward and backward convergence of the estimate using
+:func:`~alchemlyb.convergence.forward_backward_convergence` and
+:func:`~alchemlyb.visualisation.plot_convergence`. ::
+
+    >>> from alchemtest.gmx import load_benzene
+    >>> from alchemlyb.parsing.gmx import extract_u_nk
+    >>> from alchemlyb.visualisation import plot_convergence
+    >>> from alchemlyb.convergence import forward_backward_convergence
+
+    >>> bz = load_benzene().data
+    >>> data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
+    >>> df = forward_backward_convergence(data_list, 'mbar')
+    >>> ax = plot_convergence(df)
+    >>> ax.figure.savefig('dF_t.pdf')
+
+This produces a plot that looks like this:
+
+.. figure:: images/dF_t.png
+
+   A convergence plot showing that the forward and backward estimates have
+   fully converged.
+
+.. autofunction:: alchemlyb.convergence.forward_backward_convergence
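
As a rough illustration of the slicing described in the documentation above, the following minimal sketch builds the forward and backward sub-samples for a plain Python list. The helper name `forward_backward_slices` and the toy data are hypothetical, and the sketch assumes the number of samples is at least `num`. The committed function applies the same idea to each DataFrame in `df_list` before fitting the estimator, although its rounding of the backward slice may differ slightly when the length is not a multiple of `num`.

```python
def forward_backward_slices(samples, num=10):
    """Return (forward, backward) lists of sub-samples for `num` time points.

    Hypothetical helper for illustration only; it is not part of alchemlyb.
    Assumes len(samples) >= num so that each block holds at least one sample.
    """
    block = len(samples) // num  # samples added per time point
    # Forward: the first 1/num, 2/num, ... of the data.
    forward = [samples[:block * i] for i in range(1, num + 1)]
    # Backward: the last 1/num, 2/num, ... of the data.
    backward = [samples[-block * i:] for i in range(1, num + 1)]
    return forward, backward


samples = list(range(20))  # toy stand-in for a time series
forward, backward = forward_backward_slices(samples, num=4)
print(forward[0])   # the first quarter of the samples
print(backward[0])  # the last quarter of the samples
```

With the full data fraction, both directions use the same samples, which is why the last row of the convergence table shows nearly identical forward and backward estimates.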
diff --git a/docs/index.rst b/docs/index.rst
@@ -70,6 +70,7 @@ Contributions are very welcome. If you have bug reports or feature requests or q
     parsing
     preprocessing
     estimators
+    convergence
     postprocessing
     visualisation
 

diff --git a/docs/visualisation/alchemlyb.visualisation.plot_convergence.rst b/docs/visualisation/alchemlyb.visualisation.plot_convergence.rst
@@ -5,9 +5,13 @@ Plot the Forward and Backward Convergence
 
 The function :func:`~alchemlyb.visualisation.plot_convergence` allows
 the user to visualise the convergence by plotting the free energy change
-computed using the equilibrated snapshots between the proper target time frames
-in both forward (data points are stored in `forward` and `forward_error`) and
-reverse (data points are stored in `backward` and `backward_error`) directions.
+computed using the equilibrated snapshots between the proper target time
+frames. The data can be provided as a DataFrame from
+:func:`alchemlyb.convergence.forward_backward_convergence` or given
+explicitly for both the forward (data points are stored in `forward` and
+`forward_error`) and backward (data points are stored in `backward` and
+`backward_error`) directions.
+
 The unit in the y axis could be labelled to other units by setting *units*,
 which by default is :math:`kT`. The user can pass :class:`matplotlib.axes.Axes` into
 the function to have the convergence drawn on a specific axes.

diff --git a/src/alchemlyb/convergence/__init__.py b/src/alchemlyb/convergence/__init__.py
@@ -0,0 +1 @@
+from .convergence import forward_backward_convergence
diff --git a/src/alchemlyb/convergence/convergence.py b/src/alchemlyb/convergence/convergence.py
@@ -0,0 +1,115 @@
+import pandas as pd
+import logging
+import numpy as np
+
+from ..estimators import MBAR, BAR, TI
+from .. import concat
+
+
+def forward_backward_convergence(df_list, estimator='mbar', num=10):
+    ''' The forward and backward convergence of the free energy estimate.
+
+    Generate the free energy estimate as a function of time in both
+    directions, with the specified number of time points. For example,
+    setting `num` to 10 gives the forward convergence, which is the free
+    energy estimate from the first 10%, 20%, 30%, ... of the data. The
+    backward convergence is the estimate from the last 10%, 20%, 30%, ... of
+    the data.
+
+    Parameters
+    ----------
+    df_list : list
+        List of DataFrames of either dHdl or u_nk.
+    estimator : {'mbar', 'bar', 'ti'}
+        Name of the estimator (case-insensitive).
+    num : int
+        The number of time points.
+
+    Returns
+    -------
+    DataFrame
+        The DataFrame with convergence data. ::
+
+                Forward  Forward_Error  Backward  Backward_Error  data_fraction
+            0  3.016442       0.052748  3.065176        0.051036            0.1
+            1  3.078106       0.037170  3.078567        0.036640            0.2
+            2  3.072561       0.030186  3.047357        0.029775            0.3
+            3  3.048325       0.026070  3.057527        0.025743            0.4
+            4  3.049769       0.023359  3.037454        0.023001            0.5
+            5  3.034078       0.021260  3.040484        0.021075            0.6
+            6  3.043274       0.019642  3.032495        0.019517            0.7
+            7  3.035460       0.018340  3.036670        0.018261            0.8
+            8  3.042032       0.017319  3.046597        0.017233            0.9
+            9  3.044149       0.016405  3.044385        0.016402            1.0
+
+
+    .. versionadded:: 0.6.0
+    '''
+    logger = logging.getLogger('alchemlyb.convergence.'
+                               'forward_backward_convergence')
+    logger.info('Start convergence analysis.')
+    logger.info('Check data availability.')
+
+    if estimator.lower() == 'mbar':
+        logger.info('Use MBAR estimator for convergence analysis.')
+        estimator_fit = MBAR().fit
+    elif estimator.lower() == 'bar':
+        logger.info('Use BAR estimator for convergence analysis.')
+        estimator_fit = BAR().fit
+    elif estimator.lower() == 'ti':
+        logger.info('Use TI estimator for convergence analysis.')
+        estimator_fit = TI().fit
+    else:
+        raise ValueError(
+            '{} is not a valid estimator.'.format(estimator))
+
+    logger.info('Begin forward analysis')
+    forward_list = []
+    forward_error_list = []
+    for i in range(1, num + 1):
+        logger.info('Forward analysis: {:.2f}%'.format(100 * i / num))
+        sample = []
+        for data in df_list:
+            sample.append(data[:len(data) // num * i])
+        sample = concat(sample)
+        result = estimator_fit(sample)
+        forward_list.append(result.delta_f_.iloc[0, -1])
+        if estimator.lower() == 'bar':
+            error = np.sqrt(sum(
+                [result.d_delta_f_.iloc[i, i + 1] ** 2
+                 for i in range(len(result.d_delta_f_) - 1)]))
+            forward_error_list.append(error)
+        else:
+            forward_error_list.append(result.d_delta_f_.iloc[0, -1])
+        logger.info('{:.2f} +/- {:.2f} kT'.format(forward_list[-1],
+                                                  forward_error_list[-1]))
+
+    logger.info('Begin backward analysis')
+    backward_list = []
+    backward_error_list = []
+    for i in range(1, num + 1):
+        logger.info('Backward analysis: {:.2f}%'.format(100 * i / num))
+        sample = []
+        for data in df_list:
+            sample.append(data[-len(data) // num * i:])
+        sample = concat(sample)
+        result = estimator_fit(sample)
+        backward_list.append(result.delta_f_.iloc[0, -1])
+        if estimator.lower() == 'bar':
+            error = np.sqrt(sum(
+                [result.d_delta_f_.iloc[i, i + 1] ** 2
+                 for i in range(len(result.d_delta_f_) - 1)]))
+            backward_error_list.append(error)
+        else:
+            backward_error_list.append(result.d_delta_f_.iloc[0, -1])
+        logger.info('{:.2f} +/- {:.2f} kT'.format(backward_list[-1],
+                                                  backward_error_list[-1]))
+
+    convergence = pd.DataFrame(
+        {'Forward': forward_list,
+         'Forward_Error': forward_error_list,
+         'Backward': backward_list,
+         'Backward_Error': backward_error_list,
+         'data_fraction': [i / num for i in range(1, num + 1)]})
+    convergence.attrs = df_list[0].attrs
+    return convergence
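
The BAR branch above combines the errors of adjacent states instead of reading the corner element of `d_delta_f_`, presumably because BAR only provides uncertainties between neighbouring lambda windows. A minimal sketch of that quadrature sum, assuming the adjacent-state errors are independent, is shown below; the function name `cumulative_bar_error` and the 3-state matrix are hypothetical and use nested lists in place of a DataFrame.

```python
import math


def cumulative_bar_error(d_delta_f):
    """Combine adjacent-state errors into one end-to-end error.

    Hypothetical helper for illustration; `d_delta_f` is a square nested
    list whose [i][i + 1] entries hold the error between neighbouring states.
    """
    n_states = len(d_delta_f)
    # Sum the squared errors along the first off-diagonal, then take the root.
    return math.sqrt(sum(d_delta_f[i][i + 1] ** 2
                         for i in range(n_states - 1)))


nan = float('nan')  # non-adjacent pairs are left undefined in this toy matrix
d_delta_f = [
    [0.0, 0.3, nan],
    [0.3, 0.0, 0.4],
    [nan, 0.4, 0.0],
]
print(cumulative_bar_error(d_delta_f))  # approximately 0.5
```

Only the first off-diagonal is read, so the undefined non-adjacent entries never enter the sum.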
diff --git a/src/alchemlyb/tests/test_convergence.py b/src/alchemlyb/tests/test_convergence.py
@@ -0,0 +1,43 @@
+import pytest
+
+from alchemtest.gmx import load_benzene
+from alchemlyb.parsing import gmx
+from alchemlyb.convergence import forward_backward_convergence
+
+@pytest.fixture()
+def gmx_benzene():
+    dataset = load_benzene()
+    return [gmx.extract_dHdl(dhdl, T=300) for dhdl in dataset['data']['Coulomb']], \
+           [gmx.extract_u_nk(dhdl, T=300) for dhdl in dataset['data']['Coulomb']]
+
+def test_convergence_ti(gmx_benzene):
+    dHdl, u_nk = gmx_benzene
+    convergence = forward_backward_convergence(dHdl, 'TI')
+    assert convergence.shape == (10, 5)
+    assert convergence.iloc[0, 0] == pytest.approx(3.07, 0.01)
+    assert convergence.iloc[0, 2] == pytest.approx(3.11, 0.01)
+    assert convergence.iloc[-1, 0] == pytest.approx(3.09, 0.01)
+    assert convergence.iloc[-1, 2] == pytest.approx(3.09, 0.01)
+
+def test_convergence_mbar(gmx_benzene):
+    dHdl, u_nk = gmx_benzene
+    convergence = forward_backward_convergence(u_nk, 'MBAR')
+    assert convergence.shape == (10, 5)
+    assert convergence.iloc[0, 0] == pytest.approx(3.02, 0.01)
+    assert convergence.iloc[0, 2] == pytest.approx(3.06, 0.01)
+    assert convergence.iloc[-1, 0] == pytest.approx(3.05, 0.01)
+    assert convergence.iloc[-1, 2] == pytest.approx(3.04, 0.01)
+
+def test_convergence_bar(gmx_benzene):
+    dHdl, u_nk = gmx_benzene
+    convergence = forward_backward_convergence(u_nk, 'BAR')
+    assert convergence.shape == (10, 5)
+    assert convergence.iloc[0, 0] == pytest.approx(3.02, 0.01)
+    assert convergence.iloc[0, 2] == pytest.approx(3.06, 0.01)
+    assert convergence.iloc[-1, 0] == pytest.approx(3.05, 0.01)
+    assert convergence.iloc[-1, 2] == pytest.approx(3.04, 0.01)
+
+def test_convergence_wrong_estimator(gmx_benzene):
+    dHdl, u_nk = gmx_benzene
+    with pytest.raises(ValueError, match="{} is not a valid estimator".format("www")):
+        convergence = forward_backward_convergence(u_nk, 'www')
diff --git a/src/alchemlyb/tests/test_visualisation.py b/src/alchemlyb/tests/test_visualisation.py
@@ -12,6 +12,7 @@
 from alchemlyb.visualisation.ti_dhdl import plot_ti_dhdl
 from alchemlyb.visualisation.dF_state import plot_dF_state
 from alchemlyb.visualisation import plot_convergence
+from alchemlyb.convergence import forward_backward_convergence
 
 def test_plot_mbar_omatrix():
     '''Just test if the plot runs'''
@@ -126,6 +127,14 @@ def test_plot_dF_state():
     assert isinstance(fig, matplotlib.figure.Figure)
     plt.close(fig)
 
+def test_plot_convergence_dataframe():
+    bz = load_benzene().data
+    data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]
+    df = forward_backward_convergence(data_list, 'mbar')
+    ax = plot_convergence(df)
+    assert isinstance(ax, matplotlib.axes.Axes)
+    plt.close(ax.figure)
+
 def test_plot_convergence():
     bz = load_benzene().data
     data_list = [extract_u_nk(xvg, T=300) for xvg in bz['Coulomb']]

diff --git a/src/alchemlyb/visualisation/convergence.py b/src/alchemlyb/visualisation/convergence.py
@@ -1,21 +1,35 @@
 import matplotlib.pyplot as plt
+import pandas as pd
 from matplotlib.font_manager import FontProperties as FP
 import numpy as np
 
-def plot_convergence(forward, forward_error, backward, backward_error,
-                     units='kT', ax=None):
+from ..postprocessors.units import get_unit_converter
+
+def plot_convergence(*data, units='kT', ax=None):
     """Plot the forward and backward convergence.
 
+    The input could be the result from
+    :func:`~alchemlyb.convergence.forward_backward_convergence` or it could
+    be given explicitly as `forward`, `forward_error`, `backward`,
+    `backward_error`.
+
+    `forward`: A list of free energy estimates from the first X% of data,
+    where `forward_error` is the corresponding error.
+
+    `backward`: A list of free energy estimates from the last X% of data,
+    where `backward_error` is the corresponding error.
+
+    These four array_like objects should have the same
+    shape and can be used as input for the
+    :func:`matplotlib.pyplot.errorbar`.
+
     Parameters
     ----------
-    forward : List
-        A list of free energy estimate from the first X% of data.
-    forward_error : List
-        A list of error from the first X% of data.
-    backward : List
-        A list of free energy estimate from the last X% of data.
-    backward_error : List
-        A list of error from the last X% of data.
+    data : DataFrame or 4 array_like objects
+        Output DataFrame from
+        :func:`~alchemlyb.convergence.forward_backward_convergence`,
+        or given explicitly as `forward`, `forward_error`, `backward`,
+        `backward_error`; see :ref:`plot_convergence <plot_convergence>`.
     units : str
         The label for the unit of the estimate. Default: "kT"
     ax : matplotlib.axes.Axes
@@ -32,12 +46,31 @@ def plot_convergence(forward, forward_error, backward, backward_error,
     The code is taken and modified from
     `Alchemical Analysis <https://github.com/MobleyLab/alchemical-analysis>`_.
 
-    The units variable is for labelling only. Changing it doesn't change the
-    unit of the underlying variable.
+    If `data` is a :class:`pandas.DataFrame` produced by
+    :func:`~alchemlyb.convergence.forward_backward_convergence`,
+    the values are converted to the unit given by the units
+    variable. Otherwise, the units variable is for labelling only and
+    changing it doesn't change the unit of the underlying variable.
+
 
+    .. versionchanged:: 0.6.0
+        `data` now accepts a DataFrame.
 
     .. versionadded:: 0.4.0
     """
+    if len(data) == 1 and isinstance(data[0], pd.DataFrame):
+        dataframe = get_unit_converter(units)(data[0])
+        forward = dataframe['Forward'].to_numpy()
+        forward_error = dataframe['Forward_Error'].to_numpy()
+        backward = dataframe['Backward'].to_numpy()
+        backward_error = dataframe['Backward_Error'].to_numpy()
+    else:
+        try:
+            forward, forward_error, backward, backward_error = data
+        except ValueError: # pragma: no cover
+            raise ValueError('Ensure all four of forward, forward_error, '
+                             'backward, backward_error are supplied.')
+
     if ax is None: # pragma: no cover
         fig, ax = plt.subplots(figsize=(8, 6))
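
The `*data` signature above accepts either one table or four separate sequences. The sketch below isolates that dispatch pattern under simplifying assumptions: a plain `dict` stands in for the pandas DataFrame, no unit conversion is performed, and the function name `unpack_convergence` is hypothetical.

```python
def unpack_convergence(*data):
    """Return (forward, forward_error, backward, backward_error).

    Hypothetical helper for illustration; a dict stands in for the DataFrame
    returned by forward_backward_convergence.
    """
    if len(data) == 1 and isinstance(data[0], dict):
        # Single table: pull the four columns out by name.
        table = data[0]
        return (table['Forward'], table['Forward_Error'],
                table['Backward'], table['Backward_Error'])
    try:
        # Explicit form: exactly four positional sequences are required.
        forward, forward_error, backward, backward_error = data
    except ValueError as err:
        raise ValueError('Ensure all four of forward, forward_error, '
                         'backward, backward_error are supplied.') from err
    return forward, forward_error, backward, backward_error


table = {'Forward': [3.0, 3.1], 'Forward_Error': [0.05, 0.04],
         'Backward': [3.2, 3.1], 'Backward_Error': [0.05, 0.04]}
from_table = unpack_convergence(table)
from_lists = unpack_convergence([3.0, 3.1], [0.05, 0.04],
                                [3.2, 3.1], [0.05, 0.04])
```

Both call styles yield the same four sequences, so the plotting code that follows only has to handle one representation.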