
v0.7.0 #232

Merged: 497 commits, May 10, 2022

Commits
b10825e
rename cov_gpu to kernel, and refactor with class structure
nbarlowATI May 19, 2021
108dac9
preparatory work for adding Matern52 kernel
nbarlowATI May 19, 2021
9358b74
add C++ test for Matern kernel
nbarlowATI May 20, 2021
951e952
first implementation of Matern52 kernel
nbarlowATI May 20, 2021
e3be2cb
minor corrections to docstrings
nbarlowATI May 20, 2021
eaea045
implement deriv wrt x for Matern52 kernel
nbarlowATI May 28, 2021
92105b1
Merge branch 'feature/gpu-matern-kernel' of https://github.com/alan-t…
nbarlowATI May 28, 2021
5676999
add deriv wrt theta for Matern52 kernel
nbarlowATI Jun 1, 2021
95392bb
Merge pull request #179 from alan-turing-institute/feature/gpu
nbarlowATI Jun 4, 2021
cfb2921
refactor - put typedefs into types.hpp and more utility functions int…
nbarlowATI Jun 4, 2021
4015cfd
add kernel functions to replicate python functionality via pybind11
nbarlowATI Jun 4, 2021
ef7097f
rename cov_gpu to kernel
nbarlowATI Jun 4, 2021
08cd2c0
fix kernel check in GPU test
nbarlowATI Jun 4, 2021
77245f3
add pybind11 interface for Matern52Kernel
nbarlowATI Jun 11, 2021
cef19d7
revert to using Python kernels for gp.kernel interface in GaussianPro…
nbarlowATI Jun 11, 2021
2e92a1c
only import kernel_type from LibGPGPU if gpu is available (fix Travis…
nbarlowATI Jun 11, 2021
2fa532c
first implementation of const mean function
nbarlowATI Jun 14, 2021
db65c55
first implementation of const mean function
nbarlowATI Jun 14, 2021
2fbcf1e
implement const mean function for predict
nbarlowATI Jun 17, 2021
e8fb5e3
add logpost_deriv for mean function
nbarlowATI Jun 18, 2021
118a53a
add fixed mean and polynomial mean to C++
nbarlowATI Jun 22, 2021
4f0afae
add fixed mean and polynomial mean to C++
nbarlowATI Jun 22, 2021
f9cae4d
add fixed mean and polynomial mean to C++
nbarlowATI Jun 22, 2021
d2abad6
refactor logic around meanfunc in GaussianProcessGPU and add function…
nbarlowATI Jun 22, 2021
89e0912
add test for C++ implementation of mean function
nbarlowATI Jun 23, 2021
fe563c6
pass BaseMeanFunc from python to C++, so get rid of no-longer-needed …
nbarlowATI Jun 23, 2021
5fec9f1
add more useful exception messages
nbarlowATI Jun 23, 2021
d071a1a
use mat52_ prefix for Matern52 kernel rather than just mat_
nbarlowATI Jun 23, 2021
41a7b44
Merge branch 'devel' into feature/gpu-matern-kernel
nbarlowATI Jun 23, 2021
018279c
Merge pull request #182 from alan-turing-institute/feature/gpu-matern…
nbarlowATI Jun 23, 2021
f2975dc
change import in test_GPUMeanFunction to not give errors if HPU unava…
nbarlowATI Jun 23, 2021
248bb76
Merge remote-tracking branch 'origin/devel' into feature/gpu-meanfunc
nbarlowATI Jun 23, 2021
446ac5f
rename gpu_basic to test_gpu_basic
nbarlowATI Jul 26, 2021
8bd978e
rename gpu_kernel to test_gpu_kernel
nbarlowATI Jul 26, 2021
1bf3858
rename gpu_test_utils to test_gpu_utils
nbarlowATI Jul 26, 2021
9c68df0
rename gpu_cholesky to test_gpu_cholesky
nbarlowATI Jul 26, 2021
59f742e
add basic test for nlopt minimization
nbarlowATI Jul 26, 2021
1938164
rename gpu_meanfunc.cu to test_meanfunc.cu
nbarlowATI Jul 26, 2021
7106841
start implementing GP_MAP fitting in C++
nbarlowATI Jul 26, 2021
b376196
C++ test for DenseGP_GPU
nbarlowATI Jul 26, 2021
24e47df
modify logic about when to refit in logpost getter
nbarlowATI Jul 26, 2021
18bc9e1
cache current theta in C++ class
nbarlowATI Jul 28, 2021
bc07ed2
further steps towards C++ implementation of fit_GP_MAP
nbarlowATI Jul 28, 2021
3c7c83f
almost-working C++ fitting with nlopt
nbarlowATI Jul 28, 2021
f531b55
use dlib optimizer for fitting
nbarlowATI Aug 3, 2021
6969055
use dlib optimizer for fitting
nbarlowATI Aug 3, 2021
e9317c3
refactor, store nugget value in C++ rather than python
nbarlowATI Aug 3, 2021
258093f
refactor, store nugget value in C++ rather than python
nbarlowATI Aug 3, 2021
9d89cb1
use C++ fitting for GPU implementation
nbarlowATI Aug 3, 2021
7ae8494
use raw strings for regex, and add property for current_logpost
nbarlowATI Aug 3, 2021
67517ac
add properties for current_logpost and invQt, and use raw strings in …
nbarlowATI Aug 3, 2021
73283e3
set nugget type in C++ class when instantiating python GaussianProces…
nbarlowATI Aug 3, 2021
4c981d5
tweak fitting tests for GPU implementation
nbarlowATI Aug 3, 2021
4ad8c03
add function to find dlib in setup.py
nbarlowATI Aug 3, 2021
7ecb1d6
Merge pull request #183 from alan-turing-institute/feature/gpu-meanfunc
nbarlowATI Aug 6, 2021
1f0c9d1
basic implementation of parameters object with tests
edaub Aug 11, 2021
d09b400
updated parameters object to allow data to be none
edaub Aug 11, 2021
93f24bb
make ZeroMeanFunc the default for DenseGP_GPU
nbarlowATI Aug 12, 2021
dd90ff9
modified GP class to use GPParams class, plust additional unit tests …
edaub Aug 12, 2021
cf8f24e
fixed up parameter changes in SequentialDesign
edaub Aug 12, 2021
0ecde23
fixed MOGP class and fitting routines plus tests for new params class
edaub Aug 12, 2021
b1e66c3
tweaked transformations to be correct and added docstrings
edaub Aug 16, 2021
b44197d
added params class to API docs
edaub Aug 16, 2021
889e1ee
fixed up setup.py to correct error due to setuptools change (#189)
edaub Aug 16, 2021
f17d316
reset version number in setup.py for starting work on next release (#…
edaub Aug 16, 2021
cea2bf1
Fix merge conflicts from devel-to-master PR (#187)
nbarlowATI Aug 16, 2021
83609a7
fixed typo in GPParams.set_data to correct assertion
edaub Aug 16, 2021
b57b6aa
first implementation of C++ MOGP class
nbarlowATI Aug 17, 2021
ba49997
add classmethod to construct GaussianProcessGPU from pybind11 object
nbarlowATI Aug 17, 2021
f371bb7
factored out transforms to enable easier change and re-use
edaub Aug 17, 2021
aed6300
rename gp_gpu.hpp to densegp_gpu.hpp
nbarlowATI Aug 17, 2021
6220ab8
rename gp_gpu.cu to bindings.cu
nbarlowATI Aug 17, 2021
b4957e5
move bindings.cu to correct directory
nbarlowATI Aug 17, 2021
ae0c24f
rename test_gp_gpu to test_densegp_gpu
nbarlowATI Aug 18, 2021
995f11d
rename test_gp_gpu to test_densegp_gpu
nbarlowATI Aug 18, 2021
ee6f880
rename gp_gpu.cu to bindings.cu
nbarlowATI Aug 18, 2021
b9edfbc
rename gp_gpu.hpp to densegp_gpu.hpp
nbarlowATI Aug 18, 2021
5778e14
add MultiOutputGP_GPU to __init__.py
nbarlowATI Aug 18, 2021
140b981
try to ensure that C++ gp objects don't get destroyed by Python garba…
nbarlowATI Aug 18, 2021
3a6fe15
add more accessor functions for pybind to use, and remove obsolete te…
nbarlowATI Aug 18, 2021
a641ab2
first commit of python wrapper for C++ MOGP class
nbarlowATI Aug 18, 2021
6931667
remove debug output from constructor and destructor
nbarlowATI Aug 18, 2021
8d765e3
first functional versions of fitting multi-output GP via C++/GPU fitting
nbarlowATI Aug 18, 2021
b0bdaeb
First attempt at passing meanfunc, kernel, nugget, through MOGP_GPU
nbarlowATI Sep 9, 2021
a6b619d
updated tutorial to use params class
edaub Sep 10, 2021
149d8e1
converted list implementation of priors into a class
edaub Aug 17, 2021
36e3155
added solver for default correlation priors and lognormal prior distr…
edaub Aug 18, 2021
4790d89
refactored prior application code
edaub Sep 7, 2021
ae9c87d
added class method to create default priors
edaub Sep 7, 2021
45d81d0
changed default prior behavior to use default priors
edaub Sep 8, 2021
8fb1f84
got default priors working and tested out
edaub Sep 10, 2021
3b86d93
WIP debugging crashes
nbarlowATI Sep 14, 2021
90d1b4e
updated GP class to use design matrices
edaub Sep 20, 2021
962c855
updated fitting tests to conform to new gp interface
edaub Sep 20, 2021
56f6d28
fixed history matching tests to conform to gp class changes
edaub Sep 20, 2021
624d262
fixed an issue with mice design and disabled an outdated test
edaub Sep 20, 2021
e9a3ff2
fixed mogp docstring to avoid mention of mean function details
edaub Sep 21, 2021
55fd148
removed mean function from the data array in the params object
edaub Sep 21, 2021
d373bff
add tests for C++ multi-output GP
nbarlowATI Sep 21, 2021
9233389
add mogp fitting to C++ fitting test
nbarlowATI Sep 21, 2021
94ad253
Avoid segmentation fault in MOGP GPU fitting when checking fit status
nbarlowATI Sep 21, 2021
5b9ad03
test cloning of mean functions in C++
nbarlowATI Sep 21, 2021
ef6d75f
remove inputs from arguments of get_n_params
nbarlowATI Sep 21, 2021
5427c8f
implemented mean priors and refactored other clasess in priors to acc…
edaub Sep 22, 2021
b0c75ea
make DenseGP_GPU::get_n_params() return the total number of parameter…
nbarlowATI Sep 22, 2021
b7e4c1b
ensure we reset fit status for emulators where fitting fails
nbarlowATI Sep 22, 2021
83f9d4e
add 'n' accessor function to multioutputgp_gpu
nbarlowATI Sep 22, 2021
c586004
add accessor function for n, add default values for meanfunc, kernel_…
nbarlowATI Sep 22, 2021
e94a245
WIP modify list comprehension in functions to get fitted/unfitted emu…
nbarlowATI Sep 22, 2021
22273de
test cloning of mean function
nbarlowATI Sep 22, 2021
612d4a4
modify test now that DenseGP_GPU::fit no longer takes nugget_type arg…
nbarlowATI Sep 22, 2021
507254b
comment out debug output
nbarlowATI Sep 23, 2021
d93186c
add C++ methods to get indices of fit or failed-fit emulators in mult…
nbarlowATI Sep 23, 2021
3847e23
refactored priors class for fitting changes and added backup default …
edaub Sep 27, 2021
2c5185d
added weak priors class and refactored parameter transformations to c…
edaub Sep 28, 2021
e13e752
fixed up GP and MOGP classes to use new prior class
edaub Sep 29, 2021
4d2b98e
improved clarity of priors interface to MOGP class
edaub Sep 30, 2021
cf0440d
re-worked kernel code to remove covariance and avoid unneeded square …
edaub Oct 1, 2021
c64849f
added uniform kernels
edaub Oct 1, 2021
e0a86b9
updated base kernel in GP-based classes
edaub Oct 1, 2021
60b5b98
fixed GP class to handle new kernel interface
edaub Oct 1, 2021
34c15d8
corrected error in LHD implementation to remove deprecated function
edaub Oct 1, 2021
a99477b
implemented product form of kernel
edaub Oct 5, 2021
7f96807
fixed a few remaining issues with kernel refactor
edaub Oct 5, 2021
d9ff9d4
update setup.py such that it also finds dlib if present in /usr/local
nbarlowATI Oct 7, 2021
7055c82
add GPU/CPU timing comparison as a benchmark
nbarlowATI Oct 7, 2021
c1e44cc
look in standard locations for dlib library, as well as LD_LIBRARY_PATH
nbarlowATI Oct 13, 2021
a7a010b
add gpu_parallel benchmark to Makefile
nbarlowATI Oct 13, 2021
6d745fb
make running gpu timing benchmark on CPU optional
nbarlowATI Oct 13, 2021
ec50c93
Add compiler and linker flags for openmp, and modify find_dlib functi…
nbarlowATI Oct 13, 2021
f59ab60
remove commented-out debug output
nbarlowATI Oct 13, 2021
9542b07
remove commented-out debug output, and add missing implementation for…
nbarlowATI Oct 13, 2021
7ec42ad
fix merge conflicts
nbarlowATI Oct 13, 2021
669352d
added methods to get number of parameters
edaub Oct 27, 2021
1536581
fixed priors to handle different numbers of correlation lengths
edaub Oct 27, 2021
41a0ffc
re-worked parameters to be able to fully specify how things are fit
edaub Oct 28, 2021
0230bdf
changed nugget setting to only throw errors when changing forbidden v…
edaub Nov 1, 2021
b86867b
integrated new priors and params into GP class
edaub Nov 2, 2021
77c420b
fixed up MOGP class for new params/priors
edaub Nov 2, 2021
996e422
fixed fitting to handle updated GP classes
edaub Nov 3, 2021
d0eaf8e
skipped over MICEFastGP tests as they don't work with the new interface
edaub Nov 3, 2021
89df98b
reworked cholesky routines to give unfiorm interface
edaub Oct 6, 2021
4248ee6
fixed up tests and hessian to handle new linalg functions
edaub Nov 3, 2021
1743f49
updated documentation pages to reflect new classes
edaub Nov 3, 2021
f6fef9f
fixed problem in RTD build by fixing minimum sphinx version
edaub Nov 8, 2021
b853b35
fix handling of not-fit emulators - remove C++ exception that was thr…
nbarlowATI Nov 8, 2021
3fe069d
renamed inverse choleskly class
edaub Nov 9, 2021
44314fc
changed logdet function in ChoInv class
edaub Nov 10, 2021
4661f33
added additional linear algebra functions
edaub Nov 10, 2021
ed9e152
added method for multiplying design matrix by prior mean in MeanPriors
edaub Nov 10, 2021
fa94d64
Merge pull request #196 from alan-turing-institute/feature/multi-outp…
nbarlowATI Nov 12, 2021
2f4d6c9
fix merge conflicts
nbarlowATI Nov 12, 2021
b2cb1a7
renamed logdet_cov in MeanPriors for consistency
edaub Nov 16, 2021
ec94a9a
fixed issues with linalg routines
edaub Nov 16, 2021
a14c8a3
implemented analytical mean functions and derivatives
edaub Nov 16, 2021
fa47038
removed hessian computation
edaub Nov 16, 2021
20b1d1c
remove unused and obsolete imports from MultiOutputGP_GPU.py
nbarlowATI Nov 17, 2021
57c96f3
updated some docstrings in GP class
edaub Nov 18, 2021
b2e3326
steps towards adding and using GPPriors for hyperparameters
nbarlowATI Nov 18, 2021
5689791
GP with Parameters and Priors Classes (#203)
edaub Nov 18, 2021
e92a991
updated more docstrings
edaub Nov 23, 2021
7f1f123
updated Kernel docstrings
edaub Nov 23, 2021
39bf771
wrote docstrings for linalg routines
edaub Nov 23, 2021
7cebc4e
added docstrings to priors module
edaub Nov 24, 2021
0cee225
add test for gppriors C++
nbarlowATI Nov 24, 2021
5efb58f
WIP add GPPriors classes
nbarlowATI Nov 24, 2021
0f44cc1
add utility function to test instanceof
nbarlowATI Nov 24, 2021
663e3f0
change from static methods in GPParams to virtual methods, to be used…
nbarlowATI Nov 24, 2021
5a86e2d
merged devel into feature branch
edaub Nov 24, 2021
93874f2
bumped version number for merge
edaub Nov 24, 2021
e2dcd5a
restored kernel file to devel version with some docstring modifications
edaub Nov 24, 2021
2c25efc
add bindings for C++ priors
nbarlowATI Nov 30, 2021
7af06a5
removed fitting covariance flag from GPParams
edaub Dec 6, 2021
735cecb
removed fitting covariance from priors
edaub Dec 6, 2021
0f15cd2
removed fit covariance flag from GPParams
edaub Dec 6, 2021
9a373d5
added GP demos for different kernels
edaub Dec 8, 2021
5c7f59a
make use of GPPriors and GPParams
nbarlowATI Dec 20, 2021
f216f4e
make use of GPPriors and GPParams
nbarlowATI Dec 20, 2021
cf7b2e6
make use of GPPriors and GPParams
nbarlowATI Dec 20, 2021
1ce030e
make use of GPPriors and GPParams
nbarlowATI Dec 20, 2021
b4aa8b2
add enum for Prior type
nbarlowATI Dec 20, 2021
a3d4af8
add more C++ tests
nbarlowATI Dec 20, 2021
7efc290
remove GPU references from MultiOutputGP.py, since this is now implem…
nbarlowATI Dec 20, 2021
154dffd
add functionality via bindings to perform operations on individual em…
nbarlowATI Dec 21, 2021
58a569a
factor out functionality to get prior parameters in order to set them…
nbarlowATI Dec 21, 2021
84bc1c5
throw runtime error if fitting did not converge for GPU version
nbarlowATI Dec 21, 2021
4b445b0
return Eigen vects rather than std::vectors from log posterior methods
nbarlowATI Dec 21, 2021
df634cf
function to set flag that params have been fit
nbarlowATI Dec 21, 2021
3639d06
use GPParams, add ability to set nugget priors, remove commented out …
nbarlowATI Dec 21, 2021
cd860fc
test GPParams functionality
nbarlowATI Dec 21, 2021
def646e
update tests for GPU implementation
nbarlowATI Dec 21, 2021
ca7d56c
fix merge conflicts
nbarlowATI Dec 21, 2021
56ba349
comment out non-working GPU test for now
nbarlowATI Dec 21, 2021
a511ee5
update C++ test for GPParams object and transforms
nbarlowATI Dec 22, 2021
afba280
ensure nugget is dealt with for uncertainties in MOGP, in C++ impleme…
nbarlowATI Dec 22, 2021
b3a60bb
remove some unused data members. Ensure that mean params get reset w…
nbarlowATI Dec 22, 2021
f0ca88d
Change to meaning of theta vector to include both meanfunc params and…
nbarlowATI Dec 22, 2021
cbc93b9
ensure nugget is dealt with for uncertainties in MOGP, in C++ impleme…
nbarlowATI Dec 22, 2021
527b445
Change to meaning of theta vector to include both meanfunc params and…
nbarlowATI Dec 22, 2021
2cb9525
change meaning of n_params so that it now takes into account whether …
nbarlowATI Dec 22, 2021
a4dac47
sample from mean priors to get starting values for fitting
nbarlowATI Jan 5, 2022
c9970df
update docstrings
nbarlowATI Jan 5, 2022
002bf25
fix test - if not using fitted nugget, don't modify theta element
nbarlowATI Jan 6, 2022
b036b0c
take care that device vector for hyperparameters is correct size
nbarlowATI Jan 6, 2022
336244f
ensure that we set the nugget size correctly for fitted nugget
nbarlowATI Jan 6, 2022
a21e45f
remove debug printout
nbarlowATI Jan 6, 2022
828282b
Feature/analytical mean (#209)
edaub Jan 18, 2022
c26774c
Merge branch 'main' into v0.6.0
edaub Feb 4, 2022
f41fdbe
updated version number for new release
edaub Feb 4, 2022
5e0dc7d
remove references to gpu from MultiOutputGP.py - now use MultiOutputG…
nbarlowATI Feb 9, 2022
e4979a9
correct behaviour for no theta for both CPU and GPU implementations
nbarlowATI Feb 9, 2022
faf81ba
redefine n_params for consistency with CPU implementation
nbarlowATI Feb 9, 2022
dede77e
redefine n_params for consistency with CPU implementation
nbarlowATI Feb 9, 2022
11358f7
consistency with CPU version (correct number of params for theta0)
nbarlowATI Feb 9, 2022
776fc3f
only set nugget as last element of theta if nugget_type is fit
nbarlowATI Feb 9, 2022
a07cfc1
Bump version number
nbarlowATI Feb 10, 2022
b97a0e4
Fix/mean predictions (#221)
edaub Feb 18, 2022
4716329
implemented maximin LHC design (#222)
edaub Mar 4, 2022
35c3dd5
Update doc example to use fix_GP_MAP
ots22 Mar 7, 2022
9c3ce03
Update doc comments in DimensionReduction and add reference
ots22 Mar 7, 2022
110f777
Fix link
ots22 Mar 7, 2022
0912e43
Merge branch 'devel' into fix/dim-reduction-docs
ots22 Mar 8, 2022
d33ae95
Update setup.py
ots22 Mar 8, 2022
a6781d7
Merge pull request #224 from alan-turing-institute/fix/dim-reduction-…
ots22 Mar 8, 2022
6586a99
matched GPU class by adding additional methods and tests to MOGP class
edaub Mar 25, 2022
0979cd3
updated ubuntu and python versions in travis build to fix problem wit…
edaub Mar 29, 2022
0a98382
Merge branch 'devel' into feature/mogp_methods
edaub Mar 29, 2022
95b8879
Make GPU implementation of MultiOutputGP return an array of size n_em…
nbarlowATI Apr 1, 2022
8ef36e8
merge latest changes
nbarlowATI Apr 5, 2022
d10a11b
consolidation and test fixes after merging latest changes in
nbarlowATI Apr 6, 2022
da017b1
make n_corr return a vector for multi-output gps
nbarlowATI Apr 6, 2022
6508f5f
steps towards tidying up GPU priors - enable different distributions …
nbarlowATI Apr 7, 2022
236ac41
streamline and make more flexible the way of getting prior parameters…
nbarlowATI Apr 8, 2022
ccb3b1c
move to kwargs for create_prior_params
nbarlowATI Apr 8, 2022
59d8520
MOGP methods to match GPU (#225)
edaub Apr 8, 2022
d4b55d3
fix arguments in calls to functions that create and set priors
nbarlowATI Apr 11, 2022
30f0314
n_data to n_params in tests
nbarlowATI Apr 11, 2022
6451303
remove unnecessary second setting of meanfunc and data parameters in …
nbarlowATI Apr 11, 2022
087f916
Multi-Output History Matching (#210)
edaub Apr 11, 2022
fe61d53
ensure that nug_size is always initialized
nbarlowATI Apr 11, 2022
2dad83b
add comment describing RNG seed setting, and re-remove create_nug_pri…
nbarlowATI Apr 12, 2022
8091adb
merge latest devel changes
nbarlowATI Apr 12, 2022
05e1231
Feature/validation (#226)
edaub May 4, 2022
9e9ec64
merge devel
nbarlowATI May 6, 2022
6a83a03
fix mistake introduced by merge with devel
nbarlowATI May 6, 2022
443b0d4
make extra certain that fitting fails for test
nbarlowATI May 6, 2022
768fd30
add comment around creation of default priors saying that meanfunc pa…
nbarlowATI May 6, 2022
a32c868
add comments about MeanFunc parameters getting weak priors, and also …
nbarlowATI May 6, 2022
c233e2d
Merge pull request #211 from alan-turing-institute/feature/gpu_priors
nbarlowATI May 6, 2022
a1498e9
fixed version number for release
edaub May 9, 2022
3 changes: 2 additions & 1 deletion .travis.yml
@@ -1,6 +1,7 @@
language: python
dist: focal
python:
- "3.7"
- "3.9"
install:
- pip install 'numpy>=1.19'
- pip install -r requirements.txt
27 changes: 27 additions & 0 deletions docs/demos/multioutput_tutorial.rst
@@ -0,0 +1,27 @@
.. _multioutput_tutorial:

Multi-Output Tutorial
=====================

*Note: This tutorial requires Scipy version 1.4 or later to run the simulator.*

This page includes an end-to-end example of using ``mogp_emulator`` to perform model calibration
with a simulator with multiple outputs. Note that this builds on the main tutorial with a
second output (in this case, the velocity of the projectile at the end of the simulation),
which is able to further constrain the NROY space as described in the first tutorial.

.. literalinclude:: ../../mogp_emulator/demos/multioutput_tutorial.py

One thing to note about multiple outputs is that they must be run as a script with a
``if __name__ == "__main__"`` block in order to correctly use the multiprocessing
library. This can usually be done as in the example for short scripts, while for more
complex analyses it is usually better to define functions (as in the benchmark for
multiple outputs).

More Details
------------

More details about these steps can be found in the :ref:`methods` section, or on the following page
that goes into :ref:`more details <methoddetails>` on the options available in this software library.
For more on the specific implementation details, see the various
:ref:`implementation pages <implementation>` describing the software components.
1 change: 1 addition & 0 deletions docs/implementation/DimensionReduction.rst
@@ -30,3 +30,4 @@ Utilities
.. rubric:: References
.. [Fukumizu1] https://www.ism.ac.jp/~fukumizu/software.html
.. [FL13] Fukumizu, Kenji and Chenlei Leng. "Gradient-based kernel dimension reduction for regression." Journal of the American Statistical Association 109, no. 505 (2014): 359-370
.. [LG17] Liu, Xiaoyu and Guillas, Serge. "Dimension Reduction for Gaussian Process Emulation: An Application to the Influence of Bathymetry on Tsunami Heights." SIAM/ASA Journal on Uncertainty Quantification 5, no. 1 (2017): 787-812 https://doi.org/10.1137/16M1090648
15 changes: 15 additions & 0 deletions docs/implementation/ExperimentalDesign.rst
@@ -42,4 +42,19 @@ The ``LatinHypercubeDesign`` Class
:members:
:inherited-members:

.. automethod:: __init__

**********************************
The ``MaxiMinLHC`` Class
**********************************

.. _MaxiMinLHC:

.. automodule:: mogp_emulator.ExperimentalDesign.MaxiMinLHC
:noindex:

.. autoclass:: mogp_emulator.ExperimentalDesign.MaxiMinLHC
:members:
:inherited-members:

.. automethod:: __init__
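A minimal sketch of the maximin idea behind ``MaxiMinLHC`` (not the library's implementation): draw several candidate Latin hypercube designs and keep the one whose closest pair of points is furthest apart. The helper names ``random_lhd`` and ``maximin_lhd`` are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist

def random_lhd(n_samples, n_dim, rng):
    """One random Latin hypercube sample on [0, 1]^n_dim."""
    # Each column: a random permutation of strata, jittered within each stratum.
    strata = np.argsort(rng.random((n_samples, n_dim)), axis=0)
    return (strata + rng.random((n_samples, n_dim))) / n_samples

def maximin_lhd(n_samples, n_dim, n_tries=20, seed=0):
    """Keep the candidate design with the largest minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_tries):
        cand = random_lhd(n_samples, n_dim, rng)
        score = pdist(cand).min()  # distance between the closest pair
        if score > best_score:
            best, best_score = cand, score
    return best

design = maximin_lhd(10, 2)
```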
1 change: 1 addition & 0 deletions docs/implementation/implementation.rst
@@ -11,6 +11,7 @@ mogp_emulator Implementation Details
GaussianProcessGPU
MultiOutputGP
fitting
validation
MeanFunction
formula
Kernel
9 changes: 9 additions & 0 deletions docs/implementation/validation.rst
@@ -0,0 +1,9 @@
.. _validation:

**********************************
The ``validation`` Module
**********************************

.. automodule:: mogp_emulator.validation
:members:
:noindex:
1 change: 1 addition & 0 deletions docs/index.rst
@@ -32,6 +32,7 @@ details, and some included benchmarks.
be used are:

demos/gp_demos
demos/multioutput_tutorial
demos/gp_kernel_demos
demos/mice_demos
demos/historymatch_demos
32 changes: 19 additions & 13 deletions docs/intro/tutorial.rst
@@ -58,7 +58,7 @@ the two input parameters of the drag coefficient :math:`C` and the initial veloc
and returning a single value, which is :math:`x` at the end of the simulation.

.. literalinclude:: ../../mogp_emulator/demos/projectile.py
:lines: 1-3,12-43,46-
:lines: 1-80
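If the bundled ``projectile.py`` is not to hand, a comparable simulator can be sketched as follows. This is a stand-in under stated assumptions — the launch angle, launch height, and drag law here are invented for illustration, not taken from the demo:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulator(theta):
    """Horizontal distance travelled by a projectile with nonlinear drag.

    theta = (log10 of drag coefficient C, initial speed v0); assumes a
    45-degree launch from 2 m height, integrating until the projectile lands.
    """
    c, v0 = 10.0 ** theta[0], theta[1]

    def rhs(t, y):
        x, vx, z, vz = y
        speed = np.hypot(vx, vz)
        # Drag opposes motion with magnitude proportional to speed squared.
        return [vx, -c * speed * vx, vz, -9.8 - c * speed * vz]

    def landed(t, y):
        return y[2]            # height crosses zero
    landed.terminal = True
    landed.direction = -1      # only trigger on the way down

    y0 = [0.0, v0 / np.sqrt(2.0), 2.0, v0 / np.sqrt(2.0)]
    sol = solve_ivp(rhs, (0.0, 1.0e3), y0, events=landed, rtol=1.0e-8)
    return sol.y_events[0][0, 0]  # horizontal position at landing

x_final = simulator([-4.0, 100.0])
```

The terminal event (``solve_ivp`` with ``events``) needs Scipy 1.4 or later, consistent with the version note in the multi-output tutorial.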

Parameter Space
~~~~~~~~~~~~~~~
@@ -103,7 +103,7 @@ distributions are fairly common due to their simplicity.
To construct our experimental design and draw samples from it, we do the following:

.. literalinclude:: ../../mogp_emulator/demos/tutorial.py
:lines: 1-4,24-28
:lines: 1-5,25-29

This constructs an instance of :ref:`LatinHypercubeDesign <LatinHypercubeDesign>`, and
creates the underlying distributions by providing a list of tuples. Each tuple gives the
Expand Down Expand Up @@ -131,11 +131,13 @@ by passing the GP object to the ``fit_GP_MAP`` function, which returns the same
GP object but with the parameter values estimated.

.. literalinclude:: ../../mogp_emulator/demos/tutorial.py
:lines: 33-37
:lines: 34-40

While the function is called ``fit_GP_MAP`` (MAP means Maximum A Posteriori),
in this case we have not provided any prior information on the parameter values,
so it results in MLE.
By default, if no priors are specified for the hyperparameters then defaults
are chosen. In particular, for correlation lengths, default priors are fit
that attempt to put most of the distribution mass in the range spanned by
the input data. This tends to stabilize the fitting and improve performance,
as fewer iterations are needed to ensure a good fit.
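The stabilizing effect of a correlation-length prior can be sketched outside the library with a plain-numpy MAP fit for a 1D squared-exponential kernel. Every choice below — unit variance, a lognormal prior centred on the span of the inputs, grid search over the log length scale — is an illustrative assumption, not a description of ``mogp_emulator`` internals:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 5.0, 20)
y = np.sin(x)

def neg_log_post(log_l, nugget=1.0e-6):
    """Negative log posterior: GP marginal likelihood plus a length-scale prior."""
    l = np.exp(log_l)
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / l**2)
    K += nugget * np.eye(len(x))          # jitter for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    nll = 0.5 * y @ alpha + np.log(np.diag(L)).sum()
    # Lognormal prior centred so most mass falls over the span of the inputs,
    # mimicking the stabilizing role of the default correlation-length priors.
    prior = 0.5 * (log_l - np.log(np.ptp(x))) ** 2
    return nll + prior

grid = np.linspace(-3.0, 3.0, 121)
log_l_map = grid[np.argmin([neg_log_post(g) for g in grid])]
```

Dropping the ``prior`` term recovers the MLE fit that the tutorial text describes as the no-prior special case.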

Following fitting, we print out some of the hyperparameters that are estimated.
First, we print out the correlation lengths estimated for each of the input
@@ -162,15 +164,19 @@ and the uncertainty. This is done with the ``predict`` method of
:ref:`GaussianProcess <GaussianProcess>`:

.. literalinclude:: ../../mogp_emulator/demos/tutorial.py
:lines: 44-52
:lines: 46-55

``predictions`` is an object containing the mean and uncertainty (variance)
of the predictions. A GP assumes that the outputs follow a Normal Distribution,
so we can perform validation by asking how many of our validation points mean estimates
are within 2 standard deviations of the true value. Usually for this example this is
about 8/10, so not quite as we would expect if it were perfectly recreating the
function. However, we will see that this still is good enough in most cases
for the task at hand.
are within 2 standard deviations of the true value by computing the standard errors
of the emulator predictions on the validation points. ``mogp_emulator`` contains
a number of methods of automatically validating an emulator given some validation
points, including computing standard errors (see the :ref:`validation <validation>`
documentation for more details). Usually for this example we would expect
about 8/10 to be within 2 standard deviations, so not quite as we would expect if
it were perfectly recreating the function. However, we will see that this still is
good enough in most cases for the task at hand.
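Counting validation points within 2 standard deviations reduces to the following sketch (the prediction numbers are invented for illustration; the :ref:`validation <validation>` module is the real interface for this):

```python
import numpy as np

def standard_errors(mean, variance, targets):
    """Signed standard errors of emulator predictions on validation points."""
    return (mean - targets) / np.sqrt(variance)

# Invented predictions on 10 validation points (true values are 1..10):
mean = np.array([1.0, 2.1, 2.9, 4.5, 5.0, 5.8, 6.5, 8.0, 9.1, 9.9])
var = np.full(10, 0.04)                    # predictive variance (std dev 0.2)
truth = np.arange(1.0, 11.0)

errs = standard_errors(mean, var, truth)
n_good = int(np.sum(np.abs(errs) < 2.0))   # 8 of 10 for these numbers
```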

History Matching
~~~~~~~~~~~~~~~~
@@ -206,7 +212,7 @@ and Monte Carlo sampling (especially in only 2 dimensions). Then, we create a
Yet" (NROY). This is done as follows:

.. literalinclude:: ../../mogp_emulator/demos/tutorial.py
:lines: 58-65
:lines: 60-68
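The implausibility cutoff that underlies the NROY classification can be sketched as follows — a simplified stand-in for the :ref:`HistoryMatching <HistoryMatching>` machinery, with invented query-point values and the conventional cutoff of 3:

```python
import numpy as np

def implausibility(pred_mean, pred_var, obs, obs_var=0.0, cutoff=3.0):
    """Implausibility measure and NROY mask for a set of query points."""
    imp = np.abs(pred_mean - obs) / np.sqrt(pred_var + obs_var)
    return imp, imp < cutoff

# Hypothetical emulator output at five query points, observation z = 2.0:
mean = np.array([2.1, 0.0, 1.5, 5.0, 2.4])
var = np.array([0.04, 0.04, 0.09, 0.25, 0.01])

imp, nroy = implausibility(mean, var, obs=2.0, obs_var=0.01)
```

Points whose emulator prediction is many combined standard deviations from the observation are ruled out; everything else stays "Not Ruled Out Yet".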

First, we set a large number of samples and draw them from the experimental design object. Then,
We construct the :ref:`HistoryMatching <HistoryMatching>` object by giving the fit GP
@@ -226,7 +232,7 @@ surrogate model for reference. This plotting command is only executed if ``matpl
installed:

.. literalinclude:: ../../mogp_emulator/demos/tutorial.py
:lines: 5-10,69-
:lines: 6-11,71-

which should make a plot that looks something like this:

24 changes: 7 additions & 17 deletions mogp_emulator/DimensionReduction.py
@@ -1,6 +1,7 @@
"""This module provides classes and utilities for performing dimension
reduction. Currently there is a single class :class:`mogp_emulator.gKDR` which implements
the method of Fukumizu and Leng [FL13]_.
reduction. There is a single class :class:`mogp_emulator.gKDR` which
implements the method of Fukumizu and Leng [FL13]_, and which can be
used jointly with Gaussian process emulation as in [LG17]_.

Example: ::

@@ -120,21 +121,11 @@ class gKDR(object):

"""Dimension reduction by the gKDR method.

See link [Fukumizu1]_ (and in particular, [FL13]_) for details of
the method.

Note that this is a simpler and faster method than the original
"KDR" method by the same authors (but with an added
approximation). The KDR method will be implemented separately.
See [Fukumizu1]_, [FL13]_ and [LG17]_.

An instance of this class is callable, with the ``__call__``
method taking an input coordinate and mapping it to a reduced
coordinate.

Note that this class currently implements a *direct* translation
of the Matlab implementation of KernelDeriv (see link above) into
Python/NumPy. It is due to be replaced with a Fortran
implementation, but this should not affect the interface.
"""

def __init__(self, X, Y, K=None, X_scale = 1.0, Y_scale = 1.0, EPS=1E-8, SGX=None, SGY=None):
@@ -378,18 +369,17 @@ def tune_parameters(cls, X, Y, train_model, cXs=None, cYs=None,
within the kernel, minimizing the loss from a
Gaussian process regression:

>>> from mogp_emulator import gKDR
>>> from mogp_emulator import GaussianProcess
>>> from mogp_emulator import gKDR, GaussianProcess, fit_GP_MAP
>>> X = ...
>>> Y = ...
>>> dr, loss = gKDR.tune_parameters(X, Y, GaussianProcess.train_model)
>>> dr, loss = gKDR.tune_parameters(X, Y, fit_GP_MAP)
>>> gp = GaussianProcess(dr(X), Y)

Or, specifying some optional parameters for the lengthscales,
the maximum value of `K` to use, the number of folds for
cross-validation, and producing verbose output:

>>> dr, loss = gKDR.tune_parameters(X, Y, GaussianProcess.train_model,
>>> dr, loss = gKDR.tune_parameters(X, Y, fit_GP_MAP,
... cXs = [0.5, 1.0, 2.0], cYs = [2.0],
... maxK = 25, cross_validation_folds=4, verbose = True)

115 changes: 103 additions & 12 deletions mogp_emulator/ExperimentalDesign.py
@@ -1,5 +1,6 @@
import numpy as np
import scipy.stats
from scipy.spatial.distance import pdist
from inspect import signature

class ExperimentalDesign(object):
@@ -235,13 +236,13 @@ def _draw_samples(self, n_samples):
"""
raise NotImplementedError

def sample(self, n_samples):
def sample(self, n_samples, **kwargs):
"""
Draw parameter samples from the experimental design

This method implements drawing parameter samples from the experimental design. The method does
this by calling the ``_draw_samples`` method to obtain samples from the :math:`[0,1]^n` hypercube,
where :math:`n` is the number of parameters. The ``sample``method then transforms these samples
where :math:`n` is the number of parameters. The ``sample`` method then transforms these samples
drawn from the low level method to the actual parameter values using the PPF functions provided
when initializing the object. Note that this method also checks that all parameter values are
finite; if any ``NaN`` values are returned, an error will be raised.
@@ -250,6 +251,9 @@ def sample(self, n_samples):
using a different protocol only needs to change the ``_draw_samples`` method. This makes it
simpler to define new designs, as only a single method needs to be altered.

Also accepts a ``kwargs`` argument to allow other derived classes to implement additional
keyword arguments.

:param n_samples: Number of samples to be drawn from the design (must be a positive integer)
:type n_samples: int
:returns: Samples drawn from the design parameter space as a numpy array with shape
@@ -261,16 +265,13 @@ def sample(self, n_samples):
assert n_samples > 0, "number of samples must be positive"

sample_values = np.zeros((n_samples, self.get_n_parameters()))
random_draws = self._draw_samples(n_samples)
random_draws = self._draw_samples(n_samples, **kwargs)

assert np.all(random_draws >= 0.) and np.all(random_draws <= 1.), "error in generating random samples"

for (dist, index) in zip(self.distributions, range(self.get_n_parameters())):
try:
sample_values[:,index] = dist(random_draws[:,index])
except:
for sample_index in range(n_samples):
sample_values[sample_index, index] = dist(random_draws[sample_index,index])
for sample_index in range(n_samples):
sample_values[sample_index, index] = dist(random_draws[sample_index,index])

assert np.all(np.isfinite(sample_values)), "error due to non-finite values of parameters"

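The template pattern described in the ``sample`` docstring above — uniform draws from the :math:`[0,1]^n` hypercube mapped through per-parameter PPFs, with subclasses overriding only ``_draw_samples`` — can be sketched standalone. This is a minimal illustration, not mogp_emulator's actual implementation; the ``ToyDesign`` name and its constructor signature are invented for the example:

```python
import numpy as np
import scipy.stats


class ToyDesign:
    """Minimal sketch of the template pattern: `sample` handles the PPF
    transform and validity checks, subclasses only override `_draw_samples`."""

    def __init__(self, ppfs):
        self.ppfs = ppfs  # one PPF (inverse CDF) per parameter

    def _draw_samples(self, n_samples, **kwargs):
        # Base protocol: plain Monte Carlo draws on [0,1]^n
        return np.random.default_rng().random((n_samples, len(self.ppfs)))

    def sample(self, n_samples, **kwargs):
        draws = self._draw_samples(n_samples, **kwargs)
        assert np.all((draws >= 0.0) & (draws <= 1.0)), "draws outside unit hypercube"
        # Map each unit-interval column to parameter space via its PPF
        out = np.column_stack([ppf(draws[:, i]) for i, ppf in enumerate(self.ppfs)])
        assert np.all(np.isfinite(out)), "non-finite parameter values"
        return out


design = ToyDesign([scipy.stats.uniform(loc=0.0, scale=2.0).ppf,
                    scipy.stats.norm(loc=5.0, scale=1.0).ppf])
samples = design.sample(100)
print(samples.shape)  # (100, 2)
```

A Latin Hypercube variant would subclass ``ToyDesign`` and replace only ``_draw_samples``, which is exactly the extension point the refactored ``**kwargs`` signature in this PR keeps open.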
@@ -406,7 +407,7 @@ def __init__(self, *args):
self.method = "Monte Carlo"
super().__init__(*args)

def _draw_samples(self, n_samples):
def _draw_samples(self, n_samples, **kwargs):
"""
Low level method for drawing random samples from a Monte Carlo design

@@ -546,7 +547,7 @@ def __init__(self, *args):
self.method = "Latin Hypercube"
super().__init__(*args)

def _draw_samples(self, n_samples):
def _draw_samples(self, n_samples, **kwargs):
"""
Low level method for drawing random samples from a Latin Hypercube design

@@ -560,7 +561,7 @@ def _draw_samples(self, n_samples):

:param n_samples: Number of samples to be drawn from the design (must be a positive integer)
:type n_samples: int
:returns: Random Monte Carlo samples drawn from the :math:`[0,1]^n` hypercube as a numpy
:returns: Random samples drawn from the :math:`[0,1]^n` hypercube as a numpy
array with shape ``(n_samples, n_parameters)``
:rtype: ndarray
"""
@@ -580,4 +581,94 @@ def _draw_samples(self, n_samples):

assert np.all(random_samples >= 0.) and np.all(random_samples <= 1.), "error in generating latin hypercube samples"

return random_samples
return random_samples

class MaxiMinLHC(LatinHypercubeDesign):
def __init__(self, *args):
"""
Class representing a one-shot design of experiments with uncorrelated parameters using
MaxiMin Latin Hypercube Sampling

This class provides an implementation for a class for designing experiments to sample
the parameter space of a complex model using MaxiMin Latin Hypercube sampling. MaxiMin
LHCs repeatedly draw samples from the base LHC design, keeping the realization that
maximizes the minimum pairwise distance between all design points. Because of this,
MaxiMin designs tend to spread their samples closer to the edge of the parameter
space and in many cases result in more accurate sampling than a single LHC draw.

The parameter space can be specified in a variety of ways, but essentially the user must
provide a Probability Point Function (PPF, or inverse of the Cumulative Distribution Function)
for each input parameter. Each PPF function takes a single numeric input and maps from
the interval :math:`[0,1]` to the desired parameter distribution value for a given parameter,
and each parameter has a separate function describing its distribution. Note that this makes
the assumption of no correlations between any of the parameter values (a future version may
implement an experimental design where there are such parameter correlations). Once the
design is initialized, a desired number of samples can be drawn from the design, returning
an array holding the desired number of samples from the parameter space.

Internally, the class holds the set of PPFs for all of the parameter values, and samples are
drawn by calling the ``sample`` method. To draw the samples, the ``_draw_samples`` method is used
to generate a series of points in the :math:`[0,1]^n` hypercube using MaxiMin Latin Hypercube
sampling, where :math:`n` is the number of parameters. This set of samples from the Latin
Hypercube is then mapped to the parameter space using the given PPF functions.

Unlike Monte Carlo sampling, Latin Hypercube designs attempt to sample more uniformly from the
parameter space. Latin Hypercube sampling ensures that each sample is drawn from a different
part of the space for each parameter. For example, if four samples are drawn, then for each
parameter, one sample is guaranteed to be drawn from each quartile of the distribution. This
ensures a more uniform sampling when compared to Monte Carlo sampling, but requires slightly
more computation to generate the samples. Note however, that for very large numbers of parameters,
Latin Hypercubes still may not sample very efficiently. This is due to the fact that the size of
the parameter space grows exponentially with the number of dimensions, so a fixed number of
samples will sample the space more poorly as the number of parameters increases.
"""
self.method = "MaxiMinLHC"
super().__init__(*args)

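The stratification property described in the docstring above — for :math:`n` samples, each parameter gets exactly one draw from each of the :math:`n` equal-probability bins — can be demonstrated with a few lines of numpy. This is a hedged sketch of the generic LHC construction, not the code used by ``LatinHypercubeDesign``; the ``latin_hypercube`` helper is invented for the example:

```python
import numpy as np


def latin_hypercube(n_samples, n_parameters, seed=None):
    """Sketch of Latin Hypercube sampling on [0,1]^n: for each dimension,
    permute the stratum indices 0..n_samples-1, then jitter each point
    uniformly within its stratum."""
    rng = np.random.default_rng(seed)
    strata = np.array([rng.permutation(n_samples)
                       for _ in range(n_parameters)]).T
    return (strata + rng.random((n_samples, n_parameters))) / n_samples


pts = latin_hypercube(4, 2, seed=0)
# Each quartile [k/4, (k+1)/4) of each dimension contains exactly one point
for j in range(2):
    assert sorted(np.floor(pts[:, j] * 4).astype(int)) == [0, 1, 2, 3]
```

A MaxiMin design simply repeats this construction and keeps the realization with the best worst-case point separation, as the ``_draw_samples`` method below implements.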
def _draw_samples(self, n_samples, n_tries=1000, **kwargs):
"""
Sampling method for MaxiMin LHCs

Iterates over multiple LHC realizations and returns the one that maximizes
the minimum distance between pairs of points. The number of realizations
over which to search for this MaxiMin criterion can be specified by
``n_tries``.

Distances are computed by ``scipy.spatial.distance.pdist``. Any
additional ``kwargs`` passed here will be sent on to the ``pdist``
function.

:param n_samples: Number of samples to be drawn from the design (must be a positive integer)
:type n_samples: int
:param n_tries: Number of LHC realizations to use in maximizing
the MaxiMin criterion
:type n_tries: int
:param **kwargs: Keyword arguments to be passed to the
``scipy.spatial.distance.pdist`` function.
:returns: Random MaxiMin Latin Hypercube samples drawn from the :math:`[0,1]^n` hypercube as a numpy
array with shape ``(n_samples, n_parameters)``
:rtype: ndarray
"""

n_samples = int(n_samples)
assert n_samples > 0, "number of samples must be positive"
n_tries = int(n_tries)
assert n_tries > 0, "n_tries must be a positive integer"

n_parameters = self.get_n_parameters()

best_samples = np.empty((n_samples, n_parameters))
max_dist = -np.inf

for i in range(n_tries):
random_samples = super()._draw_samples(n_samples)
min_dist = np.min(pdist(random_samples, **kwargs))
if min_dist > max_dist:
max_dist = min_dist
best_samples = random_samples

assert np.all(best_samples >= 0.0) and np.all(
best_samples <= 1.0
), "error in generating latin hypercube samples"

return best_samples
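The selection loop at the heart of ``_draw_samples`` above — score each candidate design by its minimum pairwise distance and keep the best — can be exercised standalone with ``scipy.spatial.distance.pdist``. This is an illustrative sketch of the MaxiMin criterion on arbitrary candidate point sets, not a call into mogp_emulator; the ``maximin_select`` name is invented for the example:

```python
import numpy as np
from scipy.spatial.distance import pdist


def maximin_select(candidate_draws):
    """Keep the candidate whose minimum pairwise distance is largest,
    mirroring the selection loop in MaxiMinLHC._draw_samples."""
    best, best_score = None, -np.inf
    for draws in candidate_draws:
        score = np.min(pdist(draws))  # worst-case point separation
        if score > best_score:
            best, best_score = draws, score
    return best, best_score


rng = np.random.default_rng(42)
candidates = [rng.random((10, 2)) for _ in range(200)]
chosen, score = maximin_select(candidates)
# The chosen design's minimum pairwise separation is at least as large
# as that of every other candidate
assert all(score >= np.min(pdist(c)) for c in candidates)
```

Any metric keyword accepted by ``pdist`` (e.g. ``metric="cityblock"``) could be threaded through here, which is why the class forwards ``**kwargs`` from ``sample`` down to the distance computation.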