Fix the syntax for adding multiple ensemble members from the same dataset (#678)

SarahAlidoost · bouweandela · web-flow · commit cba916612e73 · 2020-06-22T12:12:11.000+02:00
* fix the syntax for multiple ensembles, add a note, fix some typos

* Fix multiple expansion

* update the doc about multiple ensembles

* Update doc/recipe/overview.rst

Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>

diff --git a/doc/recipe/overview.rst b/doc/recipe/overview.rst
@@ -38,7 +38,7 @@ the following:
     documentation:
       description: |
         Recipe to produce time series figures of the derived variable, the
-        Atlantic meriodinal overturning circulation (AMOC).
+        Atlantic meridional overturning circulation (AMOC).
         This recipe also produces transect figures of the stream functions for
         the years 2001-2004.
 
@@ -102,8 +102,8 @@ Here it is an example concatenating the `historical` experiment with `rcp85`
     datasets:
       - {dataset: CanESM2, project: CMIP5, exp: [historical, rcp85], ensemble: r1i1p1, start_year: 2001, end_year: 2004}
 
-It is also possible to define the ensemble as a list, although it is useful only
-case the two experiments have different ensemble names
+It is also possible to define the ensemble as a list when the two experiments have different ensemble names.
+In this case, the specified datasets are concatenated into a single cube:
 
 .. code-block:: yaml
 
@@ -113,22 +113,24 @@ case the two experiments have different ensemble names
 ESMValTool also supports a simplified syntax to add multiple ensemble members from the same dataset.
 In the ensemble key, any element in the form `(x:y)` will be replaced with all numbers from x to y (both inclusive),
 adding a dataset entry for each replacement. For example, to add ensemble members r1i1p1 to r10i1p1
-you can use the following abreviatted syntax:
+you can use the following abbreviated syntax:
 
 .. code-block:: yaml
 
     datasets:
-      - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: r(1:10)i1p1, start_year: 2001, end_year: 2004}
+      - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: "r(1:10)i1p1", start_year: 2001, end_year: 2004}
 
 It can be included multiple times in one definition. For example, to generate the dataset definitions
 for the ensemble members r1i1p1 to r5i1p1 and r1i2p1 to r5i2p1 you can use:
 
 .. code-block:: yaml
 
     datasets:
-      - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: r(1:5)i(1:2)p1, start_year: 2001, end_year: 2004}
+      - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: "r(1:5)i(1:2)p1", start_year: 2001, end_year: 2004}
 
 Please, bear in mind that this syntax can only be used in the ensemble tag.
+Also, note that the combination of multiple experiments and ensembles, like
+exp: [historical, rcp85], ensemble: [r1i1p1, "r(2:3)i1p1"] is not supported and will raise an error.
 
 Note that this section is not required, as datasets can also be provided in the
 Diagnostics_ section.
@@ -140,7 +142,7 @@ Diagnostics_ section.
 Recipe section: ``preprocessors``
 =================================
 
-The preprocessor section of the recipe includes one or more preprocesors, each
+The preprocessor section of the recipe includes one or more preprocessors, each
 of which may call the execution of one or several preprocessor functions.
 
 Each preprocessor section includes:
diff --git a/esmvalcore/_recipe.py b/esmvalcore/_recipe.py
@@ -1015,20 +1015,31 @@ def _expand_ensemble(variables):
         """
         expanded = []
         regex = re.compile(r'\(\d+:\d+\)')
+
+        def expand_ensemble(variable):
+            ens = variable.get('ensemble', "")
+            match = regex.search(ens)
+            if match:
+                start, end = match.group(0)[1:-1].split(':')
+                for i in range(int(start), int(end) + 1):
+                    expand = deepcopy(variable)
+                    expand['ensemble'] = regex.sub(str(i), ens, 1)
+                    expand_ensemble(expand)
+            else:
+                expanded.append(variable)
+
         for variable in variables:
             ensemble = variable.get('ensemble', "")
-            if not isinstance(ensemble, str):
+            if isinstance(ensemble, (list, tuple)):
+                for elem in ensemble:
+                    if regex.search(elem):
+                        raise RecipeError(
+                            f"In variable {variable}: ensemble expansion "
+                            "cannot be combined with ensemble lists")
                 expanded.append(variable)
-                continue
-            match = regex.search(ensemble)
-            if not match:
-                expanded.append(variable)
-                continue
-            start, end = match.group(0)[1:-1].split(':')
-            for i in range(int(start), int(end) + 1):
-                expand = deepcopy(variable)
-                expand['ensemble'] = regex.sub(str(i), ensemble, 1)
-                expanded.append(expand)
+            else:
+                expand_ensemble(variable)
+
         return expanded
 
     def _initialize_variables(self, raw_variable, raw_datasets):
diff --git a/tests/unit/test_recipe.py b/tests/unit/test_recipe.py
@@ -0,0 +1,42 @@
+import pytest
+
+from esmvalcore._recipe import Recipe
+from esmvalcore._recipe_checks import RecipeError
+
+
+class TestRecipe:
+    def test_expand_ensemble(self):
+
+        datasets = [
+            {
+                'dataset': 'XYZ',
+                'ensemble': 'r(1:2)i(2:3)p(3:4)',
+            },
+        ]
+
+        expanded = Recipe._expand_ensemble(datasets)
+
+        ensembles = [
+            'r1i2p3',
+            'r1i2p4',
+            'r1i3p3',
+            'r1i3p4',
+            'r2i2p3',
+            'r2i2p4',
+            'r2i3p3',
+            'r2i3p4',
+        ]
+        for i, ensemble in enumerate(ensembles):
+            assert expanded[i] == {'dataset': 'XYZ', 'ensemble': ensemble}
+
+    def test_expand_ensemble_nolist(self):
+
+        datasets = [
+            {
+                'dataset': 'XYZ',
+                'ensemble': ['r1i1p1', 'r(1:2)i1p1']
+            },
+        ]
+
+        with pytest.raises(RecipeError):
+            Recipe._expand_ensemble(datasets)
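
The behaviour documented in this change can be tried outside of ESMValCore with a minimal standalone sketch. It mirrors the recursive logic of the patched `_expand_ensemble`, but it is not the library's API: the names `expand_ensemble_sketch` and `RANGE` are hypothetical, and `ValueError` stands in for `RecipeError` so the snippet has no dependencies.

```python
import re
from copy import deepcopy

# Same pattern as in the diff: matches "(x:y)" with integer bounds.
RANGE = re.compile(r'\(\d+:\d+\)')


def expand_ensemble_sketch(datasets):
    """Expand every "(x:y)" in the ensemble key (hypothetical helper)."""
    expanded = []

    def expand(entry):
        ens = entry.get('ensemble', '')
        match = RANGE.search(ens)
        if not match:
            # Nothing left to expand: keep the entry as it is.
            expanded.append(entry)
            return
        start, end = match.group(0)[1:-1].split(':')
        for i in range(int(start), int(end) + 1):
            new = deepcopy(entry)
            # Replace only the first range, then recurse for the rest.
            new['ensemble'] = RANGE.sub(str(i), ens, count=1)
            expand(new)

    for entry in datasets:
        ens = entry.get('ensemble', '')
        if isinstance(ens, (list, tuple)):
            # Assumption: ValueError replaces RecipeError in this sketch.
            if any(RANGE.search(elem) for elem in ens):
                raise ValueError(
                    "ensemble expansion cannot be combined with "
                    "ensemble lists")
            expanded.append(entry)
        else:
            expand(entry)
    return expanded


result = expand_ensemble_sketch(
    [{'dataset': 'CanESM2', 'ensemble': 'r(1:5)i(1:2)p1'}])
names = [entry['ensemble'] for entry in result]
```

Because the first range is replaced before recursing, the leftmost index varies slowest, which matches the ordering asserted in `test_expand_ensemble`.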