rewrite the tutorials (#302)

IAMconsortium · Dec 23, 2019 · e84eeb4
1 parent 57be418
commit e84eeb4
Showing 25 changed files with 1,829 additions and 8,402 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -37,7 +37,7 @@ script:
   - make test
   # only test docs once to make sure everything works on most recent python
   - cd doc
-  - if [[ "${PYENV}" == "py37" && "${TRAVIS_OS_NAME}" != 'windows' ]]; then conda install --yes kealib==1.4.7; make html; fi
+  - if [[ "${PYENV}" == "py37" && "${TRAVIS_OS_NAME}" != 'windows' ]]; then make html; fi
   - cd ..
 
 after_success:

diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
-pyam: a Python toolkit for Integrated Assessment Modeling
-=========================================================
+pyam: analysis and visualization of integrated-assessment scenarios
+===================================================================
 
 **Documentation on [Read the Docs](https://pyam-iamc.readthedocs.io)**
 
@@ -8,43 +8,55 @@ pyam: a Python toolkit for Integrated Assessment Modeling
 Overview and scope
 ------------------
 
-The ``pyam`` package provides a range of diagnostic tools and functions
-for analyzing and working with IAMC-format timeseries data.
+The open-source Python package ``pyam`` provides a suite of tools and functions
+for analyzing and visualizing input data (i.e., assumptions/parametrization) 
+and results (model output) of integrated-assessment scenarios.
 
-Features:
-- Summary of models, scenarios, variables, and regions included in a snapshot.
-- Display of timeseries data as pandas.DataFrame with IAMC-specific filtering
-  options.
-- Simple visualization and plotting functions.
-- Diagnostic checks for non-reported variables or timeseries data to identify
-  outliers and potential reporting issues.
-- Categorization of scenarios according to timeseries data or meta-identifiers
-  for further analysis.
+Key features:
 
-The package can be used with timeseries data that follows the data template
-convention of the [Integrated Assessment Modeling Consortium](http://www.globalchange.umd.edu/iamc/) (IAMC).
-An illustrative example is shown below;
-see [data.ene.iiasa.ac.at/database](http://data.ene.iiasa.ac.at/database/)
-for more information.
+ - Simple analysis of timeseries data in the IAMC format
+   (more about it [here](https://pyam-iamc.readthedocs.io/en/stable/data.html))
+   with an interface similar in feel and style to the widely
+   used [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)
+ - Advanced visualization and plotting functions
+   (see the [gallery](https://pyam-iamc.readthedocs.io/en/stable/examples/index.html))
+ - Diagnostic checks for scripted validation of scenario data and results
 
-| **model**    | **scenario** | **region** | **variable**   | **unit** | **2005** | **2010** | **2015** |
-|--------------|--------------|------------|----------------|----------|----------|----------|----------|
-| MESSAGE V.4  | AMPERE3-Base | World      | Primary Energy | EJ/y     |    454.5 |    479.6 |      ... |
-| ...          | ...          | ...        | ...            | ...      |      ... |      ... |      ... |
+Data model
+----------
 
+An illustrative example of the timeseries format developed by the
+[Integrated Assessment Modeling Consortium](http://www.globalchange.umd.edu/iamc/) (IAMC)
+is shown below.
+The row is taken from the [IAMC 1.5°C scenario explorer](https://data.ene.iiasa.ac.at/iamc-1.5c-explorer),
+showing a scenario from the [CD-LINKS](https://www.cd-links.org) project.
+[Read the docs](https://pyam-iamc.readthedocs.io/en/stable/data.html)
+for more information on the IAMC format and the ``pyam`` data model.
 
-Tutorial
---------
+| **model** | **scenario** | **region** | **variable**   | **unit** | **2005** | **2010** | **2015** |
+|-----------|--------------|------------|----------------|----------|----------|----------|----------|
+| MESSAGE   | CD-LINKS 400 | World      | Primary Energy | EJ/y     |    462.5 |    500.7 |      ... |
+| ...       | ...          | ...        | ...            | ...      |      ... |      ... |      ... |
 
-A comprehensive tutorial for the basic functions is included
-in [the first tutorial](doc/source/tutorials/pyam_first_steps.ipynb)
-using a partial snapshot of the IPCC AR5 scenario database.
+
+Tutorials
+---------
+
+An introduction to the basic functions is shown
+in [the "first-steps" notebook](doc/source/tutorials/pyam_first_steps.ipynb).
+
+All tutorials are available in rendered format (i.e., with output) as part of
+the [online documentation](https://pyam-iamc.readthedocs.io/en/stable/tutorials.html).
+The source code of the tutorial notebooks is available
+in the folder [doc/source/tutorials](doc/source/tutorials) of this repository.
 
 Documentation
 -------------
 
-The documentation pages can be built locally.
-See the instruction in [doc/README](doc/README.md).
+The complete documentation is hosted on [Read the Docs](https://pyam-iamc.readthedocs.io).
+
+The documentation pages can be built locally;
+refer to the instructions in [doc/README](doc/README.md).
 
 Authors
 -------
@@ -56,7 +68,7 @@ and Daniel Huppmann ([@danielhuppmann](https://github.com/danielhuppmann/)).
 License
 -------
 
-Copyright 2017-2018 IIASA Energy Program
+Copyright 2017-2019 IIASA Energy Program
 
 The ``pyam`` package is licensed
 under the Apache License, Version 2.0 (the "License");
@@ -91,7 +103,7 @@ conda activate pyam  # may be  simply `source activate pyam` or just `activate p
 make -B virtual-environment
 ```
 
-To check everything has installed correctly,
+To check everything has installed correctly, run
 
 ```
 pytest tests

diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -1,6 +1,7 @@
 
 # Next Release
 
+- [#302](https://github.com/IAMconsortium/pyam/pull/302) Rework the tutorials
 - [#301](https://github.com/IAMconsortium/pyam/pull/301) Bugfix when using `to_excel()` with a `pd.ExcelWriter`
 - [#297](https://github.com/IAMconsortium/pyam/pull/297) Add `empty` attribute, better error for `timeseries()` on empty dataframe 
 - [#295](https://github.com/IAMconsortium/pyam/pull/295) Include `meta` table when writing to or reading from `xlsx` files

diff --git a/appveyor.yml b/appveyor.yml
@@ -26,7 +26,7 @@ install:
   - conda --version
   - activate testing
   - conda install -y numpy pandas pyyaml xlrd xlsxwriter seaborn==0.9.0 six requests jupyter nbconvert proj4==5.2.0 pywin32
-  - conda install -y -c conda-forge matplotlib==3.0.3 libiconv gdal fiona "geopandas<0.5.0" cartopy cython pyproj==1.9.6
+  - conda install -y -c conda-forge matplotlib==3.0.3 pyproj==1.9.6
 
 build: false
 

diff --git a/doc/source/_static/iamc_template.png b/doc/source/_static/iamc_template.png
diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -54,7 +54,7 @@ Key features:
  - Simple analysis of timeseries data in the IAMC format (more about it `here`_)
    with an interface similar in feel and style to the widely
    used `pandas.DataFrame`_
- - Advanced visualization and plotting function (see the `gallery`_)
+ - Advanced visualization and plotting functions (see the `gallery`_)
  - Diagnostic checks for scripted validation of scenario data and results
 
 The source code for |pyam| is available on `Github`_.

diff --git a/doc/source/tutorials/_static/AMPERE-Logo.png b/doc/source/tutorials/_static/AMPERE-Logo.png
diff --git a/doc/source/tutorials/_static/EMF-Logo_v2.1.png b/doc/source/tutorials/_static/EMF-Logo_v2.1.png
diff --git a/doc/source/tutorials/_static/IAMC_logo.jpg b/doc/source/tutorials/_static/IAMC_logo.jpg
diff --git a/doc/source/tutorials/_static/IIASA_logo.png b/doc/source/tutorials/_static/IIASA_logo.png
diff --git a/doc/source/tutorials/_static/cdlinks_logo.png b/doc/source/tutorials/_static/cdlinks_logo.png
diff --git a/doc/source/tutorials/aggregating_variables_and_plotting_with_negative_values.ipynb b/doc/source/tutorials/aggregating_variables_and_plotting_with_negative_values.ipynb
diff --git a/doc/source/tutorials/checking_consistency.ipynb b/doc/source/tutorials/checking_consistency.ipynb
@@ -0,0 +1,236 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Checking consistency of a scenario ensemble\n",
+    "\n",
+    "It has happened in previous model comparison exercises that the reported data was not internally consistent. This can be due to incomplete variable hierarchies, reporting templates incompatible with model specifications, or user error.\n",
+    "\n",
+    "In this tutorial, we show how to make the most of **pyam** to check that a scenario ensemble (or just a single scenario) is complete and that timeseries data \"add up\" across regions and along the variable tree (i.e., that the sum of the values of subcategories such as `Primary Energy|*` is identical to the value of the category `Primary Energy`).\n",
+    "\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
+    "    This feature of the <b>pyam</b> package currently only supports \"consistency\"\n",
+    "    in the sense of a strictly hierarchical variable tree\n",
+    "    (with subcategories summing up to the category value)\n",
+    "    and subregions of depth 1 adding up to the \"World\" region.\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import pyam"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We start with a hypothetical tutorial data set, which is constructed to highlight the individual validation features below.\n",
+    "\n",
+    "The scenario below has two inconsistencies:\n",
+    "\n",
+    "1. In year `2010` and regions `region_b` & `World`, the values of coal and wind do not add up to the total `Primary Energy` value\n",
+    "2. In year `2020` in the `World` region, the values of `Primary Energy` and `Primary Energy|Coal` are not the sums of the respective values in `region_a` and `region_b` <br />\n",
+    "   (but the sum of wind and coal to `Primary Energy` in each sub-region is correct)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tutorial_df = pd.DataFrame([\n",
+    "    ['World', 'Primary Energy', 'EJ/y', 7, 15],\n",
+    "    ['World', 'Primary Energy|Coal', 'EJ/y', 4, 11],\n",
+    "    ['World', 'Primary Energy|Wind', 'EJ/y', 2, 4],\n",
+    "    ['region_a', 'Primary Energy', 'EJ/y', 4, 8],\n",
+    "    ['region_a', 'Primary Energy|Coal', 'EJ/y', 2, 6],\n",
+    "    ['region_a', 'Primary Energy|Wind', 'EJ/y', 2, 2],\n",
+    "    ['region_b', 'Primary Energy', 'EJ/y', 3, 6],\n",
+    "    ['region_b', 'Primary Energy|Coal', 'EJ/y', 2, 4],\n",
+    "    ['region_b', 'Primary Energy|Wind', 'EJ/y', 0, 2],\n",
+    "],\n",
+    "    columns=['region', 'variable', 'unit', 2010, 2020]\n",
+    ")\n",
+    "\n",
+    "df = pyam.IamDataFrame(data=tutorial_df, model='model_a', scenario='scen_a')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "With the [check_internal_consistency()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_internal_consistency) feature, we can check the internal consistency of a scenario ensemble (i.e., an `IamDataFrame` instance).\n",
+    "If this method returns `None`, the database is internally consistent (i.e. the total variables are the sum of the sectoral breakdowns and the regional breakdown).\n",
+    "\n",
+    "In the rest of this tutorial, we give you a chance to better understand this method. We go through what it is actually doing and show you the kind of output you can expect."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Checking that variables are the sum of their components\n",
+    "\n",
+    "We are going to use the [check_aggregate()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate) method of the `IamDataFrame`\n",
+    "to check that the components of a variable add up to its total.\n",
+    "This method takes [np.isclose()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isclose.html) arguments as keyword arguments. We show our recommended settings here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "np_isclose_args = {\n",
+    "    'equal_nan': True,\n",
+    "    'rtol': 1e-03,\n",
+    "    'atol': 1e-05,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The [check_aggregate()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate) function allows us to quickly verify whether a given variable is the sum of its sectoral components (e.g. `Primary Energy` should be equal to `Primary Energy|Coal` plus `Primary Energy|Wind`). The validation is performed separately for each region.\n",
+    "\n",
+    "This section illustrates the first constructed inconsistency in this scenario. The returned [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicates where the aggregate is not equal to the sum of components."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.check_aggregate('Primary Energy', **np_isclose_args)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In practice, it would now be up to the user to determine the cause of the inconsistency (or confirm that this is expected for some reason)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Checking multiple variables\n",
+    "\n",
+    "We can now construct a loop over all variables in this `IamDataFrame`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for variable in df.variables():\n",
+    "    df.check_aggregate(variable, **np_isclose_args)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The log repeats the message from the previous example and shows that the other two variables (coal and wind) cannot be assessed because they have no subcategories.\n",
+    "\n",
+    "<div class=\"alert alert-block alert-info\">\n",
+    "Note that the detailed output (i.e., where the aggregation validation fails) is not shown in a notebook when calling the function within a loop.<br />\n",
+    "    Read <a href=\"https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/\">this page</a> for helpful tips and tricks when working with Jupyter notebooks.\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Checking that timeseries subregions sum to aggregate regions\n",
+    "\n",
+    "Similar to checking that the sum of a variable's components gives the declared total (shown above), we can check that summing over subregions returns the value of a region.\n",
+    "\n",
+    "To do this, we use the [check_aggregate_region()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate_region) function. By default, this method checks that all the regions in the dataframe sum to `World`.\n",
+    "\n",
+    "Using this function allows us to quickly check if a regional total for a single variable is equal to the sum of its regional values.\n",
+    "This section illustrates the second constructed inconsistency in this scenario. \n",
+    "The returned [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicates where the timeseries at the `region='World'` level is not equal to the sum of regional components."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.check_aggregate_region('Primary Energy', **np_isclose_args)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Checking complete internal consistency of a scenario (ensemble)\n",
+    "\n",
+    "The previous sections illustrated two functions to validate specific variables across their subcategories or regional breakdown. These two functions are combined in the [check_internal_consistency()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_internal_consistency) feature.\n",
+    "\n",
+    "If we have an internally consistent scenario ensemble (or single scenario), the function will return `None`; otherwise, it will return a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicating all detected inconsistencies.\n",
+    "\n",
+    "<div class=\"alert alert-block alert-warning\">\n",
+    "    Note that at the moment, this method assumes that all the regions sum to the <b>World</b> region. See <a href=\"https://github.com/IAMconsortium/pyam/issues/106\">this issue</a> for more information.\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.check_internal_consistency()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The output of this function reports both types of illustrative inconsistencies in the scenario constructed for this tutorial."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
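
The two constructed inconsistencies in `checking_consistency.ipynb` can be verified by hand. The following is a minimal sketch in plain Python that does not use **pyam** at all. The helper names `is_close`, `variable_mismatches` and `region_mismatches` are hypothetical and only illustrate the arithmetic; they are not part of the pyam API. The tolerance rule is the one documented for `numpy.isclose`, using the `rtol` and `atol` values recommended in the notebook.

```python
# Tutorial data from the notebook: (region, variable) -> {year: value}, in EJ/y.
data = {
    ("World", "Primary Energy"): {2010: 7, 2020: 15},
    ("World", "Primary Energy|Coal"): {2010: 4, 2020: 11},
    ("World", "Primary Energy|Wind"): {2010: 2, 2020: 4},
    ("region_a", "Primary Energy"): {2010: 4, 2020: 8},
    ("region_a", "Primary Energy|Coal"): {2010: 2, 2020: 6},
    ("region_a", "Primary Energy|Wind"): {2010: 2, 2020: 2},
    ("region_b", "Primary Energy"): {2010: 3, 2020: 6},
    ("region_b", "Primary Energy|Coal"): {2010: 2, 2020: 4},
    ("region_b", "Primary Energy|Wind"): {2010: 0, 2020: 2},
}


def is_close(a, b, rtol=1e-3, atol=1e-5):
    """Tolerance rule of numpy.isclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def variable_mismatches(data, variable):
    """(region, year) pairs where the subcategories do not sum to `variable`."""
    prefix = variable + "|"
    failures = []
    for (region, var), series in data.items():
        if var != variable:
            continue
        for year, total in series.items():
            # Sum all subcategories of `variable` in the same region.
            parts = sum(
                s[year]
                for (r, v), s in data.items()
                if r == region and v.startswith(prefix)
            )
            if not is_close(parts, total):
                failures.append((region, year))
    return failures


def region_mismatches(data, variable, region="World"):
    """Years where the other regions do not sum to `region` for `variable`."""
    failures = []
    for year, total in data[(region, variable)].items():
        # Sum the same variable over every region except the aggregate one.
        parts = sum(
            s[year]
            for (r, v), s in data.items()
            if v == variable and r != region
        )
        if not is_close(parts, total):
            failures.append(year)
    return failures


print(variable_mismatches(data, "Primary Energy"))
print(region_mismatches(data, "Primary Energy"))
print(region_mismatches(data, "Primary Energy|Coal"))
print(region_mismatches(data, "Primary Energy|Wind"))
```

The first call flags `World` and `region_b` in 2010, and the regional check flags 2020 for `Primary Energy` and `Primary Energy|Coal` but not for `Primary Energy|Wind`, matching the two inconsistencies described in the notebook.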