rewrite the tutorials (#302)
danielhuppmann authored Dec 23, 2019
1 parent 57be418 commit e84eeb4
Showing 25 changed files with 1,829 additions and 8,402 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -37,7 +37,7 @@ script:
- make test
# only test docs once to make sure everything works on most recent python
- cd doc
- if [[ "${PYENV}" == "py37" && "${TRAVIS_OS_NAME}" != 'windows' ]]; then conda install --yes kealib==1.4.7; make html; fi
- if [[ "${PYENV}" == "py37" && "${TRAVIS_OS_NAME}" != 'windows' ]]; then make html; fi
- cd ..

after_success:
74 changes: 43 additions & 31 deletions README.md
@@ -1,5 +1,5 @@
pyam: a Python toolkit for Integrated Assessment Modeling
=========================================================
pyam: analysis and visualization of integrated-assessment scenarios
===================================================================

**Documentation on [Read the Docs](https://pyam-iamc.readthedocs.io)**

@@ -8,43 +8,55 @@ pyam: a Python toolkit for Integrated Assessment Modeling
Overview and scope
------------------

The ``pyam`` package provides a range of diagnostic tools and functions
for analyzing and working with IAMC-format timeseries data.
The open-source Python package ``pyam`` provides a suite of tools and functions
for analyzing and visualizing input data (i.e., assumptions/parametrization)
and results (model output) of integrated-assessment scenarios.

Features:
- Summary of models, scenarios, variables, and regions included in a snapshot.
- Display of timeseries data as pandas.DataFrame with IAMC-specific filtering
options.
- Simple visualization and plotting functions.
- Diagnostic checks for non-reported variables or timeseries data to identify
outliers and potential reporting issues.
- Categorization of scenarios according to timeseries data or meta-identifiers
for further analysis.
Key features:

The package can be used with timeseries data that follows the data template
convention of the [Integrated Assessment Modeling Consortium](http://www.globalchange.umd.edu/iamc/) (IAMC).
An illustrative example is shown below;
see [data.ene.iiasa.ac.at/database](http://data.ene.iiasa.ac.at/database/)
for more information.
- Simple analysis of timeseries data in the IAMC format
(more about it [here](https://pyam-iamc.readthedocs.io/en/stable/data.html))
with an interface similar in feel and style to the widely
used [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)
- Advanced visualization and plotting functions
(see the [gallery](https://pyam-iamc.readthedocs.io/en/stable/examples/index.html))
- Diagnostic checks for scripted validation of scenario data and results

| **model** | **scenario** | **region** | **variable** | **unit** | **2005** | **2010** | **2015** |
|--------------|--------------|------------|----------------|----------|----------|----------|----------|
| MESSAGE V.4 | AMPERE3-Base | World | Primary Energy | EJ/y | 454.5 | 479.6 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
Data model
----------

An illustrative example of the timeseries format developed by the
[Integrated Assessment Modeling Consortium](http://www.globalchange.umd.edu/iamc/) (IAMC)
is shown below.
The row is taken from the [IAMC 1.5°C scenario explorer](https://data.ene.iiasa.ac.at/iamc-1.5c-explorer),
showing a scenario from the [CD-LINKS](https://www.cd-links.org) project.
[Read the docs](https://pyam-iamc.readthedocs.io/en/stable/data.html)
for more information on the IAMC format and the ``pyam`` data model.

Tutorial
--------
| **model** | **scenario** | **region** | **variable** | **unit** | **2005** | **2010** | **2015** |
|-----------|--------------|------------|----------------|----------|----------|----------|----------|
| MESSAGE | CD-LINKS 400 | World | Primary Energy | EJ/y | 462.5 | 500.7 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |

A comprehensive tutorial for the basic functions is included
in [the first tutorial](doc/source/tutorials/pyam_first_steps.ipynb)
using a partial snapshot of the IPCC AR5 scenario database.

Tutorials
---------

An introduction to the basic functions is shown
in [the "first-steps" notebook](doc/source/tutorials/pyam_first_steps.ipynb).

All tutorials are available in rendered format (i.e., with output) as part of
the [online documentation](https://pyam-iamc.readthedocs.io/en/stable/tutorials.html).
The source code of the tutorials notebooks is available
in the folder [doc/source/tutorials](doc/source/tutorials) of this repository.

Documentation
-------------

The documentation pages can be built locally.
See the instruction in [doc/README](doc/README.md).
The complete documentation is hosted on [Read the Docs](https://pyam-iamc.readthedocs.io).

The documentation pages can be built locally,
refer to the instruction in [doc/README](doc/README.md).

Authors
-------
@@ -56,7 +68,7 @@ and Daniel Huppmann ([@danielhuppmann](https://github.com/danielhuppmann/)).
License
-------

Copyright 2017-2018 IIASA Energy Program
Copyright 2017-2019 IIASA Energy Program

The ``pyam`` package is licensed
under the Apache License, Version 2.0 (the "License");
@@ -91,7 +103,7 @@ conda activate pyam # may be simply `source activate pyam` or just `activate p
make -B virtual-environment
```

To check everything has installed correctly,
To check everything has installed correctly, run

```
pytest tests
1 change: 1 addition & 0 deletions RELEASE_NOTES.md
@@ -1,6 +1,7 @@

# Next Release

- [#302](https://github.com/IAMconsortium/pyam/pull/302) Rework the tutorials
- [#301](https://github.com/IAMconsortium/pyam/pull/301) Bugfix when using `to_excel()` with a `pd.ExcelWriter`
- [#297](https://github.com/IAMconsortium/pyam/pull/297) Add `empty` attribute, better error for `timeseries()` on empty dataframe
- [#295](https://github.com/IAMconsortium/pyam/pull/295) Include `meta` table when writing to or reading from `xlsx` files
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -26,7 +26,7 @@ install:
- conda --version
- activate testing
- conda install -y numpy pandas pyyaml xlrd xlsxwriter seaborn==0.9.0 six requests jupyter nbconvert proj4==5.2.0 pywin32
- conda install -y -c conda-forge matplotlib==3.0.3 libiconv gdal fiona "geopandas<0.5.0" cartopy cython pyproj==1.9.6
- conda install -y -c conda-forge matplotlib==3.0.3 pyproj==1.9.6

build: false

Binary file modified doc/source/_static/iamc_template.png
2 changes: 1 addition & 1 deletion doc/source/index.rst
@@ -54,7 +54,7 @@ Key features:
- Simple analysis of timeseries data in the IAMC format (more about it `here`_)
with an interface similar in feel and style to the widely
used `pandas.DataFrame`_
- Advanced visualization and plotting function (see the `gallery`_)
- Advanced visualization and plotting functions (see the `gallery`_)
- Diagnostic checks for scripted validation of scenario data and results

The source code for |pyam| is available on `Github`_.
Binary file removed doc/source/tutorials/_static/AMPERE-Logo.png
Binary file not shown.
Binary file removed doc/source/tutorials/_static/EMF-Logo_v2.1.png
Binary file not shown.
Binary file modified doc/source/tutorials/_static/IAMC_logo.jpg
Binary file removed doc/source/tutorials/_static/IIASA_logo.png
Binary file not shown.
Binary file added doc/source/tutorials/_static/cdlinks_logo.png

Large diffs are not rendered by default.

236 changes: 236 additions & 0 deletions doc/source/tutorials/checking_consistency.ipynb
@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Checking consistency of a scenario ensemble\n",
"\n",
"In previous model comparison exercises, the reported data was not always internally consistent. This can be due to incomplete variable hierarchies, reporting templates incompatible with model specifications, or user error.\n",
"\n",
"In this tutorial, we show how to make the most of **pyam** to check that a scenario ensemble (or just a single scenario) is complete and that timeseries data \"add up\" across regions and along the variable tree (i.e., that the sum of the values of subcategories such as `Primary Energy|*` is identical to the value of the category `Primary Energy`).\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" This feature of the <b>pyam</b> package currently only supports \"consistency\"\n",
" in the sense of a strictly hierarchical variable tree\n",
" (with subcategories summing up to the category value)\n",
" and subregions of depth 1 adding up to the \"World\" region.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start with a hypothetical tutorial data set, which is constructed to highlight the individual validation features below.\n",
"\n",
"The scenario below has two inconsistencies:\n",
"\n",
"1. In year `2010` and regions `region_b` & `World`, the values of coal and wind do not add up to the total `Primary Energy` value\n",
"2. In year `2020` in the `World` region, the value of `Primary Energy` and `Primary Energy|Coal` is not the sum of `region_a` and `region_b` <br />\n",
" (but the sum of wind and coal to `Primary Energy` in each sub-region is correct)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tutorial_df = pd.DataFrame([\n",
"    ['World', 'Primary Energy', 'EJ/y', 7, 15],\n",
"    ['World', 'Primary Energy|Coal', 'EJ/y', 4, 11],\n",
"    ['World', 'Primary Energy|Wind', 'EJ/y', 2, 4],\n",
"    ['region_a', 'Primary Energy', 'EJ/y', 4, 8],\n",
"    ['region_a', 'Primary Energy|Coal', 'EJ/y', 2, 6],\n",
"    ['region_a', 'Primary Energy|Wind', 'EJ/y', 2, 2],\n",
"    ['region_b', 'Primary Energy', 'EJ/y', 3, 6],\n",
"    ['region_b', 'Primary Energy|Coal', 'EJ/y', 2, 4],\n",
"    ['region_b', 'Primary Energy|Wind', 'EJ/y', 0, 2],\n",
"],\n",
"    columns=['region', 'variable', 'unit', 2010, 2020]\n",
")\n",
"\n",
"df = pyam.IamDataFrame(data=tutorial_df, model='model_a', scenario='scen_a')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"With the [check_internal_consistency()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_internal_consistency) method, we can check the internal consistency of a scenario ensemble (i.e., an `IamDataFrame` instance).\n",
"If this method returns `None`, the scenario data is internally consistent (i.e., each total equals the sum of its sectoral components and the sum of its regional components).\n",
"\n",
"In the rest of this tutorial, we give you a chance to better understand this method. We go through what it is actually doing and show you the kind of output you can expect."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking that variables are the sum of their components\n",
"\n",
"We are going to use the [check_aggregate()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate) method of the `IamDataFrame`\n",
"to check that the components of a variable add up to its total.\n",
"This method takes [np.is_close()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isclose.html) arguments as keyword arguments. We show our recommended settings here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np_isclose_args = {\n",
"    'equal_nan': True,\n",
"    'rtol': 1e-03,\n",
"    'atol': 1e-05,\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [check_aggregate()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate) function allows us to quickly verify whether a given variable is the sum of its sectoral components (e.g. `Primary Energy` should be equal to `Primary Energy|Coal` plus `Primary Energy|Wind`). The validation is performed separately for each region.\n",
"\n",
"This section illustrates the first constructed inconsistency in this scenario. The returned [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicates where the aggregate is not equal to the sum of components."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.check_aggregate('Primary Energy', **np_isclose_args)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice, it would now be up to the user to determine the cause of the inconsistency (or confirm that this is expected for some reason)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking multiple variables\n",
"\n",
"We can now construct a loop over all variables in this `IamDataFrame`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for variable in df.variables():\n",
"    df.check_aggregate(variable, **np_isclose_args)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log reports the same message as in the previous example, and it shows that the other two variables (coal and wind) cannot be assessed because they have no subcategories.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"Note that the detailed output (i.e., where the aggregation validation fails) is not shown in a notebook when calling the function within a loop.<br />\n",
"    Read <a href=\"https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/\">this page</a> for helpful tips and tricks when working with Jupyter notebooks.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking that timeseries subregions sum to aggregate regions\n",
"\n",
"Similarly to checking that the sum of a variable's components gives the declared total, as shown above, we can check that summing over subregions returns the value of a region.\n",
"\n",
"To do this, we use the [check_aggregate_region()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_aggregate_region) function. By default, this method checks that all the regions in the dataframe sum to `World`.\n",
"\n",
"Using this function allows us to quickly check if a regional total for a single variable is equal to the sum of its regional values.\n",
"This section illustrates the second constructed inconsistency in this scenario. \n",
"The returned [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicates where the timeseries at the `region='World'` level is not equal to the sum of regional components."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.check_aggregate_region('Primary Energy', **np_isclose_args)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking complete internal consistency of a scenario (ensemble)\n",
"\n",
"The previous sections illustrated two functions to validate specific variables across their subcategories or regional breakdown. These two functions are combined in the [check_internal_consistency()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.check_internal_consistency) feature.\n",
"\n",
"If we have an internally consistent scenario ensemble (or single scenario), the function will return `None`; otherwise, it will return a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) indicating all detected inconsistencies.\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" Note that at the moment, this method assumes that all the regions sum to the <b>World</b> region. See <a href=\"https://github.com/IAMconsortium/pyam/issues/106\">this issue</a> for more information.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.check_internal_consistency()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output of this function reports both types of illustrative inconsistencies in the scenario constructed for this tutorial."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
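
To make the two constructed inconsistencies in `checking_consistency.ipynb` easier to follow, here is a minimal sketch that recomputes them by hand with plain `pandas` and `numpy`. It is not pyam's implementation: the helper names `variable_mismatches` and `region_mismatches` are hypothetical, and the sketch assumes a single level of subcategories and subregions that all sum to one total region. It reuses the tutorial data and the `np.isclose()` settings recommended in the notebook.

```python
import numpy as np
import pandas as pd

# Same hypothetical data as in the tutorial notebook.
data = pd.DataFrame(
    [
        ["World", "Primary Energy", "EJ/y", 7, 15],
        ["World", "Primary Energy|Coal", "EJ/y", 4, 11],
        ["World", "Primary Energy|Wind", "EJ/y", 2, 4],
        ["region_a", "Primary Energy", "EJ/y", 4, 8],
        ["region_a", "Primary Energy|Coal", "EJ/y", 2, 6],
        ["region_a", "Primary Energy|Wind", "EJ/y", 2, 2],
        ["region_b", "Primary Energy", "EJ/y", 3, 6],
        ["region_b", "Primary Energy|Coal", "EJ/y", 2, 4],
        ["region_b", "Primary Energy|Wind", "EJ/y", 0, 2],
    ],
    columns=["region", "variable", "unit", 2010, 2020],
)
years = [2010, 2020]
isclose_args = {"equal_nan": True, "rtol": 1e-03, "atol": 1e-05}


def variable_mismatches(df, variable):
    """List (region, year) pairs where `variable` is not the sum of its subcategories.

    Hypothetical helper; assumes only one level of subcategories below `variable`.
    """
    total = df[df["variable"] == variable].set_index("region")[years]
    is_component = df["variable"].str.startswith(variable + "|")
    components = df[is_component].groupby("region")[years].sum()
    components = components.reindex(total.index)
    ok = np.isclose(
        total.to_numpy(dtype=float),
        components.to_numpy(dtype=float),
        **isclose_args,
    )
    return [
        (region, year)
        for i, region in enumerate(total.index)
        for j, year in enumerate(years)
        if not ok[i, j]
    ]


def region_mismatches(df, variable, region="World"):
    """List years where `variable` in `region` is not the sum over all other regions.

    Hypothetical helper; assumes every other region is a direct subregion of `region`.
    """
    rows = df[df["variable"] == variable]
    total = rows[rows["region"] == region][years].iloc[0]
    subregions = rows[rows["region"] != region][years].sum()
    ok = np.isclose(
        total.to_numpy(dtype=float),
        subregions.to_numpy(dtype=float),
        **isclose_args,
    )
    return [year for year, good in zip(years, ok) if not good]


# First inconsistency: coal and wind do not add up to the total in 2010.
print(variable_mismatches(data, "Primary Energy"))
# Second inconsistency: the World values in 2020 are not the regional sums.
print(region_mismatches(data, "Primary Energy"))
print(region_mismatches(data, "Primary Energy|Coal"))
```

The helpers return only the locations of the mismatches, whereas the notebook describes the pyam methods as returning a `pandas.DataFrame` of the failing rows. Keeping the tolerance settings in one dictionary mirrors the notebook's `np_isclose_args` and ensures both checks use the same thresholds.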