Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite the tutorials #302

Merged

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Dec 10, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Adding to RELEASE_NOTES.md (remove section after adding to RELEASE_NOTES.md)

Please add a single line in the release notes similar to the following:

- (#XX)[http://link-to-pr.com] Added feature which does something

Description of PR

This PR reworks most of the tutorials, in particular the "first-steps" tutorial. It updates the tutorial data source to the IAMC 1.5°C data for consistency with the IIASA-db tutorial, and improves consistency of the formatting and notation.

closes #298
closes #290

@coveralls
Copy link

coveralls commented Dec 10, 2019

Coverage Status

Coverage decreased (-0.4%) to 84.721% when pulling 84011be on danielhuppmann:tutorials/first_step_update into 9931f38 on IAMconsortium:master.

@danielhuppmann
Copy link
Member Author

@znicholls, can you take please a look at the updated "checking_consistency" notebook?

I created a smaller, more specific example to better highlight what is being checked and why each validation fails. I also replaced the term "database" with "scenario ensemble" to describe the content of an IamDataFrame rather than where it is stored, and made some other edits, hoping make it easier for new users...

@danielhuppmann
Copy link
Member Author

@byersiiasa, can you take a look whether the updated "first-steps" notebook satisfies the issues #290 and #298? (if you have a minute)

@jkikstra
Copy link
Collaborator

Hi @danielhuppmann. Thanks for reworking these tutorials! I think you have done a great job of making this first steps tutorial accessible.

Some quite minor points:

  1. [filtering]: 'The feature for filtering by model, scenario or region are implemented using exact string matching, where ...' should probably be changed to model, scenario, region, variable (and level), or year to be more complete and not confuse new users.
  2. [filtering]: you could add a tiny example under the 'filtering by year' description, e.g (df_selectedyears = df.filter(range(2010, 2051))). If you want, you can make it a box and then also use it to show it as a shorter timeseries next to the full display_df
  3. [numpy]: you could provide a hyperlink to the numpy package when you mention it when you're dealing with set_meta_from_data()

In general, when using pyam in practice for processing purposes, there can be a lot of converting between different forms. It would be good to make it as clear as possible when one has a wide or long format, and when it is a pyam.IamDataFrame and when it is a pandas.DataFrame. A new user would probably assume that everything that goes in as a pyam.IamDataFrame also returns one. Thus, every time that this not the case, it should be very clear (e.g. under 'displaying timeseries data' it is a bit too easy to miss right now). (this also holds for when a pandas.Series [e.g. overshoot] or even a multidimensional dataframe [e.g. df.meta] are returned, as this can cause quite some confusion because it is not always explicit enough)
In the same way, I think it is important to also emphasize what actions return an altered copy and what actions change the df itself.
Perhaps an overview table of some central functions with these two binary options (pyam/pandas, and copy/direct alteration) could be very useful for new users, and is more friendly then going into the API or testing it out.

Copy link
Collaborator

@znicholls znicholls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, small suggestions:

  • 'sum of wind and coal in that region' --> 'sum of wind and coal in region_a and region_b
  • ' is Primary Energy should be equal to Primary Energy|Coal plus Primary Energy|Coal' --> 'is Primary Energy equal to Primary Energy|Coal plus Primary Energy|Coal'
  • 'Note that the detailed output (i.e., where the aggregation validation fails) is not shown in a notebook when calling the function within a loop.', maybe add a pointer to a resource which explains how notebooks work for those who are confused by this

The one other thing you might want to add is an explainer of the handling of Bunkers. It's super confusing to everyone and having that in there might make it easier for people. Alternately it could be a separate tutorial just for 'advanced users' who have to deal with this stuff.

@francescolovat
Copy link

Hi @danielhuppmann ,

Great tutorial notebook, very illustrative and clear, even for not-very-experienced readers as myself.

Minor comments:

  • In exclude_on_fail section, the last sentence of this paragraph is not very clear:

Any scenario (by a particular model) failing the validation criteria is then marked as exclude=True. This "exclusion flag" is implemented in the meta table of the IamDataFrame, which can be used categorization and quantitative indicators (more below).

I think a preposition is missing before categorization. Moreover, I think a clear reference to the meta table section should be included instead of the parenthesis, stating sth like: "the second latest cell in XX shows the meta table" (I found it confusing talking about a meta table whose format it was not clarified until the command df.meta.head() at the end of the notebook).

  • In the Categorization assignment markdown cell I'd list some of the other possible arguments to be used by the plotting library. So the users could experiment with them (I'd have liked to do that).
  • I'd suggest to swap the order of the cells [14] and [15] (and their associated markdowns) with commands df.filter(level=1).variables() and df.filter(variable='Primary Energy*', level='1-').variables(), respectively. This is to follow the logic of the Filtering by variables and levels markdown cell, which is explaning the difference between filtering by both variable and level or with just level.

@danielhuppmann
Copy link
Member Author

response to @znicholls:

'Note that the detailed output (i.e., where the aggregation validation fails) is not shown in a notebook when calling the function within a loop.', maybe add a pointer to a resource which explains how notebooks work for those who are confused by this

Any good reference for this?

Re "handling of bunkers" - this wasn't included in the tutorial before. I'll see if I have time to extend the tutorial when I tackle #299.

@znicholls
Copy link
Collaborator

Any good reference for this?

2. Pretty Display of Variables of https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/ ?

Re "handling of bunkers" - this wasn't included in the tutorial before. I'll see if I have time to extend the tutorial when I tackle #299.

Ah ok cool makes sense.

@danielhuppmann
Copy link
Member Author

Thank you @jkikstra and @francescolovat for these great suggestions!

Highlights of my changes:

  • added a blue-highlighted cell about the expected return type of timeseries(), and some more of these cells throughout the notebook
  • moved the "filter-by-year" section, add an example
  • introduce the meta table earlier, so that it can be referenced in the validation and indicators section
  • add a second plot with the kwarg color='scenario' in the plotting example

About the suggestion by @jkikstra to create an overview table of functions and their return types - I understand that this is confusing for new users, but I'm worried about adding new parts to the documentation, because this will be difficult to maintain and keep up to date...

@byersiiasa
Copy link
Collaborator

@byersiiasa, can you take a look whether the updated "first-steps" notebook satisfies the issues #290 and #298? (if you have a minute)

thanks - yes, nicely implemented!

@gidden
Copy link
Member

gidden commented Dec 22, 2019

lgtm @danielhuppmann ! I tried restarting the mac CI that failed, not sure why it is failing or if this is a known issue.

@danielhuppmann
Copy link
Member Author

thanks @gidden!

this seems to be related to #281 - building the docs on travis installs kealib, which also installs matplotlib-base-2.2.4 on linux but matplotlib-base-3.1.2 on mac. no idea why this isn't a problem in other PRs

@danielhuppmann
Copy link
Member Author

no idea why Appveyor is now failing with ever more confusing error messages - maybe it's because now running on gidden:master...? it passed on "my" instance a few commits ago

merging to see if this resolves the issue

@danielhuppmann danielhuppmann merged commit e84eeb4 into IAMconsortium:master Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants