Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended consistency checking #268

Merged
merged 8 commits into from
Sep 26, 2019

Conversation

znicholls
Copy link
Collaborator

@znicholls znicholls commented Sep 23, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Add this feature to aggregation tutorial
  • Description in RELEASE_NOTES.md Added

Adding to RELEASE_NOTES.md (remove section after adding to RELEASE_NOTES.md)

Please add a single line in the release notes similar to the following:

- (#XX)[http://link-to-pr.com] Added feature which does something

Description of PR

Extends the checking of internal consistency of data. The use case is this: a variable is reported at the World level only, but is more than one level deeper in the hierarchy than the variable being considered. Previously, our check_aggregate functions would raise an error, now they find this variable and correctly realise that the data is internally consistent.

pyam/core.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Sep 23, 2019

Coverage Status

Coverage decreased (-0.2%) to 87.848% when pulling f107db1 on znicholls:extended-consistency-checking into 127680a on IAMconsortium:master.

@znicholls znicholls marked this pull request as ready for review September 23, 2019 07:48
@danielhuppmann
Copy link
Member

thanks @znicholls, but I'm not sure that this change makes sense - do we really want to "jump" over levels in the variable tree? Wouldn't it make more sense to require that the variable foo|bar is also defined at the region level?

pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
@znicholls
Copy link
Collaborator Author

Wouldn't it make more sense to require that the variable foo|bar is also defined at the region level?

I think it would but I also think that ship has sailed. For example, aviation and shipping are often reported at the World level only and you need to take account of that when checking regional sums. Pyam can already do that, this PR just makes it so it can take account variables which are deeper in the hierarchy. That is required in some cases (the current one for me being sorting variables into MAGICC's fossil and afolu boxes: I need to put aviation below fossil but make sure aviation is counted when making aggregates/checking consistency).

pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
@znicholls
Copy link
Collaborator Author

Not sure why windows tests are failing, will try rebasing...

@danielhuppmann
Copy link
Member

@znicholls, still not quite convinced...

For example, aviation and shipping are often reported at the World level only and you need to take account of that when checking regional sums.

That, I agree with - but if you have a structure like

  • Emissions|CO2|<sectors> defined for all regions
  • Emissions|CO2|Aviation|<sub-categories> defined at the World level
    I would still argue that it should be required that Emissions|CO2|Aviation is defined also at the World level - in which case the current implementation works (i.e., checks that Aviation is the sum of sub-categories and includes it in the validation of global CO2 emissions).

@znicholls
Copy link
Collaborator Author

That, I agree with - but if you have a structure like

Yes, but if you have a structure like

  • Emissions|CO2 (at the world and regional level)
  • Emissions|CO2|Fossil (at the world and regional level)
  • Emissions|CO2|Fossil|sub-sectors (where sub-sectors doesn't include Aviation)
  • Emissions|CO2|Fossil|Aviation (at the world level only)

then the current setup will fail when checking the aggregation of Emissions|CO2 because it will add all the Emissions|CO2|Fossil variables at the regional level and won't find the required Emissions|CO2|Fossil|Aviation at the world level

@gidden
Copy link
Member

gidden commented Sep 25, 2019

I'm +1 with @znicholls on this one. Bunkers are a consistent pain the in side for all emissions-related calculations, and I think first class support for them in pyam operations is an important feature to include.

That being said, would it make sense to add a few more tests here @znicholls? At the moment, I see that you added a shipping sector in the test data, but I don't think any tests are being applied explicitly for this feature (correct me if I'm wrong)..

Also, would it be useful to add this new capability to the existing aggregation tutorial? I guarantee that folks who don't know it exists a prior will never find it buried either in the code or this PR.

@gidden
Copy link
Member

gidden commented Sep 25, 2019

And ho-boy, it looks like windows tests are failing in our tutorial tests with ModuleNotFoundError: No module named 'win32api'. That looks like a doozy... I can try to look into it soon.

@gidden
Copy link
Member

gidden commented Sep 25, 2019

@znicholls can you try adding conda install pywin32 to appveyor.yml? I have no idea why this is needed now. Maybe Appveyor changed some default configs...

@znicholls
Copy link
Collaborator Author

At the moment, I see that you added a shipping sector in the test data, but I don't think any tests are being applied explicitly for this feature (correct me if I'm wrong)..

Adding the shipping sector made the test suite fail (before the change) so I think that's testing for this feature in some way. Having said that, happy to add whatever tests you think are sensible (I just couldn't think of other obvious ones beyond what we already have).

Also, would it be useful to add this new capability to the existing aggregation tutorial?

Will put it on the todos.

@gidden
Copy link
Member

gidden commented Sep 25, 2019

At the moment, I see that you added a shipping sector in the test data, but I don't think any tests are being applied explicitly for this feature (correct me if I'm wrong)..

Adding the shipping sector made the test suite fail (before the change) so I think that's testing for this feature in some way. Having said that, happy to add whatever tests you think are sensible (I just couldn't think of other obvious ones beyond what we already have).

Ok, music to my ears, but also not obvious! I guess I was thinking of just including a consistency check of the sector total, but maybe not needed? I leave the decision in your expert hands.

@znicholls
Copy link
Collaborator Author

Ok, music to my ears, but also not obvious! I guess I was thinking of just including a consistency check of the sector total, but maybe not needed? I leave the decision in your expert hands.

Added a couple more which hopefully make things more obvious. Over to you

pyam/core.py Outdated Show resolved Hide resolved
Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the changes, but I'm still not convinced that this is a good path forward for the check_consistency() function - it only works on an IamDataFramethat has

  • a completely hierarchical variable tree
    (it will fail as soon as you have Final Energy|<sectors> and Final Energy|<fuels>)
  • assumes summation over all variables
  • has exactly one "global" region and n subregions (but no sub-sub-regions)

@danielhuppmann
Copy link
Member

@gidden, over to you for a final review and merge

@znicholls
Copy link
Collaborator Author

I approve the changes, but I'm still not convinced that this is a good path forward for the check_consistency() function - it only works on an IamDataFramethat has

I agree that this only works for a specific use case. However that's the use case I currently have (pulling out the CMIP6 data, doing some operations on it, making sure I didn't cock anything up along the way). I think the limitations you point out are great things to go for next, I just have no idea how to implement them. Maybe we make a new issue for this?

@gidden
Copy link
Member

gidden commented Sep 26, 2019

Agreed that there are limitations here, but I see us building up a clean interface. When more features are needed, we can dig into the details, but the community writ large also needs to decide on those details (how do we define sub-sub regions, e.g.).

@danielhuppmann you know that a pet peeve of mine is the hierarchical tree anyway, so something that forces us more in that direction is not a negative at least to me =)

In any case, regarding that, I do think we can come up with some intelligent "cleaning" operations that basically takes a non-strict hierarchy, makes it strict, does an op, and then deals with additional data after. But I think that is a fight for another time..

@gidden
Copy link
Member

gidden commented Sep 26, 2019

thanks all for the robust discussion!

@gidden gidden merged commit dd7ae0c into IAMconsortium:master Sep 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants