Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend IamDataFrame to use extra data columns and sub-annual time #132

Closed
wants to merge 29 commits into from

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Nov 2, 2018

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR extends an IamDataFrame to have extra (custom) columns in the data data frame, and to distinguish between

  • years (as integers, possibly with a column subannual (as string)
  • continuous time using pd.to_datetime()
    The "type" of time representation is stored in self.time_col, the additional custom columns in self.data in self.extra_cols.

One quickfix required to get stuff working: read_files() is downgraded to take only one file, not a list of files. Because this would require checking that all imported data has the same time format and same extra columns (or add them on the fly).

ToDo:

  • change "year" to self.time_col for basic functionality (e.g., in timeseries())
  • change *_IDX usage in the package to replace "year" by self.time_col and use self.extra_cols
  • figure out how to refactor append() (see issue above with read_files())
  • reimplement multiple files in read_files()

Related issue and discussion

Closes #123

@danielhuppmann
Copy link
Member Author

Help requested

@gidden and @znicholls, what would be an elegant way for running each test three times, once with the existing examples, once with a "subannual" version ("winter day" style), and once for a continuous-time example?

@danielhuppmann
Copy link
Member Author

revised as suggested by @znicholls to cover initialising an IamDataFrame from wide format with columns as datetime, see danielhuppmann#8

@danielhuppmann
Copy link
Member Author

Help requested

@znicholls, can you implement an appropriate equivalent of the pyam.utils.years_match() function for the new time column formatted as datetime? I guess that this will require some from-to filtering option (e.g., for years, I would use df.filter(year=range(2030, 2051))) to get a downselection of the time period.

@danielhuppmann
Copy link
Member Author

@znicholls
Copy link
Collaborator

@danielhuppmann I've got a bit confused here. Do you still need my help with the filter function? I'm guessing we need more tests first as they're currently all passing?

pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
pyam/iiasa.py Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

@znicholls, yes, I still need your help with a filter function for the time column. Thanks!

Re the tests, only a few tests currently check for multiple time formats (all those using test_df). So the implementation works insofar as the IamDataFrame now takes a year format (as int) or time (as datetime-castable), with some sanity checks that it can only be one of those. Basic functions like regions(), filter() and append() work (except no filtering for time).

Any other column (string that can't be cast to int) in the input data is now kept in data, with a list of these columns stored as df.extra_cols. These can also be used as args in filter() and can be added with timeseries(iamc_index=False).

The connection to the IIASA-db's are refactored so that they work with the new implementation, including the tutorial.

@znicholls and @gidden, I suggest that you review and merge unless you see major problems. The existing use cases work as expected for the year format (ie, CMIP6 and IPCC SR15), but ensuring that all extra features like check_aggregate() and the plotting library work with both formats will be too much for one PR (and too much for me). So I'd keep the time format as "beta" feature and refactor functions as needed going forward.

@danielhuppmann danielhuppmann changed the title WIP - extend IamDataFrame to use extra data columns and sub-annual time extend IamDataFrame to use extra data columns and sub-annual time Nov 8, 2018
pyam/core.py Show resolved Hide resolved
pyam/core.py Show resolved Hide resolved
pyam/core.py Show resolved Hide resolved
pyam/core.py Show resolved Hide resolved
pyam/core.py Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

@znicholls, I removed the test for extra columns in meta, because this feature is not yet supported and we need to discuss this a bit further...

@znicholls
Copy link
Collaborator

znicholls commented Nov 29, 2018 via email

@danielhuppmann
Copy link
Member Author

@znicholls , the datetime tests seem to have problems with python 2, can you take a look?

@danielhuppmann
Copy link
Member Author

@znicholls, re having a failing test for meta and extra columns

Yep i get that. But if we mark it as xfail, the testsuite will still pass and we have an easy reminder of what we need to think about.

I'd be ok to leave it in if I was convinced that this is the ideal implementation for the long term. But I'd rather have some more discussions on that first...

@znicholls
Copy link
Collaborator

I'd be ok to leave it in if I was convinced that this is the ideal implementation for the long term. But I'd rather have some more discussions on that first...

cool, will take a look at Python2 stuff now

pyam/core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

@gidden and @znicholls, this should be good to go now...

Follow-up issues that we should discuss after this PR is merged:

  • how to treat meta by extra columns, requires further discussion
  • refactor the tests, split out into multiple test files? test_core.py is pretty long now, some tests are executed for multiple year/time formats, others are only executed on the year format
  • write a tutorial for the datetime feature, include a description in the docs

@znicholls
Copy link
Collaborator

Looks good to me.

Follow-up issues

Can we split these out into issues? You can assign me to the datetime feature tutorial.

@danielhuppmann
Copy link
Member Author

@znicholls, yes, I meant that we would add these issues after the merge - just thought that it would be clearer what still needs to be done for the larger picture when I spell it out here. Thanks for volunteering!

@znicholls
Copy link
Collaborator

just thought that it would be clearer what still needs to be done for the larger picture when I spell it out here

aha very good, one step ahead of me!

danielhuppmann and others added 24 commits December 19, 2018 23:29
* Add test of extra col init behaviour

* Add failing tests of time filtering

* Setup time filtering tests

* Pass test filter year

* Redo tests of time filtering and include super messy first steps towards implementation

* Fill out tests and reset core

* Finish implementation of time filtering, cleaning up needed

* Refactor core so apply filters can use self.time_col
@danielhuppmann
Copy link
Member Author

another rebase after merge of #162 to try to appease stickler (which should have been appeased already)

@danielhuppmann
Copy link
Member Author

closing in favour of #167

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switching from years as integers to pd.datetime
4 participants