WIP: Filling out data with pyam defaults #193

gidden · 2019-02-12T09:51:57Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
Description in RELEASE_NOTES.md Added

Description of PR

The idea here is to provide a method that takes a pandas dataframe (currently assumed to be in so-called long format) with additional observations as columns and pivots them into a 'variable' column. Other required pyam defaults are then added and a pyam.IamDataFrame is returned.

Tests etc to come.

@danielhuppmann maybe worth taking a look now, just to make sure this basically jibes with what you have also been working on?

coveralls · 2019-02-12T10:07:18Z

Coverage decreased (-0.8%) to 83.42% when pulling 2a7a896 on gidden:pd-to-pyam into b665608 on IAMconsortium:master.

coveralls · 2019-02-12T10:07:19Z

Coverage decreased (-0.8%) to 83.351% when pulling 391ea23 on gidden:pd-to-pyam into a6ac0c5 on IAMconsortium:master.

danielhuppmann · 2019-02-18T08:29:40Z

Thanks @gidden for this really useful new feature! Before diving into the nitty-gritty of the review, I'd like to take this to the meta-level of possible use cases:

We have a pd.DataFrame that doesn't have all the columns required for casting, and I see three possible use cases. Assuming that column col of the standard pyam.IamDataFrame is missing, it could be filled in one of three ways (as kwargs):

1. value={'col': 'col_prev'}: assuming that the values of col_prev are numeric, this would rename col_prev to value and add another column df[col] = col_prev (this is the current implementation, if I understand it correctly)
2. col=['col_prev_1', 'col_prev_2']: merge columns to form a new IAMC-compatible dataframe, i.e., df['col'] = df.apply(lambda x: '|'.join([x[i] for i in ['col_prev_1', 'col_prev_2']]), axis=1)
3. col='foo': create a new column with a constant value, i.e., df[col] = 'foo'
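
For illustration, the three cases can be written out in plain pandas. This is a minimal sketch, not the code in this PR: the sample data and the column name col are made up, and case 1 is read here as filling col with the name of the former column, which is an assumption.

```python
import pandas as pd

# Made-up sample data; none of these names come from pyam itself.
df = pd.DataFrame({
    "year": [2010, 2020],
    "col_prev": [1.0, 2.0],
    "col_prev_1": ["Primary Energy", "Primary Energy"],
    "col_prev_2": ["Coal", "Gas"],
})

# Case 1: the numeric column becomes 'value'; 'col' records where it came from
# (assumed reading of the description above).
case1 = df.rename(columns={"col_prev": "value"}).assign(col="col_prev")

# Case 2: join two existing columns with '|' to build the missing column.
parts = ["col_prev_1", "col_prev_2"]
case2 = df.assign(col=df.apply(lambda x: "|".join([x[i] for i in parts]), axis=1))

# Case 3: fill the missing column with a constant.
case3 = df.assign(col="foo")
```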

gidden · 2019-02-18T08:51:25Z

Hey @danielhuppmann, thanks for the suggestions!

As implemented, I think this covers cases 1 and 3. Let's say you have a dataframe as follows:

a  b  year
-  -  ----
1  2  10
3  4  20

If you then call to_pyam on it, you would get:

df.to_pyam()

variable  value  year  .... (defaults)
--------  -----  ----
a         1      10
a         3      20
b         2      10
b         4      20

df.to_pyam(model='foo')

variable  value  year  model  .... (defaults)
--------  -----  ----  -----
a         1      10    foo
a         3      20    foo
b         2      10    foo
b         4      20    foo
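
The reshaping shown in these tables can be approximated with pandas alone. The sketch below is not the to_pyam implementation from this PR; melt_to_long is a hypothetical helper, and treating year as the only identifier column is an assumption taken from the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3], "b": [2, 4], "year": [10, 20]})


def melt_to_long(data, id_cols=("year",), **defaults):
    """Hypothetical helper: melt every non-id column into variable/value."""
    long_df = data.melt(
        id_vars=list(id_cols), var_name="variable", value_name="value"
    )
    # Any keyword argument becomes a constant column, e.g. model='foo'.
    for col, val in defaults.items():
        long_df[col] = val
    return long_df


long_df = melt_to_long(df, model="foo")
```

Note that every column not listed in id_cols is melted, which only works if all of those columns are numeric.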

We could add an argument for concatenation. For example:

df.to_pyam(concat_variable=['c', 'd']) # make variable name out of columns c and d

or

df.to_pyam(concat={'variable': ['c', 'd']}) # make variable name out of columns c and d

The only question here is then how to intuit the value column (my first thought is that the user would have to supply such a column already).

Final thought: my initial decision was to force the user to provide correctly named columns. For example, my use case has a column named iso which needs to be renamed to region. I do this as

df.rename(columns={'iso': 'region'}).to_pyam()

We could add some logic that does this for them, like

df.rename().to_pyam(region='iso')

How does that sound?

danielhuppmann · 2019-02-18T09:10:01Z

Thanks for the response!

> The only question here is then how to intuit the value column (my first thought is that the user would have to supply such a column already).

I would say that in use case 2, the value column must exist.

> df.to_pyam(concat={'variable': ['c', 'd']})

I don't think we need a kwarg concat, this could just be df.to_pyam(variable=['c', 'd']) without interfering with the other use cases. But I would request that your first example must be called by df.to_pyam(value=['a', 'b']), otherwise you implicitly assume that all columns must be numeric and melt-able in this way.

> df.rename().to_pyam(region='iso')

Not sure why there is an empty .rename() in here, but I would think that use case 2 already supports this by having df.to_pyam(region=['iso']) (or even skipping the list), which I think is a much more intuitive and simpler approach than doing some operations a priori with df.rename() and some operations within to_pyam().

I'd be happy to give this a shot and PR into your branch.
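
As a rough sketch of the keyword interface proposed in this comment (not the code from this PR or from #199), the three behaviours could be combined in one function. The name format_data is hypothetical, and so is the rule that a string naming an existing column means "use that column" while any other string is a constant fill.

```python
import pandas as pd


def format_data(df, **kwargs):
    """Hypothetical helper sketching the proposed kwargs; not pyam's API."""
    df = df.copy()

    # value=[...]: melt only the listed columns, so nothing else is
    # implicitly assumed to be numeric.
    value_cols = kwargs.pop("value", None)
    if value_cols is not None:
        if isinstance(value_cols, str):
            value_cols = [value_cols]
        id_cols = [c for c in df.columns if c not in value_cols]
        df = df.melt(id_vars=id_cols, value_vars=value_cols,
                     var_name="variable", value_name="value")

    for col, spec in kwargs.items():
        if isinstance(spec, str):
            if spec in df.columns:
                # Assumed rule: an existing column name means "rename it".
                df = df.rename(columns={spec: col})
            else:
                # Otherwise the string is used as a constant.
                df[col] = spec
        else:
            # A list of columns is joined with '|' into the target column.
            cols = list(spec)
            df[col] = df[cols].astype(str).agg("|".join, axis=1)
            df = df.drop(columns=[c for c in cols if c != col])
    return df


raw = pd.DataFrame({"iso": ["AUT", "AUT"], "year": [10, 20],
                    "a": [1, 3], "b": [2, 4]})
out = format_data(raw, value=["a", "b"], region="iso", model="foo")

raw2 = pd.DataFrame({"c": ["Emissions", "Emissions"], "d": ["CO2", "CH4"],
                     "value": [1.0, 2.0]})
out2 = format_data(raw2, variable=["c", "d"])
```

One consequence of the assumed string rule is that a constant which happens to match a column name cannot be expressed; a real implementation would need to resolve that ambiguity explicitly.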

danielhuppmann · 2019-02-20T14:51:05Z

closing in favour of #199

stickler-ci reviewed Feb 12, 2019

stickler-ci reviewed Feb 18, 2019

gidden added 4 commits February 18, 2019 08:21

initial impl for giving default pyam dataframes

00866bf

move again into core

057fdb4

small fixes

cd62127

update concat to try to cast to dataframe

95a5208

gidden force-pushed the pd-to-pyam branch from 715c570 to 95a5208 Compare February 18, 2019 07:22

appease stickler

391ea23

danielhuppmann mentioned this pull request Feb 20, 2019

Initialize an IamDataFrame from pd.DataFrame with formatting specs #199

Merged

3 tasks

danielhuppmann closed this Feb 20, 2019

gidden deleted the pd-to-pyam branch June 15, 2022 11:26
