WIP: Filling out data with pyam defaults #193

Closed
gidden wants to merge 5 commits into IAMconsortium:master from gidden:pd-to-pyam


@gidden gidden commented Feb 12, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

The idea here is to provide a method that takes a pandas dataframe (currently assumed to be in so-called long format) with additional observations as columns and pivots them into a 'variable' column. The other required pyam defaults are then added and a pyam.IamDataFrame is returned.

Tests etc to come.

@danielhuppmann maybe worth taking a look now, just to make sure this basically jibes with what you also have been working on?

@coveralls

Coverage Status

Coverage decreased (-0.8%) to 83.42% when pulling 2a7a896 on gidden:pd-to-pyam into b665608 on IAMconsortium:master.

@coveralls

coveralls commented Feb 12, 2019

Coverage Status

Coverage decreased (-0.8%) to 83.351% when pulling 391ea23 on gidden:pd-to-pyam into a6ac0c5 on IAMconsortium:master.

@danielhuppmann

Thanks @gidden for this really useful new feature! Before diving into the nitty-gritty of the review, I'd like to take this to the meta-level of possible use cases:

We have a pd.DataFrame that doesn't have all the required columns for casting, so I see three possible use cases. Assuming that column col from the standard pyam.IamDataFrame is missing, this column could be filled by three methods (as kwargs):

  • value={'col': 'col_prev'}: assuming that the values of col_prev are numeric, this would rename col_prev to value and add another column df['col'] = 'col_prev' (this is the current implementation, if I understand it correctly)
  • col=['col_prev_1', 'col_prev_2']: merge columns to form a new IAMC-compatible dataframe, i.e., df['col'] = df.apply(lambda x: '|'.join([x[i] for i in ['col_prev_1', 'col_prev_2']]), axis=1)
  • col='foo': create a new column with a constant value, i.e., df['col'] = 'foo'
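
To make the three cases concrete, here is a rough sketch in plain pandas. It is only an illustration, not the implementation in this PR: the frames and the column names (`emissions`, `sector`, `fuel`) are invented, with `variable` and `model` standing in for `col`.

```python
import pandas as pd

# Case 1, value={'variable': 'emissions'}: the numeric column becomes `value`
# and its former name is stored in the target column.
df1 = pd.DataFrame({"year": [2010, 2020], "emissions": [1.0, 3.0]})
case1 = df1.rename(columns={"emissions": "value"}).assign(variable="emissions")

# Case 2, variable=['sector', 'fuel']: join existing columns with '|'.
# The `value` column has to be present already.
df2 = pd.DataFrame(
    {"sector": ["Energy", "Energy"], "fuel": ["Coal", "Gas"], "value": [1.0, 3.0]}
)
cols = ["sector", "fuel"]
case2 = df2.assign(
    variable=df2.apply(lambda x: "|".join([x[i] for i in cols]), axis=1)
)

# Case 3, model='foo': fill a missing column with a constant.
df3 = pd.DataFrame({"variable": ["a", "b"], "value": [1.0, 2.0]})
case3 = df3.assign(model="foo")
```

Cases 2 and 3 leave the existing `value` column untouched, while case 1 is the only one that decides which column holds the numbers.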

@gidden

gidden commented Feb 18, 2019

Hey @danielhuppmann, thanks for the suggestions!

As implemented, I think this covers cases 1 and 3. Let's say you have a dataframe as follows:

a  b  year
-  -  ----
1  2  10
3  4  20

if you then call df_to_pyam you would get

df.to_pyam()

variable  value  year  .... (defaults)
--------  -----  ----
a         1      10
a         3      20
b         2      10
b         4      20

df.to_pyam(model='foo')

variable  value  year  model  .... (defaults)
--------  -----  ----  -----
a         1      10    foo
a         3      20    foo
b         2      10    foo
b         4      20    foo
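
For reference, the reshaping in these two tables can be reproduced with a plain pandas melt. This is only a sketch: `to_pyam_sketch` and its `id_cols` parameter are made-up names, it returns an ordinary DataFrame rather than a pyam.IamDataFrame, and the remaining pyam defaults are left out.

```python
import pandas as pd


def to_pyam_sketch(df, id_cols=("year",), **defaults):
    """Hypothetical stand-in for the to_pyam call above (not pyam's API)."""
    # Every column that is not an id column is treated as an observation
    # and melted into 'variable'/'value' pairs.
    long = df.melt(id_vars=list(id_cols), var_name="variable", value_name="value")
    # Keyword arguments such as model='foo' become constant columns.
    for col, val in defaults.items():
        long[col] = val
    return long[["variable", "value", *id_cols, *defaults]]


df = pd.DataFrame({"a": [1, 3], "b": [2, 4], "year": [10, 20]})
plain = to_pyam_sketch(df)
with_model = to_pyam_sketch(df, model="foo")
```

Note that this version melts every non-id column, so it silently assumes all of them are numeric.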

We could add an argument for concatenation. For example:

df.to_pyam(concat_variable=['c', 'd']) # make variable name out of columns c and d

or

df.to_pyam(concat={'variable': ['c', 'd']}) # make variable name out of columns c and d

The only question here is then how to intuit the value column (my first thought is that the user would have to supply such a column already).

Final thought: my initial decision was to force the user to provide correctly named columns. For example, my use case has a column named iso which needs to be renamed to region. I do this as

df.rename(columns={'iso': 'region'}).to_pyam()

We could add some logic that does this for them, like

df.rename().to_pyam(region='iso')

How does that sound?

@danielhuppmann

Thanks for the response!

> The only question here is then how to intuit the value column (my first thought is that the user would have to supply such a column already).

I would say that in use case 2, the value column must exist.

> df.to_pyam(concat={'variable': ['c', 'd']})

I don't think we need a kwarg concat; this could just be df.to_pyam(variable=['c', 'd']) without interfering with the other use cases. But I would ask that your first example be called as df.to_pyam(value=['a', 'b']), otherwise you implicitly assume that all columns are numeric and melt-able in this way.

> df.rename().to_pyam(region='iso')

Not sure why there is an empty .rename() in here, but I would think that use case 2 already supports this by having df.to_pyam(region=['iso']) (or even skipping the list), which I think is a more intuitive and simpler approach than doing some operations a priori with df.rename() and some operations within to_pyam().

I'd be happy to give this a shot and PR into your branch.
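
A rough sketch of how a single signature could cover all three use cases. Everything here is an assumption for discussion: `cast_sketch` is a made-up name, it works on plain DataFrames, and the rule "a string that matches an existing column means that column, anything else is a constant" is just one possible way to tell use cases 2 and 3 apart when the list is skipped.

```python
import pandas as pd


def cast_sketch(df, value=None, **kwargs):
    """Hypothetical dispatcher for the three use cases (not pyam's API)."""
    df = df.copy()
    if value is not None:
        # Use case 1: only the explicitly listed numeric columns are melted.
        value = [value] if isinstance(value, str) else list(value)
        id_vars = [c for c in df.columns if c not in value]
        df = df.melt(id_vars=id_vars, value_vars=value,
                     var_name="variable", value_name="value")
    elif "value" not in df.columns:
        raise ValueError("a 'value' column must exist unless value=[...] is given")

    for col, spec in kwargs.items():
        if isinstance(spec, str) and spec not in df.columns:
            # Use case 3: a plain string that is not a column is a constant.
            df[col] = spec
            continue
        # Use case 2: one or more existing columns, joined with '|'.
        cols = [spec] if isinstance(spec, str) else list(spec)
        merged = df[cols[0]].astype(str)
        for c in cols[1:]:
            merged = merged + "|" + df[c].astype(str)
        df[col] = merged
        df = df.drop(columns=[c for c in cols if c != col])
    return df


wide = pd.DataFrame({"a": [1, 3], "b": [2, 4], "year": [10, 20], "iso": ["AUT", "DEU"]})
melted = cast_sketch(wide, value=["a", "b"], region="iso", model="foo")

long = pd.DataFrame({"c": ["Energy", "Energy"], "d": ["Coal", "Gas"],
                     "value": [1.0, 2.0], "year": [10, 10]})
joined = cast_sketch(long, variable=["c", "d"])
```

With this rule, region='iso' renames the column without a separate rename step, at the cost that a constant which happens to equal a column name cannot be expressed.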

@danielhuppmann

closing in favour of #199

@gidden gidden deleted the pd-to-pyam branch June 15, 2022 11:26