Unexpected behaviour in stack_plot() #266

Closed
danielhuppmann opened this issue Sep 5, 2019 · 5 comments · Fixed by #464

@danielhuppmann
Member

Description of the issue

I've been trying to use the stack_plot() function for some follow-up SR15 assessment, and noticed two major issues.

  1. Timeseries data starting with zero are ignored completely
  2. Multiple crossings of the zero baseline end up shown wrong

Illustration

import pandas as pd
import pyam
df = pyam.IamDataFrame(pd.DataFrame([
    ['a', 1, 2, 3, 4],
    ['b', 0, 1, 2, 3],
    ['c', -1, 1, -1, 1],
    ['d', 1, 1, 1, -1]
    ],
    columns=['variable', 2010, 2020, 2030, 2040],
), model='model_a', scenario='scen_a', region='World', unit='some_unit')
df.stack_plot()

[Image: the stack plot produced by the code above]

More insights

In the source of stack_plot(), there is this comment:

    # Sort lines so that negative timeseries are on the right, positive
    # timeseries are on the left and timeseries which go from positive to
    # negative are ordered such that the timeseries which goes negative first
    # is on the right (case of timeseries which go from negative to positive
    # is an edge case we haven't thought about as it's unlikely to apply to
    # us).

@znicholls
Collaborator

Nice pick up @danielhuppmann !

  1. Timeseries data starting with zero are ignored completely

I think this can be fixed by altering

pos_cols = [c for c in _df if (_df[c] > 0).all()]

to

pos_cols = [c for c in _df if (_df[c] >= 0).all()]

(>= rather than >).
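To make the effect of that one-character change concrete, here is a minimal sketch using the data from the illustration above. It assumes `_df` is a wide pandas DataFrame with one column per variable and one row per year; that layout is an assumption for illustration, not something confirmed from the `stack_plot()` source.

```python
import pandas as pd

# Assumed layout: one column per variable, one row per year.
_df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [0, 1, 2, 3],
        "c": [-1, 1, -1, 1],
        "d": [1, 1, 1, -1],
    },
    index=[2010, 2020, 2030, 2040],
)

# Strict comparison: a column containing a zero is not counted as positive.
strict_pos_cols = [c for c in _df if (_df[c] > 0).all()]

# Relaxed comparison: zeros are allowed, so 'b' is picked up as well.
relaxed_pos_cols = [c for c in _df if (_df[c] >= 0).all()]

print(strict_pos_cols)   # ['a']
print(relaxed_pos_cols)  # ['a', 'b']
```

With the strict test, `b` falls into none of the sign-based groups, which would be consistent with it disappearing from the plot.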

2. Multiple crossings of the zero baseline end up shown wrong

As you've seen in the comments, this case where a timeseries goes positive, then negative, then back again is a case we didn't work through. We were not sure how you'd even want to plot this (as you probably have to jump over another timeseries so you lose the continuity we were trying to get). New functionality for you to play with!

@danielhuppmann
Member Author

Thanks for the assist on fixing issue 1, @znicholls! Issue 2 is indeed a tricky beast...

@znicholls
Collaborator

Issue 2 is indeed a tricky beast

You could just have an if clause, based on something like

crosses = np.argwhere(pos_to_neg | neg_to_pos)

which basically checks whether we're in the case where the lines are hopping back and forth across zero in an awkward way. If they are, you could skip all the steps which try to make the plot pretty and just plot it in a way which works and shows all the data. That will still require some mucking around but I think separating the strategies is a good idea as they are trying to achieve different things and apply in different cases.
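A rough sketch of what such a check could look like for a single timeseries follows. The thread does not show how `pos_to_neg` and `neg_to_pos` are built, so the definitions below (comparing each value with the next one) and the helper name `zero_crossings` are assumptions for illustration only.

```python
import numpy as np


def zero_crossings(values):
    """Return the positions after which the series changes sign.

    Hypothetical helper: a crossing is counted only when two adjacent
    values have strictly opposite signs, so a series that passes
    through an exact zero (e.g. 1, 0, -1) is not detected here.
    """
    values = np.asarray(values)
    prev, nxt = values[:-1], values[1:]
    pos_to_neg = (prev > 0) & (nxt < 0)
    neg_to_pos = (prev < 0) & (nxt > 0)
    crosses = np.argwhere(pos_to_neg | neg_to_pos)
    return crosses.flatten()


# Series 'c' and 'd' from the illustration at the top of the issue.
crossings_c = zero_crossings([-1, 1, -1, 1])
crossings_d = zero_crossings([1, 1, 1, -1])

# More than one crossing is the awkward case discussed above.
needs_fallback = len(crossings_c) > 1
```

A plotting routine could branch on something like `needs_fallback` to choose between the "pretty" ordering and a plain strategy that simply shows all the data.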

@danielhuppmann
Member Author

I was thinking that splitting all timeseries data into a positive and negative component might do the trick... But making it pretty afterwards will be a pain.

@znicholls
Collaborator

znicholls commented Sep 7, 2019

I was thinking that splitting all timeseries data into a positive and negative component might do the trick... But making it pretty afterwards will be a pain.

Yep I think that's probably best. Keeping track of colours and joining dots as they cross back and forth will be hard. If you do it with loops initially I think it should be easier to implement and then it can be refactored to vector operations afterwards.
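As a starting point for the loop-based approach, the split itself might look like the sketch below. It is plain Python on lists, the helper name `split_by_sign` is hypothetical, and it does not attempt the harder part (keeping colours consistent and joining the pieces in the plot).

```python
def split_by_sign(values):
    """Split one timeseries into a non-negative and a non-positive part.

    Hypothetical helper: at each time step the value goes into exactly
    one of the two parts and the other part gets a zero, so the two
    parts always add up to the original series.
    """
    positive, negative = [], []
    for v in values:
        positive.append(v if v > 0 else 0)
        negative.append(v if v < 0 else 0)
    return positive, negative


# Data from the illustration at the top of the issue.
data = {
    "a": [1, 2, 3, 4],
    "b": [0, 1, 2, 3],
    "c": [-1, 1, -1, 1],
    "d": [1, 1, 1, -1],
}

parts = {name: split_by_sign(values) for name, values in data.items()}
print(parts["c"])  # ([0, 1, 0, 1], [-1, 0, -1, 0])
```

The positive parts could then be stacked upwards from zero and the negative parts downwards. If the data lives in a pandas DataFrame, the same split can later be vectorised with `DataFrame.clip(lower=0)` and `DataFrame.clip(upper=0)`.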
