Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add method and weight feature to (region) aggregation #305

Merged

Conversation

danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Dec 17, 2019

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

This PR extends the functions aggregate(), aggregate_region() and the respective check-versions by a method and weight options (weights only for region aggregation). The functionality is based on the branch peterkolp:region_aggregation_mip_feature.

The PR also changes the default of aggregate_region() regarding the treatment of components at the region-level as discussed in #299.

tl; dr
Before, the default behaviour of aggregate_region() was to add all sub-categories of variable that were not present in any subregion, and deactivating this addition required to set components=[]
Now, the default is components=False (not adding sub-categories at the region-level) and components=True activates the automatic detection of sub-categories.
Setting a specific list of variables to be added via components=<list> is unchanged.

closes #299

pyam/core.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
tests/test_feature_aggregate.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
pyam/core.py Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member Author

This PR is directed into IAMconsortium:tests/cleanup to highlight the differences of the aggregation-features to that prep-work - I'll rebase after a first round of review and merging of #304.

Once this PR is merged, I'll implemented the feature that variable can be a list of strings rather than only one string to speed up the processing.

@byersiiasa @gidden @znicholls @peterkolp @zikolach @volker-krey - any volunteers for a review?

return df.groupby(cols)['value'].agg(_get_method_func(method))


def _aggregate_weight(df, weight, method):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is for weighted average, then surely method is always sum/np.sum? or is this just meant as a placeholder for future methods?

Consider also instead using np.nansum such that if a value is missing / nan, the calculation won't return nan

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The method arg is partly anticipating more supported functions in the future, partly for checking that the chosen method is indeed sum in that function (moving some not-so important stuff to lower-level functions, also for reusing _aggregate_weight() in other functions in the future and not needing to implement that check in multiple higher-level functions.

  2. The data table cannot have nan (they are removed at initialisation), so this point is moot. If a data value or the weight value is missing (i.e., inconsistent series index's), a ValueError is raised.

raise ValueError('inconsistent index between variable and weight')

cols = META_IDX + ['year']
return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you put .agg(method) here instead of .sum()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you also want to limit inputs to the KNOWN_FUNCS ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only summation is allowed (for the time being) anyway, so imho no need to be more complicated...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?

Copy link
Collaborator

@byersiiasa byersiiasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not tested but lgtm

@gidden
Copy link
Member

gidden commented Dec 22, 2019

Restarted CI (mac py36 only)

Copy link
Member

@gidden gidden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments in line, but in general features look great. Just wanted to make sure all features were covered in the tests.

raise ValueError('inconsistent index between variable and weight')

cols = META_IDX + ['year']
return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?

pyam/utils.py Outdated
@@ -33,6 +33,8 @@
+ ['{}{}'.format(i, j) for i, j in itertools.product(
string.ascii_uppercase, string.ascii_uppercase)]))

KNOWN_FUNCS = {'min': np.min, 'max': np.max, 'avg': np.mean, 'sum': np.sum}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a 'mean' synonym as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

],
columns=idx + ['value']
).set_index(idx).value
obs = df.aggregate_region('Price|Carbon', method='max')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't see it, but is there a test that explicitly tests the weight option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have one (and the line above it tests that not adding the weight yields an unexpected answer)

assert df.check_aggregate_region('Price|Carbon', weight=v) is None

@danielhuppmann danielhuppmann changed the base branch from tests/cleanup to master December 23, 2019 11:32
@danielhuppmann
Copy link
Member Author

Thanks @gidden for the comments! Implemented the one minor change & answered the question about the test inline.

About the "how to support other methods that sum with weights" (seems I can't answer inline) - it's not clear to me what a "weighted max" (or similar) would be expected to do, so no idea what to write as a comment. Happy to discuss implementation when an actual use case comes up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggest to change default behaviour of aggegate_region() for components
4 participants