add `method` and `weight` feature to (region) aggregation #305

danielhuppmann · 2019-12-17T08:09:51Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
Description in RELEASE_NOTES.md Added

Description of PR

This PR extends the functions aggregate(), aggregate_region() and the respective check-versions with method and weight options (weight only for region aggregation). The functionality is based on the branch peterkolp:region_aggregation_mip_feature.

The PR also changes the default of aggregate_region() regarding the treatment of components at the region-level as discussed in #299.

tl;dr
Before, the default behaviour of aggregate_region() was to add all sub-categories of variable that were not present in any subregion, and deactivating this addition required setting components=[].
Now, the default is components=False (not adding sub-categories at the region-level) and components=True activates the automatic detection of sub-categories.
Setting a specific list of variables to be added via components=<list> is unchanged.
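
To make the three cases above concrete, here is a minimal sketch in plain Python. The helper resolve_components() and the variable names are hypothetical and only illustrate the described behaviour; this is not pyam's internal code, and it assumes that a "sub-category" is any variable name starting with the variable followed by "|".

```python
def resolve_components(components, variable, region_vars, subregion_vars):
    """Return the sub-category variables to add at the region level.

    Hypothetical helper illustrating the new defaults; not pyam's internals.
    """
    if components is False:
        # new default: do not add any region-level sub-categories
        return []
    if components is True:
        # automatic detection: sub-categories of `variable` that exist for the
        # region itself but are absent from every subregion (assumed rule)
        prefix = variable + "|"
        return sorted(
            v for v in region_vars
            if v.startswith(prefix) and v not in subregion_vars
        )
    # an explicit list of variables is used as given (unchanged behaviour)
    return list(components)


# illustrative variable sets, not taken from the PR
region_vars = {"Emissions|CO2", "Emissions|CO2|Aviation", "Emissions|CO2|Energy"}
subregion_vars = {"Emissions|CO2", "Emissions|CO2|Energy"}

default = resolve_components(False, "Emissions|CO2", region_vars, subregion_vars)
detected = resolve_components(True, "Emissions|CO2", region_vars, subregion_vars)
explicit = resolve_components(
    ["Emissions|CO2|Shipping"], "Emissions|CO2", region_vars, subregion_vars
)
```

With these inputs, only the aviation variable is detected automatically, because it is the only sub-category missing from the subregions.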

closes #299

pyam/core.py

tests/test_feature_aggregate.py

pyam/core.py

danielhuppmann · 2019-12-17T08:36:42Z

This PR is directed into IAMconsortium:tests/cleanup to highlight the differences of the aggregation-features to that prep-work - I'll rebase after a first round of review and merging of #304.

Once this PR is merged, I'll implement the feature that variable can be a list of strings rather than only one string to speed up the processing.

@byersiiasa @gidden @znicholls @peterkolp @zikolach @volker-krey - any volunteers for a review?

byersiiasa · 2019-12-17T11:46:41Z

pyam/core.py

+    return df.groupby(cols)['value'].agg(_get_method_func(method))
+
+
+def _aggregate_weight(df, weight, method):


if this is for weighted average, then surely method is always sum/np.sum? or is this just meant as a placeholder for future methods?

Consider also instead using np.nansum such that if a value is missing / nan, the calculation won't return nan

The method arg partly anticipates more supported functions in the future and partly serves to check that the chosen method is indeed sum in that function. This moves some not-so-important checks to lower-level functions, so that _aggregate_weight() can be reused in other functions in the future without implementing that check in multiple higher-level functions.

The data table cannot have nan (they are removed at initialisation), so this point is moot. If a data value or the weight value is missing (i.e., inconsistent series indices), a ValueError is raised.
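
For readers following this thread, the behaviour described here can be sketched with plain pandas. This is a simplified stand-in, not the PR's _aggregate_weight(): the function name weighted_average(), the index layout and the sample numbers are assumptions made for illustration. It checks that only sum is requested, raises a ValueError when the data and weight indices differ, and otherwise computes the weighted average over regions.

```python
import pandas as pd

# index levels assumed for this sketch
IDX = ["model", "scenario", "region", "year"]


def weighted_average(data, weight, method="sum"):
    """Weighted average of `data` over regions, using `weight` as weights."""
    if method != "sum":
        raise ValueError("only method 'sum' is supported with weights")
    # both series must cover exactly the same rows
    if not data.index.sort_values().equals(weight.index.sort_values()):
        raise ValueError("inconsistent index between variable and weight")
    cols = ["model", "scenario", "year"]  # group over everything except region
    numerator = (data * weight).groupby(level=cols).sum()
    return numerator / weight.groupby(level=cols).sum()


def make_series(values):
    """Build a two-region series for one model, scenario and year."""
    index = pd.MultiIndex.from_tuples(
        [("model_a", "scen_a", "reg_a", 2010), ("model_a", "scen_a", "reg_b", 2010)],
        names=IDX,
    )
    return pd.Series(values, index=index, name="value", dtype=float)


price = make_series([10.0, 20.0])      # e.g. a carbon price per region
emissions = make_series([1.0, 3.0])    # e.g. emissions used as weights
result = weighted_average(price, emissions)
```

Here the result is (10 * 1 + 20 * 3) / (1 + 3) = 17.5, and passing a weight series that lacks one of the regions raises the ValueError.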

byersiiasa · 2019-12-17T11:46:58Z

pyam/core.py

+        raise ValueError('inconsistent index between variable and weight')
+
+    cols = META_IDX + ['year']
+    return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum()


would you put .agg(method) here instead of .sum()

do you also want to limit inputs to the KNOWN_FUNCS ?

only summation is allowed (for the time being) anyway, so imho no need to be more complicated...

Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?

byersiiasa

not tested but lgtm

gidden · 2019-12-22T18:27:40Z

Restarted CI (mac py36 only)

gidden

A few comments in line, but in general features look great. Just wanted to make sure all features were covered in the tests.

gidden · 2019-12-22T18:32:51Z

pyam/core.py

+        raise ValueError('inconsistent index between variable and weight')
+
+    cols = META_IDX + ['year']
+    return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum()


Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?

gidden · 2019-12-22T18:34:13Z

pyam/utils.py

@@ -33,6 +33,8 @@
                          + ['{}{}'.format(i, j) for i, j in itertools.product(
                              string.ascii_uppercase, string.ascii_uppercase)]))

+KNOWN_FUNCS = {'min': np.min, 'max': np.max, 'avg': np.mean, 'sum': np.sum}


maybe add a 'mean' synonym as well?
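
A minimal sketch of what the lookup could look like with the synonym added. The dictionary mirrors the line in the diff plus a 'mean' key; the function get_method_func() is an assumed stand-in for the _get_method_func() helper referenced in pyam/core.py, whose actual implementation is not shown in this PR view.

```python
import numpy as np

# mapping from the diff above, with 'mean' added as a synonym for 'avg'
KNOWN_FUNCS = {
    "min": np.min,
    "max": np.max,
    "avg": np.mean,
    "mean": np.mean,
    "sum": np.sum,
}


def get_method_func(method):
    """Translate a method name into a callable (assumed behaviour)."""
    if callable(method):
        # assumption: a user-supplied function is passed through unchanged
        return method
    if method in KNOWN_FUNCS:
        return KNOWN_FUNCS[method]
    raise ValueError("method `{}` is not a valid aggregation function".format(method))


total = get_method_func("sum")([1.0, 2.0, 6.0])
average = get_method_func("mean")([1.0, 2.0, 6.0])
```

Both 'avg' and 'mean' resolve to the same function, so existing calls keep working while the more common name is accepted as well.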

gidden · 2019-12-22T18:40:15Z

tests/test_feature_aggregate.py

+    ],
+        columns=idx + ['value']
+    ).set_index(idx).value
+    obs = df.aggregate_region('Price|Carbon', method='max')


Maybe I don't see it, but is there a test that explicitly tests the weight option?

We have one (and the line above it tests that not adding the weight yields an unexpected answer)

assert df.check_aggregate_region('Price|Carbon', weight=v) is None

danielhuppmann · 2019-12-23T12:00:01Z

Thanks @gidden for the comments! Implemented the one minor change & answered the question about the test inline.

About the "how to support other methods than sum with weights" (seems I can't answer inline) - it's not clear to me what a "weighted max" (or similar) would be expected to do, so no idea what to write as a comment. Happy to discuss implementation when an actual use case comes up.

danielhuppmann added 21 commits December 13, 2019 11:15

use full TEST_DF for unit tests

10cf203

use full TEST_DF for unit tests specific for 'year' feature

85afde4

replace meta_df by test_df across all tests

08b894a

appease stickler

4636f90

merge relevant changes from peterkolp:region_aggregation_mip_feature

c22dca0

docstring clean-up

7ef3516

set components=False as default in [check_]aggregate_region()

9969c74

fix method docstring, add weights kwarg

6dc3228

move internal function _all_other_regions()

978829b

update docstring

bf5813c

speed-up of aggregate_region (no cloning of IamDataFrame)

02adebc

fix a kwarg default, add docstrings

4a5e359

speed up aggregate_region() even more

fa152e2

add feature to do weighted average over regions

4e7fdc2

refactor kwarg and auxiliary function to weight

92b292c

update docstrings (preparing for new test data)

08ce709

add unit test for check_aggregate_region()

55f21c0

add test for method kwarg in check_aggregate_region()

14de79f

raise if variable & weight of inconsistent index in aggregate_region()

a3901a1

make full-agg-feature test data complete

9688aa1

add tests for aggregate()

63a5d8a

stickler-ci reviewed Dec 17, 2019

danielhuppmann mentioned this pull request Dec 17, 2019

[WIP] Region aggregation mip feature #244

Closed

3 tasks

appease stickler

ba71cc7

stickler-ci reviewed Dec 17, 2019

appease stickler again

2d50535

stickler-ci reviewed Dec 17, 2019

third time stickler

c582003

byersiiasa reviewed Dec 17, 2019

danielhuppmann mentioned this pull request Dec 19, 2019

[WIP] allow passing list of variables to [check_]aggregate[_region]() functions #306

Closed

3 tasks

gidden approved these changes Dec 22, 2019

danielhuppmann changed the base branch from tests/cleanup to master December 23, 2019 11:32

danielhuppmann added 3 commits December 23, 2019 12:41

merge from master

b727cee

add mean to KNOWN_FUNCS (review comment by @gidden)

ee630aa

add to release notes

01637b7

danielhuppmann merged commit 5194285 into IAMconsortium:master Dec 23, 2019

danielhuppmann deleted the feature/aggregate_weights branch December 23, 2019 14:59

This was referenced Dec 31, 2019

clean-up of aggregation features #315

Merged

more concise implementation of weighted average danielhuppmann/pyam#12

Closed

danielhuppmann mentioned this pull request Mar 29, 2020

Bulk aggregate #355

Open

danielhuppmann mentioned this pull request May 17, 2020

consistent return type in aggregate family of functions #255

Closed
