-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add method
and weight
feature to (region) aggregation
#305
add method
and weight
feature to (region) aggregation
#305
Conversation
This PR is directed into Once this PR is merged, I'll implemented the feature that @byersiiasa @gidden @znicholls @peterkolp @zikolach @volker-krey - any volunteers for a review? |
return df.groupby(cols)['value'].agg(_get_method_func(method)) | ||
|
||
|
||
def _aggregate_weight(df, weight, method): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is for weighted average, then surely method is always sum/np.sum? or is this just meant as a placeholder for future methods?
Consider also instead using np.nansum
such that if a value is missing / nan, the calculation won't return nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
The
method
arg is partly anticipating more supported functions in the future, partly for checking that the chosen method is indeed sum in that function (moving some not-so important stuff to lower-level functions, also for reusing_aggregate_weight()
in other functions in the future and not needing to implement that check in multiple higher-level functions. -
The
data
table cannot havenan
(they are removed at initialisation), so this point is moot. If a data value or the weight value is missing (i.e., inconsistent series index's), aValueError
is raised.
raise ValueError('inconsistent index between variable and weight') | ||
|
||
cols = META_IDX + ['year'] | ||
return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you put .agg(method)
here instead of .sum()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you also want to limit inputs to the KNOWN_FUNCS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only summation is allowed (for the time being) anyway, so imho no need to be more complicated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not tested but lgtm
Restarted CI (mac py36 only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments in line, but in general features look great. Just wanted to make sure all features were covered in the tests.
raise ValueError('inconsistent index between variable and weight') | ||
|
||
cols = META_IDX + ['year'] | ||
return (_data * _weight).groupby(cols).sum() / _weight.groupby(cols).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we don't need anything more complicated here, but maybe worth putting in some comments about what would need to be changed in the future if more than sum is supported?
pyam/utils.py
Outdated
@@ -33,6 +33,8 @@ | |||
+ ['{}{}'.format(i, j) for i, j in itertools.product( | |||
string.ascii_uppercase, string.ascii_uppercase)])) | |||
|
|||
KNOWN_FUNCS = {'min': np.min, 'max': np.max, 'avg': np.mean, 'sum': np.sum} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a 'mean'
synonym as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
], | ||
columns=idx + ['value'] | ||
).set_index(idx).value | ||
obs = df.aggregate_region('Price|Carbon', method='max') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I don't see it, but is there a test that explicitly tests the weight
option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have one (and the line above it tests that not adding the weight yields an unexpected answer)
assert df.check_aggregate_region('Price|Carbon', weight=v) is None
Thanks @gidden for the comments! Implemented the one minor change & answered the question about the test inline. About the "how to support other methods that sum with weights" (seems I can't answer inline) - it's not clear to me what a "weighted max" (or similar) would be expected to do, so no idea what to write as a comment. Happy to discuss implementation when an actual use case comes up. |
Please confirm that this PR has done the following:
Description of PR
This PR extends the functions
aggregate()
,aggregate_region()
and the respective check-versions by amethod
andweight
options (weights only for region aggregation). The functionality is based on the branch peterkolp:region_aggregation_mip_feature.The PR also changes the default of
aggregate_region()
regarding the treatment of components at the region-level as discussed in #299.closes #299