Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add numeric_only to resampler methods #46792

Merged
merged 7 commits into from
Apr 23, 2022

Conversation

lorentzbao
Copy link
Contributor

@lorentzbao lorentzbao commented Apr 17, 2022

After examining the reported issue #46442, I found the issue exists in all the following docs.
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.sum.html
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.prod.html
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.min.html
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.max.html
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.first.html
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.last.html

The problem seems to be caused by the fact that the docstrings of sum, prod, min, max, first, last methods
in the Resampler class borrow the _groupby_agg_method_template template from GroupBy class. I think one possible fix is
creating a _resample_agg_method_template and using this template to create docstrings for the methods mentioned above.

I am a first-time contributor. Appreciate any suggestions and ideas for fixing this issue.

@jreback
Copy link
Contributor

jreback commented Apr 17, 2022

can you show the rendered doc-string and point to the current one for comparison

@jreback jreback added Docs Resample resample method labels Apr 17, 2022
@jreback jreback mentioned this pull request Apr 17, 2022
4 tasks
@lorentzbao
Copy link
Contributor Author

lorentzbao commented Apr 18, 2022

Thanks @jreback. The re-rendered doc-string of pandas.core.resample.Resampler.first looks as follows.

Resampler.first(_method='first', min_count=0, *args, **kwargs)[source]

Compute first of group values.

Parameters
min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

Returns
Series or DataFrame

Computed first of values within each group.

===========================================================================
For comparison, the current doc-string of pandas.core.resample.Resampler.first looks as follows.
https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.first.html

Resampler.first(_method='first', min_count=0, *args, **kwargs)[source]

Compute first of group values.

Parameters
numeric_only: bool, default False

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.

min_count: int, default -1

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

Returns
Series or DataFrame

Computed first of values within each group.

@jreback
Copy link
Contributor

jreback commented Apr 18, 2022

cc @rhshadrach

@rhshadrach
Copy link
Member

Thanks for the PR @lorentzbao; this is indeed the path forward mentioned in the linked issue. However, I did not realize that we'd want to deprecate the dropping of nuisance columns in resample similar to what we're doing with with Series/DataFrame/groupby. See #46560 for details (especially the two linked issues there). For this, it seems necessary to add numeric_only to these methods, so that when the deprecation is enforced users have an easy way to recover the 1.x behavior (namely, by specifying numeric_only=True).

Do you think you could add numeric_only to the methods here instead of removing the argument from the documentation?

@lorentzbao
Copy link
Contributor Author

lorentzbao commented Apr 19, 2022

Thanks for the PR @lorentzbao; this is indeed the path forward mentioned in the linked issue. However, I did not realize that we'd want to deprecate the dropping of nuisance columns in resample similar to what we're doing with with Series/DataFrame/groupby. See #46560 for details (especially the two linked issues there). For this, it seems necessary to add numeric_only to these methods, so that when the deprecation is enforced users have an easy way to recover the 1.x behavior (namely, by specifying numeric_only=True).

Do you think you could add numeric_only to the methods here instead of removing the argument from the documentation?

Really appreciate your comments @rhshadrach. Let me go through the links you shared. I think I could have a try adding numeric_only to these methods.

@rhshadrach
Copy link
Member

I should have also mentioned - see #46708 for similar examples where this argument was added to frame methods.

@lorentzbao lorentzbao changed the title DOC: Remove unused argument from resampler docstrings ENH: Add numeric_only to resampler methods Apr 19, 2022
@lorentzbao
Copy link
Contributor Author

lorentzbao commented Apr 22, 2022

I should have also mentioned - see #46708 for similar examples where this argument was added to frame methods.

Thanks @rhshadrach. I have added numeric_only to resample sum, prod, min, max, first, last methods, and created a test for this implementation. I'm not sure if I'm on the right track. Appreciate any comments or suggestions.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking real good; I think the only test to add (in addition to request below) would be Series raising if numeric_only=True is passed.

pandas/core/resample.py Outdated Show resolved Hide resolved
pandas/tests/resample/test_resample_api.py Outdated Show resolved Hide resolved
pandas/tests/resample/test_resample_api.py Outdated Show resolved Hide resolved
pandas/tests/resample/test_resample_api.py Outdated Show resolved Hide resolved
@rhshadrach rhshadrach added Enhancement Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply and removed Docs labels Apr 23, 2022
@rhshadrach rhshadrach added this to the 1.5 milestone Apr 23, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach
Copy link
Member

rhshadrach commented Apr 23, 2022

ping @jreback

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm failures look u related

@jreback jreback merged commit 4cf8d55 into pandas-dev:main Apr 23, 2022
@jreback
Copy link
Contributor

jreback commented Apr 23, 2022

thanks @lorentzbao

@lorentzbao lorentzbao deleted the fix_resample_doc branch April 24, 2022 03:09
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: resample.sum documents a numeric_only argument
3 participants