-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/CLN: Decouple Series/DataFrame.transform #35964
Conversation
@@ -222,7 +223,7 @@ def test_transform(self, string_series): | |||
expected.columns = ["sqrt"] | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
result = string_series.transform([np.sqrt]) | |||
result = string_series.apply([np.sqrt]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line transform([np.sqrt])
occurs 5 lines up. I believe the original intention was to test apply
here.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments
…ansform_refactor � Conflicts: � pandas/core/aggregation.py
@jreback Thanks for the review. Code has been reorganized and ready for another. For the comment about not adding things to the core files - I think the idea behind this that we should be distributing the implementation of methods in e.g. NDFrame into various submodules like aggregate instead of having one monolithic generic.py. Is that correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking really good, a few comments
pandas/core/aggregation.py
Outdated
|
||
def transform( | ||
obj: FrameOrSeries, | ||
func: Union[str, List, Dict, Callable], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an alias for this anywhere?
pandas/core/aggregation.py
Outdated
# combine results | ||
if len(results) == 0: | ||
raise ValueError("Transform function failed") | ||
from pandas.core.reshape.concat import concat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this at the top of the function
pandas/core/generic.py
Outdated
raise ValueError("transforms cannot produce aggregated results") | ||
|
||
return result | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this? or is the doc-string used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just move the doc-string to shared_docs?
expected = f_sqrt | ||
tm.assert_frame_equal(result, expected) | ||
|
||
# list-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would actually split these to separate tests easier to grok
|
||
# Determine which columns op will work on | ||
columns = [] | ||
for column in df: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm can we explicit list the exluded ones rather than this?
pandas/tests/frame/common.py
Outdated
@@ -29,3 +32,22 @@ def _check_mixed_int(df, dtype=None): | |||
assert df.dtypes["C"] == dtypes["C"] | |||
if dtypes.get("D"): | |||
assert df.dtypes["D"] == dtypes["D"] | |||
|
|||
|
|||
def zip_frames(frames, axis=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type args and output
expected = f_sqrt.copy() | ||
tm.assert_series_equal(result, expected) | ||
|
||
# list-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you split this one as well
…ansform_refactor � Conflicts: � pandas/core/frame.py � pandas/tests/frame/apply/test_frame_transform.py � pandas/tests/frame/common.py � pandas/tests/series/apply/test_series_apply.py � pandas/tests/series/apply/test_series_transform.py
…ansform_refactor
@jreback changes made and green. In fixing the one test I found cumcount doesn't exist for DataFrames/Series (which, after thinking a bit, makes sense). This prompted me to add in tests for (almost) all the transformation kernels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments, but can address in a followon. thanks!
@@ -342,6 +342,7 @@ Other | |||
^^^^^ | |||
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`) | |||
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`) | |||
- Bug in :meth:`Series.transform` would give incorrect results or raise when the argument ``func`` was dictionary (:issue:`35811`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to reshaping section (generally we want almost nothing in Other)
assert not is_series | ||
return transform(obj.T, func, 0, *args, **kwargs).T | ||
|
||
if isinstance(func, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably use is_list_like here
else: | ||
func = {col: func for col in obj} | ||
|
||
if isinstance(func, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably use is_dict_like here
# GH 15931 - deprecation of renaming keys | ||
raise SpecificationError("nested renamer is not supported") | ||
|
||
results = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type results
thanks @rhshadrach |
# GH 35964 | ||
s = Series(3 * [object]) # Series that will fail on most transforms | ||
if op in ("backfill", "shift", "pad", "bfill", "ffill"): | ||
pytest.xfail("Transform function works on any datatype") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the issue here that the behavior is wrong or that the test isn't applicable to these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavior is correct, the test is not applicable. This test is about behavior when transform fails and these ops are expected to never fail.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
First step toward #35725. Currently
transform
just callsaggregate
, and so if we are to forbidaggregate
from transforming, these need to be decoupled. Other than the bugfix (#35811), the only other behavioral change is in the error messages.Assuming the bugfix #35811 is the correct behavior, docs/whatsnew also needs to be updated.
I wasn't sure if tests should be marked with #35725 or perhaps this PR #. Any guidance here?