Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Silent dropping of nuisance columns in agg_list_like #43741

Merged
merged 11 commits into from
Sep 29, 2021

Conversation

rhshadrach
Copy link
Member

Part of #43740

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas Apply Apply, Aggregate, Transform, Map Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels Sep 25, 2021
@rhshadrach rhshadrach changed the title DEPR: fallback agg dict DEPR: Silent dropping of nuisance columns in agg_list_like Sep 25, 2021
new_res = colg.aggregate(arg)
with warnings.catch_warnings(record=True) as w:
new_res = colg.aggregate(arg)
if len(w) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all warnings that happen here going to represent a failure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah - thanks. No, probably not. Will inspect the warnings themselves.

Copy link
Member Author

@rhshadrach rhshadrach Sep 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got a bit hairy, I wasn't able to find a good way to capture and suppress only our warnings while passing through any other warnings and being able to determine if we raised. Supplying a filter within the catch_warnings context manager means we can't fell if we raised; using record=True there means all user warnings would also be suppressed.

If there are any suggestions for a better implementation, it would be appreciated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if you dont catch warnings at all here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running

def foo(x):
   return x.prod()
df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y'], 'c': ['z', 'w'], 'd': ['a', 'b']})
df.agg([foo, lambda x: x.sum()])

gives

FutureWarning: ['foo'] did not aggregate successfully.
FutureWarning: ['foo'] did not aggregate successfully.
FutureWarning: ['foo'] did not aggregate successfully.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there's something like a filter="once" to prevent exactly that (and i think its the default)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - thanks. Digging into this, the only way I can see why I'm getting multiple warnings is because of https://bugs.python.org/issue29672. I haven't confirmed yet if the code within agg ever hits a separate catch_warnings; if it does that will cause repeat messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still going to verify that the use of another catch_warnings (in a different part of the code) is the cause for the multiple messages I am seeing without the special handling in this PR.

Copy link
Member Author

@rhshadrach rhshadrach Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed it is because of the Python bug mentioned above. The code in the sample hits pandas.core.indexes.base.Index_with_infer which uses the catch_warnings context manager. Here is a MWE to demonstrate the bug (mirroring the structure of agg_list_like):

def foo():
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", ".*the Index constructor", FutureWarning)


def bar(ndim):
    if ndim == 1:
        pass
    else:
        for _ in range(3):
            bar(1)

    warnings.warn('test warning', FutureWarning)
    foo()

bar(2)

With the call to foo from within bar, three warning messages are printed. When the call to foo is removed, only one is printed.

With this, are we okay with catching the warnings as was merged @jbrockmendel?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im fine with it. a comment to this effect might be worthwhile

@jreback
Copy link
Contributor

jreback commented Sep 27, 2021

can you rebase

…pr_fallback_agg_dict

� Conflicts:
�	doc/source/whatsnew/v1.4.0.rst
@jreback jreback added this to the 1.4 milestone Sep 28, 2021
@jreback jreback merged commit 4b54c53 into pandas-dev:master Sep 29, 2021
@jreback
Copy link
Contributor

jreback commented Sep 29, 2021

thanks @rhshadrach very nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants