Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: remove unnecessary check needs_i8_conversion if Index subclass does not support any or all #58006

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented Mar 25, 2024

xref #54566

removed unnecessary check needs_i8_conversion if Index subclass does not support any or all.

@natmokval natmokval added Index Related to the Index class or subclasses Reduction Operations sum, mean, min, max, etc. labels Mar 25, 2024
):
# This call will raise
make_invalid_op(opname)(self)
if isinstance(self, ABCMultiIndex) or self.dtype.kind != "m":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the != "m" bit still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I think we don't need != "m" here. This check worked only together with the check needs_i8_conversion(self.dtype).

I removed != "m" and corrected tests.

I have a question: shouldn't we raise for DatetimeIndex? So far we show FutureWarning:
"'any' with datetime64 dtypes is deprecated and will raise in a future version. Use (obj != pd.Timestamp(0)).any() instead."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a deprecation in nanany and nanall that i think is intended to be enforced before this needs_i8_conversion check is removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it was my thought too.
Should I open new PR to enforce the deprecation in nanany and nanall, or can I do it in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enforced deprecation any/all with datetime64 in #58029

@natmokval natmokval marked this pull request as ready for review March 26, 2024 15:29
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
@@ -228,6 +233,8 @@ def test_logical_compat(self, simple_index):
r"'IntervalArray' with dtype interval\[.*\] does "
"not support reduction '(any|all)'"
)
if isinstance(idx, PeriodIndex):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can combine this branch with the IntervalArray branch directly preceding it and reduce the msg check to does not support reduction '(any|all)'

It is generally a good idea to try and reduce the amount of branching in tests. Having a bunch of if...else statements makes it unnecessarily harder to manage consistent behavior expectations

@natmokval
Copy link
Contributor Author

natmokval commented Mar 29, 2024

thank you @WillAyd , I combined these two branches and reduced the message checks as you suggested.

I think if we change the error message in def _reduce

raise TypeError(
f"'{type(self).__name__}' with dtype {self.dtype} "
f"does not support reduction '{name}'"
)

to

f"'{type(self).__name__}' with dtype {self.dtype} "
f"does not support operation: '{name}'"

we can reduce the amount of branching in tests. Think that's a good idea?

@@ -223,11 +223,10 @@ def test_logical_compat(self, simple_index):
assert idx.any() == idx.to_series().any()
else:
msg = "cannot perform (any|all)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can delete L225-L228? i.e. the only thing needed here to pass tests is msg = "does not support reduction '(any|all)'"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I deleted the branch with msg = "datetime64 type does not support operation: '(any|all)'", ci- green.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK I see the issue now. So you are branching because DatetimeIndex says does not support operation: '(any|all)' but the other cases say 'does not support reduction '(any|all)'

I don't want to go down a rabbit hole but having a message that differs between operation: and reduction doesn't seem that important. How hard is it to align them so you can just assign msg = "does not support operation: '(any|all)' and have the tests pass?

@@ -213,7 +213,7 @@ def test_numeric_compat(self, simple_index):
1 // idx

def test_logical_compat(self, simple_index):
if simple_index.dtype in (object, "string"):
if simple_index.dtype in (object, "string", "datetime64[ns]"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the datetime skip required here? I am not overly familiar with this part of the code base but judging by the deleted comment I didn't think this needed to change

Copy link
Contributor Author

@natmokval natmokval Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't skip datetime64 here. We need the separate branch for datetime64 because we get different error message: "datetime64 type does not support operation: '(any|all)'"

If we replace the error message

if values.dtype.kind == "M":
# GH#34479
raise TypeError("datetime64 type does not support operation: 'any'")

with the msg "does not support reduction '(any|all)'" we can combine these two checks. Can I do it in this PR or a new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to align the error messages and assign msg = “.* does not support operation .*“ . I replaced reduction with operation and operation: with operation to unify error messages. Tests passed.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for aligning those - looks pretty good one minor thing

@@ -223,11 +223,7 @@ def test_logical_compat(self, simple_index):
assert idx.any() == idx.to_series().any()
else:
msg = "cannot perform (any|all)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L225 looks like it can be safely deleted

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm when green

@mroeschke mroeschke added this to the 3.0 milestone Apr 4, 2024
@mroeschke mroeschke merged commit f4232e7 into pandas-dev:main Apr 4, 2024
46 checks passed
@mroeschke
Copy link
Member

Thanks @natmokval

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
… does not support `any` or `all` (pandas-dev#58006)

* cln-remove-unnecessary-check-needs_i8_conversion

* fix tests

* remove the check self.dtype.kind, fix tests

* Update pandas/core/indexes/base.py

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* combine two branches in test_logical_compat

* correct test_logical_compat

* correct test_logical_compat

* correct test_logical_compat

* roll back the check for datetime64 in test_logical_compat

* replace reduction with operation to unify err msg

* delete unused line

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Aug 27, 2024
lithomas1 pushed a commit to lithomas1/pandas that referenced this pull request Sep 8, 2024
WillAyd added a commit to WillAyd/pandas that referenced this pull request Sep 20, 2024
jorisvandenbossche pushed a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
jorisvandenbossche pushed a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
jorisvandenbossche pushed a commit to WillAyd/pandas that referenced this pull request Oct 3, 2024
jorisvandenbossche pushed a commit to WillAyd/pandas that referenced this pull request Oct 7, 2024
jorisvandenbossche pushed a commit that referenced this pull request Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants