REF/PERF: move np.errstate out of core array_ops up to higher level #40396

jorisvandenbossche · 2021-03-12T10:40:11Z

Currently, we suppress numpy warnings with np.errstate(all="ignore") inside the evaluate function in core.computation.expressions, which is only used in core.ops.array_ops._na_arithmetic_op, which in turn is only used in core.ops.array_ops arithmetic_op() and comparison_op() (where we actually call np.errstate again, redundantly).

So, in summary, we suppress the warnings at the level of the "array op". For the ArrayManager, we call this array op many times for each column, and repeatedly calling np.errstate(all="ignore") gives a big overhead. Luckily, it is easy to suppress the warnings once at a higher level, at the DataFrame/Series level, where those array ops are called.

That's what this PR is doing: removing np.errstate(all="ignore") in the actual array ops, and adding it in all places where we currently call the array ops.

With the benchmark case of an arithmetic op with two dataframes, this gives a considerable improvement:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 1000))
df_am = df._as_manager("array")

In [2]: %timeit df_am + df_am
18.1 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)   <-- master
8.57 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  <-- PR

For BM it doesn't matter for this case, since that's a single block and np.errstate would be called only once anyway. But for certain cases of df+s ops where we potentially also work column-wise for BM, it should benefit there as well.
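As a rough illustration of the idea (a simplified sketch, not the code in this PR), the two placements of np.errstate look like this. The helper names add_columns_inner and add_columns_outer are hypothetical and stand in for the column-wise array ops and the DataFrame-level caller:

```python
import numpy as np


def add_columns_inner(left_cols, right_cols):
    # Pattern before the change: enter np.errstate once per column.
    out = []
    for left, right in zip(left_cols, right_cols):
        with np.errstate(all="ignore"):
            out.append(left + right)
    return out


def add_columns_outer(left_cols, right_cols):
    # Pattern after the change: enter np.errstate once around the whole loop.
    with np.errstate(all="ignore"):
        return [left + right for left, right in zip(left_cols, right_cols)]


rng = np.random.default_rng(0)
cols = [rng.standard_normal(1000) for _ in range(1000)]

inner = add_columns_inner(cols, cols)
outer = add_columns_outer(cols, cols)
```

Both versions compute the same values; the only difference is how often the context manager is entered and exited, which is the overhead this PR targets.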

jbrockmendel · 2021-03-12T17:33:18Z

pandas/core/ops/__init__.py

@@ -433,7 +433,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):

        if isinstance(other, ABCDataFrame):
            # Another DataFrame
-            new_data = self._combine_frame(other, na_op, fill_value)
+            with np.errstate(all="ignore"):
+                new_data = self._combine_frame(other, na_op, fill_value)


shouldn't we get this one for free, because it goes through _dispatch_frame_op, which you've got setting np.errstate?

Ah, yes, indeed _combine_frame eventually always goes through _dispatch_frame_op as well (I got here because it's a place that uses get_array_op)

jbrockmendel · 2021-03-12T17:34:02Z

pandas/core/ops/array_ops.py

@@ -265,8 +268,7 @@ def comparison_op(left: ArrayLike, right: Any, op) -> ArrayLike:
        with warnings.catch_warnings():
            # suppress warnings from numpy about element-wise comparison
            warnings.simplefilter("ignore", DeprecationWarning)
-            with np.errstate(all="ignore"):


should the catch_warnings here be done at a higher level too?

Possibly, but that might need to be more targeted to the specific type of op (since we don't do anything similar in arithmetic_op), and thus will be more complicated. So I would like to start with np.errstate.
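For context, the two mechanisms are independent, which is part of why they can be moved separately. A minimal sketch (illustrative only, not taken from the pandas code base): np.errstate controls whether numpy emits floating-point warnings at all, while the warnings filters decide what happens to any Python warning once it is emitted.

```python
import warnings

import numpy as np

x = np.array([1.0])
zero = np.array([0.0])

# With floating-point errors set to "warn", division by zero emits a RuntimeWarning.
with warnings.catch_warnings(record=True) as unsuppressed:
    warnings.simplefilter("always")
    with np.errstate(all="warn"):
        x / zero

# With np.errstate(all="ignore"), numpy never emits the warning in the first place.
with warnings.catch_warnings(record=True) as suppressed:
    warnings.simplefilter("always")
    with np.errstate(all="ignore"):
        result = x / zero

# np.errstate does not affect unrelated Python warnings such as DeprecationWarning.
with warnings.catch_warnings(record=True) as other:
    warnings.simplefilter("always")
    with np.errstate(all="ignore"):
        warnings.warn("illustrative deprecation", DeprecationWarning)
```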

jbrockmendel · 2021-03-12T17:34:58Z

This is unfortunate, since it is both a) uglier to do this at a higher level and b) a really nice perf improvement

jorisvandenbossche · 2021-03-12T18:02:37Z

Is it that much uglier? I removed it in 5 places, and I added it in 4 places in _dispatch_frame_op (which in theory could be combined into a single one) and 2 places in Series ops. So it's not that big of a difference.

jorisvandenbossche · 2021-03-12T18:08:38Z

pandas/core/arrays/numpy_.py

@@ -376,7 +376,8 @@ def _cmp_method(self, other, op):
            other = other._ndarray

        pd_op = ops.get_array_op(op)
-        result = pd_op(self._ndarray, other)
+        with np.errstate(all="ignore"):
+            result = pd_op(self._ndarray, other)


BTW, we probably don't need to add this one here, since when going through a DataFrame/Series op, this will already be caught at that level (and we don't consistently catch it in other EA ops). But this keeps the current behaviour (and this is not used in practice anyway).
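To make the "already caught at that level" point concrete, here is a small sketch (low_level_op is a hypothetical stand-in for an array op that has no np.errstate of its own). The error settings are active for everything called inside the with block, and are restored on exit:

```python
import numpy as np


def low_level_op(a, b):
    # Hypothetical array op without its own np.errstate: it simply runs
    # under whatever floating-point error settings the caller established.
    return np.geterr(), a / b


before = np.geterr()
with np.errstate(all="ignore"):
    settings_inside, result = low_level_op(np.array([1.0]), np.array([0.0]))
after = np.geterr()
```

Inside the call, every entry of the np.geterr() dict is "ignore", and after leaving the block the settings are back to what they were before.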

jbrockmendel · 2021-03-15T23:39:39Z

pandas/core/frame.py

@@ -6556,7 +6556,8 @@ def _dispatch_frame_op(self, right, func, axis: Optional[int] = None):
        right = lib.item_from_zerodim(right)
        if not is_list_like(right):
            # i.e. scalar, faster than checking np.ndim(right) == 0
-            bm = self._mgr.apply(array_op, right=right)
+            with np.errstate(all="ignore"):


could just do the np.errstate once at the top of the method?

Yes, that is possible (as mentioned in #40396 (comment)). Personally, I think I prefer putting it just around the mgr operate call / array_op calls, but in the end it doesn't matter much (it's not as if the other checks here could give such a warning that would then be incorrectly suppressed). So either way.

I'm fine with this, we can clean it up later if need be

jbrockmendel · 2021-03-15T23:40:10Z

pandas/core/series.py

@@ -5087,7 +5088,8 @@ def _arith_method(self, other, op):

        lvalues = self._values
        rvalues = extract_array(other, extract_numpy=True)
-        result = ops.arithmetic_op(lvalues, rvalues, op)
+        with np.errstate(all="ignore"):
+            result = ops.arithmetic_op(lvalues, rvalues, op)


do we not need this for _logical_method?

Currently we don't suppress any warnings in ops.logical_op, so I left the current behaviour as is.

makes sense

jreback

looks reasonable

jbrockmendel · 2021-03-17T18:14:06Z

LGTM, merging this afternoon unless someone chimes in to stop me

jreback · 2021-03-17T21:10:39Z

thanks @jorisvandenbossche

REF/PERF: move np.errstate out of core array_ops up to higher level

48482ba

jorisvandenbossche added the Performance and Numeric Operations labels Mar 12, 2021

jorisvandenbossche requested a review from jbrockmendel March 12, 2021 10:40

jbrockmendel reviewed Mar 12, 2021

remove in _combine_frame

4f4af54

jorisvandenbossche commented Mar 12, 2021

jorisvandenbossche mentioned this pull request Mar 15, 2021

Refactor - ArrayManager overview issue #39146

Closed

11 tasks

jbrockmendel reviewed Mar 15, 2021

jreback approved these changes Mar 16, 2021

Merge remote-tracking branch 'upstream/master' into ops-refactor-seterr

3996d09

jreback added this to the 1.3 milestone Mar 17, 2021

jreback merged commit 8588651 into pandas-dev:master Mar 17, 2021

jorisvandenbossche deleted the ops-refactor-seterr branch March 18, 2021 14:34

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

REF/PERF: move np.errstate out of core array_ops up to higher level (p…

7a6598e

…andas-dev#40396)
