DEPR: allow options for using bottleneck/numexpr #16157

Merged: 6 commits, Apr 27, 2017
Changes from 4 commits
11 changes: 10 additions & 1 deletion doc/source/basics.rst
@@ -93,7 +93,7 @@ Accelerated operations
----------------------

pandas has support for accelerating certain types of binary numerical and boolean operations using
the ``numexpr`` library (starting in 0.11.0) and the ``bottleneck`` libraries.
the ``numexpr`` library and the ``bottleneck`` libraries.

These libraries are especially useful when dealing with large data sets, and provide large
speedups. ``numexpr`` uses smart chunking, caching, and multiple cores. ``bottleneck`` is
@@ -114,6 +114,15 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``):
You are highly encouraged to install both libraries. See the section
:ref:`Recommended Dependencies <install.recommended_dependencies>` for more installation info.

Both libraries are enabled by default. You can control this by setting the options:

.. versionadded:: 0.20.0

.. code-block:: python

   pd.set_option('compute.use_bottleneck', False)
   pd.set_option('compute.use_numexpr', False)
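
As a hedged illustration that is not part of this PR, the same options can also be switched off temporarily with ``pd.option_context``. This sketch assumes an installed pandas version that registers ``compute.use_numexpr``; the variable names are made up for the example.

```python
import pandas as pd

# Remember the current setting so we can confirm it is restored afterwards.
before = pd.get_option("compute.use_numexpr")

# Inside the block the option is forced off; on exit the old value returns.
with pd.option_context("compute.use_numexpr", False):
    inside = pd.get_option("compute.use_numexpr")

after = pd.get_option("compute.use_numexpr")
print(before, inside, after)
```

A context manager keeps the change local, which can be convenient when comparing results with and without the accelerated code path.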

.. _basics.binop:

Flexible binary operations
6 changes: 5 additions & 1 deletion doc/source/options.rst
@@ -425,6 +425,10 @@ mode.use_inf_as_null False True means treat None, NaN, -IN
                                                 INF as null (old way), False means
                                                 None and NaN are null, but INF, -INF
                                                 are not null (new way).
compute.use_bottleneck              True         Use the bottleneck library to accelerate
                                                 computation if it is installed
compute.use_numexpr                 True         Use the numexpr library to accelerate
                                                 computation if it is installed
=================================== ============ ==================================


@@ -538,4 +542,4 @@ Only ``'display.max_rows'`` are serialized and published.
.. ipython:: python
   :suppress:

   pd.reset_option('display.html.table_schema')
   pd.reset_option('display.html.table_schema')
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.20.0.txt
@@ -521,6 +521,7 @@ Other Enhancements
- The ``display.show_dimensions`` option can now also be used to specify
  whether the length of a ``Series`` should be shown in its repr (:issue:`7117`).
- ``parallel_coordinates()`` has gained a ``sort_labels`` keyword arg that sorts class labels and the colours assigned to them (:issue:`15908`)
- Options added to allow one to turn on/off using ``bottleneck`` and ``numexpr``, see :ref:`here <basics.accelerate>` (:issue:`16157`)


.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
@@ -1217,7 +1218,7 @@ If indicated, a deprecation warning will be issued if you reference these modules

"pandas.lib", "pandas._libs.lib", "X"
"pandas.tslib", "pandas._libs.tslib", "X"
"pandas.computation", "pandas.core.computation", ""
"pandas.computation", "pandas.core.computation", "X"
"pandas.msgpack", "pandas.io.msgpack", ""
"pandas.index", "pandas._libs.index", ""
"pandas.algos", "pandas._libs.algos", ""
Empty file added pandas/computation/__init__.py
Empty file.
11 changes: 11 additions & 0 deletions pandas/computation/expressions.py
@@ -0,0 +1,11 @@
import warnings


def set_use_numexpr(v=True):
    warnings.warn("pandas.computation.expressions.set_use_numexpr is "
                  "deprecated and will be removed in a future version.\n"
                  "you can toggle usage of numexpr via "
                  "pandas.set_option('compute.use_numexpr', <bool>)",
                  FutureWarning, stacklevel=2)
    from pandas import set_option
    set_option('compute.use_numexpr', v)
6 changes: 5 additions & 1 deletion pandas/core/computation/expressions.py
@@ -10,6 +10,7 @@
import numpy as np
from pandas.core.common import _values_from_object
from pandas.core.computation import _NUMEXPR_INSTALLED
from pandas.core.config import get_option

if _NUMEXPR_INSTALLED:
    import numexpr as ne
@@ -35,6 +36,9 @@ def set_use_numexpr(v=True):
    global _USE_NUMEXPR
    if _NUMEXPR_INSTALLED:
        _USE_NUMEXPR = v
    elif v:
        warnings.warn("trying to set the option to use numexpr, "
                      "but it is not installed!")
Review comment (Member): I know I suggested this, but I think this one will be triggered by default if you don't have numexpr installed (just by importing pandas, I mean, not only when setting the option manually when you don't have it installed)?

Reply (Contributor Author): Yep, just noticed that. Going to take this out.


    # choose what we are going to do
    global _evaluate, _where
@@ -156,7 +160,7 @@ def _where_numexpr(cond, a, b, raise_on_error=False):


# turn myself on
set_use_numexpr(True)
set_use_numexpr(get_option('compute.use_numexpr'))


def _has_bool_dtype(x):
35 changes: 34 additions & 1 deletion pandas/core/config_init.py
@@ -15,8 +15,41 @@
from pandas.core.config import (is_int, is_bool, is_text, is_instance_factory,
is_one_of_factory, get_default_val,
is_callable)
from pandas.io.formats.format import detect_console_encoding
from pandas.io.formats.console import detect_console_encoding

# compute

use_bottleneck_doc = """
: bool
    Use the bottleneck library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
"""


def use_bottleneck_cb(key):
    from pandas.core import nanops
    nanops.set_use_bottleneck(cf.get_option(key))


use_numexpr_doc = """
: bool
    Use the numexpr library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
"""


def use_numexpr_cb(key):
    from pandas.core.computation import expressions
    expressions.set_use_numexpr(cf.get_option(key))


with cf.config_prefix('compute'):
    cf.register_option('use_bottleneck', True, use_bottleneck_doc,
                       validator=is_bool, cb=use_bottleneck_cb)
    cf.register_option('use_numexpr', True, use_numexpr_doc,
                       validator=is_bool, cb=use_numexpr_cb)
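
To illustrate how a registered option with a validator and a callback behaves, here is a simplified standalone sketch. It is not the `pandas.core.config` implementation: the registry, the function bodies, and the `compute.use_fast` key are all hypothetical and only mirror the shape of the calls above.

```python
_registry = {}  # hypothetical option store: key -> option record


def is_bool(value):
    """Validator: reject anything that is not a bool."""
    if not isinstance(value, bool):
        raise ValueError("Value must be an instance of bool")


def register_option(key, default, doc="", validator=None, cb=None):
    """Register an option, validating its default value."""
    if validator is not None:
        validator(default)
    _registry[key] = {"value": default, "doc": doc,
                      "validator": validator, "cb": cb}


def get_option(key):
    return _registry[key]["value"]


def set_option(key, value):
    """Validate, store, then notify the callback with the option key."""
    opt = _registry[key]
    if opt["validator"] is not None:
        opt["validator"](value)
    opt["value"] = value
    if opt["cb"] is not None:
        opt["cb"](key)


calls = []


def use_fast_cb(key):
    # The callback reads the new value back, like the callbacks above.
    calls.append((key, get_option(key)))


register_option("compute.use_fast", True, "Use the fast path",
                validator=is_bool, cb=use_fast_cb)
set_option("compute.use_fast", False)
print(calls)
```

Passing only the key to the callback keeps the option store as the single source of truth, since the callback has to look the value up rather than receive a copy.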
#
# options from the "display" namespace

5 changes: 3 additions & 2 deletions pandas/core/frame.py
@@ -91,6 +91,7 @@
import pandas.core.nanops as nanops
import pandas.core.ops as ops
import pandas.io.formats.format as fmt
import pandas.io.formats.console as console
from pandas.io.formats.printing import pprint_thing
import pandas.plotting._core as gfx

@@ -513,7 +514,7 @@ def _repr_fits_horizontal_(self, ignore_width=False):
GH3541, GH3573
"""

width, height = fmt.get_console_size()
width, height = console.get_console_size()
max_columns = get_option("display.max_columns")
nb_columns = len(self.columns)

@@ -577,7 +578,7 @@ def __unicode__(self):
        max_cols = get_option("display.max_columns")
        show_dimensions = get_option("display.show_dimensions")
        if get_option("display.expand_frame_repr"):
            width, _ = fmt.get_console_size()
            width, _ = console.get_console_size()
        else:
            width = None
        self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
3 changes: 2 additions & 1 deletion pandas/core/indexes/base.py
@@ -837,7 +837,8 @@ def _format_data(self):
        """
        Return the formatted data as a unicode string
        """
        from pandas.io.formats.format import get_console_size, _get_adjustment
        from pandas.io.formats.console import get_console_size
        from pandas.io.formats.format import _get_adjustment
        display_width, _ = get_console_size()
        if display_width is None:
            display_width = get_option('display.width') or 80
32 changes: 24 additions & 8 deletions pandas/core/nanops.py
@@ -1,14 +1,9 @@
import itertools
import functools
import numpy as np
import operator
import warnings

try:
    import bottleneck as bn
    _USE_BOTTLENECK = True
except ImportError:  # pragma: no cover
    _USE_BOTTLENECK = False

import numpy as np
from pandas import compat
from pandas._libs import tslib, algos, lib
from pandas.core.dtypes.common import (
@@ -23,9 +18,30 @@
is_int_or_datetime_dtype, is_any_int_dtype)
from pandas.core.dtypes.cast import _int64_max, maybe_upcast_putmask
from pandas.core.dtypes.missing import isnull, notnull

from pandas.core.config import get_option
from pandas.core.common import _values_from_object

try:
    import bottleneck as bn
    _BOTTLENECK_INSTALLED = True
except ImportError:  # pragma: no cover
    _BOTTLENECK_INSTALLED = False

_USE_BOTTLENECK = False


def set_use_bottleneck(v=True):
    # set/unset to use bottleneck
    global _USE_BOTTLENECK
    if _BOTTLENECK_INSTALLED:
        _USE_BOTTLENECK = v
Review comment (Member): Should we warn here if v is False and bottleneck is not installed?

    elif v:
        warnings.warn("trying to set the option to use bottleneck, "
                      "but it is not installed!")


set_use_bottleneck(get_option('compute.use_bottleneck'))


class disallow(object):

84 changes: 84 additions & 0 deletions pandas/io/formats/console.py
@@ -0,0 +1,84 @@
"""
Internal module for console introspection
"""

import sys
import locale
from pandas.util.terminal import get_terminal_size

# -----------------------------------------------------------------------------
# Global formatting options
_initial_defencoding = None


def detect_console_encoding():
    """
    Try to find the most capable encoding supported by the console.
    Slightly modified from the way IPython handles the same issue.
    """
    global _initial_defencoding

    encoding = None
    try:
        encoding = sys.stdout.encoding or sys.stdin.encoding
    except AttributeError:
        pass

    # try again for something better
    if not encoding or 'ascii' in encoding.lower():
        try:
            encoding = locale.getpreferredencoding()
        except Exception:
            pass

    # when all else fails. this will usually be "ascii"
    if not encoding or 'ascii' in encoding.lower():
        encoding = sys.getdefaultencoding()

    # GH3360, save the reported defencoding at import time
    # MPL backends may change it. Make available for debugging.
    if not _initial_defencoding:
        _initial_defencoding = sys.getdefaultencoding()

    return encoding


def get_console_size():
    """Return console size as tuple = (width, height).

    Returns (None,None) in non-interactive session.
    """
    from pandas import get_option
    from pandas.core import common as com

    display_width = get_option('display.width')
    # deprecated.
    display_height = get_option('display.height', silent=True)

    # Consider
    # interactive shell terminal, can detect term size
    # interactive non-shell terminal (ipnb/ipqtconsole), cannot detect term size
    # non-interactive script, should disregard term size

    # in addition
    # width,height have default values, but setting to 'None' signals
    # should use Auto-Detection, But only in interactive shell-terminal.
    # Simple. yeah.

    if com.in_interactive_session():
        if com.in_ipython_frontend():
            # sane defaults for interactive non-shell terminal
            # match default for width,height in config_init
            from pandas.core.config import get_default_val
            terminal_width = get_default_val('display.width')
            terminal_height = get_default_val('display.height')
        else:
            # pure terminal
            terminal_width, terminal_height = get_terminal_size()
    else:
        terminal_width, terminal_height = None, None

    # Note if the User sets width/Height to None (auto-detection)
    # and we're in a script (non-inter), this will return (None,None)
    # caller needs to deal.
    return (display_width or terminal_width, display_height or terminal_height)
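
The final `return` encodes a simple precedence rule: an explicitly configured value wins, otherwise the detected terminal size is used, otherwise `None`. A standalone sketch of that rule follows. It uses the standard library's `shutil.get_terminal_size` in place of `pandas.util.terminal`, and the function name and parameters are hypothetical.

```python
import shutil


def resolve_console_size(display_width, display_height, interactive):
    """Return (width, height), preferring explicit settings over detection."""
    if interactive:
        # Detect the terminal size; shutil falls back to (80, 24) if it cannot.
        size = shutil.get_terminal_size()
        terminal_width, terminal_height = size.columns, size.lines
    else:
        # Scripts should disregard the terminal size entirely.
        terminal_width, terminal_height = None, None

    # `or` treats None (and 0) as "unset" and falls through to detection.
    return (display_width or terminal_width,
            display_height or terminal_height)


print(resolve_console_size(80, None, interactive=False))
```

In a non-interactive session with both settings unset, the result is `(None, None)`, which is why callers such as `_format_data` supply their own fallback width.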