consolidated the duplicate definitions of NA values (in parsers & IO) #16589

Merged: 14 commits, Jun 13, 2017
15 changes: 8 additions & 7 deletions doc/source/io.rst
@@ -225,9 +225,9 @@ NA and Missing Data Handling

na_values : scalar, str, list-like, or dict, default ``None``
Additional strings to recognize as NA/NaN. If dict passed, specific per-column
NA values. By default the following values are interpreted as NaN:
-``'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA',
-'#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''``.
+NA values. See :ref:`na values const <io.navaluesconst>` below
+for a list of the values interpreted as NaN by default.

keep_default_na : boolean, default ``True``
If na_values are specified and keep_default_na is ``False`` the default NaN
values are overridden, otherwise they're appended to.
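
The interaction between ``na_values`` and ``keep_default_na`` described above can be sketched in plain Python. This is a minimal illustration, not the pandas implementation: the helper name `effective_na_values` is hypothetical, and it only handles the list-like form of ``na_values`` (not the scalar or dict forms). The default set is copied from the list in this PR's documentation change.

```python
# Default NA markers, copied from the documentation text in this PR.
DEFAULT_NA_VALUES = frozenset([
    '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
    'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan',
    '-nan', '',
])


def effective_na_values(na_values=None, keep_default_na=True):
    """Hypothetical helper: combine user-supplied markers with the defaults.

    With keep_default_na=True the user markers are appended to the defaults;
    with keep_default_na=False they replace the defaults entirely.
    """
    user = set(na_values or [])
    if keep_default_na:
        return user | DEFAULT_NA_VALUES
    return user


appended = effective_na_values(['missing'])
overridden = effective_na_values(['missing'], keep_default_na=False)

print('NA' in appended, 'missing' in appended)      # True True
print('NA' in overridden, 'missing' in overridden)  # False True
```

In the overridden case, a cell containing `NA` would be kept as an ordinary string because only `missing` is treated as a marker.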
@@ -1030,10 +1030,11 @@ the corresponding equivalent values will also imply a missing value (in this cas
``[5.0,5]`` are recognized as ``NaN``.

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
-The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
Contributor

Can you build the docs and show a rendering of this page? I think this might generate a build warning (and may not render correctly).

@jorisvandenbossche

Contributor Author

I built the docs locally before committing. There were warnings; some of them might be related to Python 3 (I have 2.7.13 on Linux), and the file doc/source/style.ipynb was also a problem. The generated HTML looks OK, but I cannot attach an HTML file to this reply.

Contributor Author

The only difference I can see is a blank line caused by the new label, which might not be a bad thing at all. Alternatively, I can remove the label and reuse the existing one for the heading NA Values, which is several lines above.

-'#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']``. Although a 0-length string
-``''`` is not included in the default ``NaN`` values list, it is still treated
-as a missing value.

+.. _io.navaluesconst:
+
+The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A',
+'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']``.

.. code-block:: python

15 changes: 8 additions & 7 deletions pandas/_libs/parsers.pyx
@@ -50,6 +50,8 @@ from pandas.core.algorithms import take_1d
from pandas.core.dtypes.concat import union_categoricals
from pandas import Index

+import pandas.io.common as com

import time
import os

@@ -273,13 +275,6 @@ cdef extern from "parser/io.h":

DEFAULT_CHUNKSIZE = 256 * 1024

-# common NA values
-# no longer excluding inf representations
-# '1.#INF','-1.#INF', '1.#INF000000',
-_NA_VALUES = [b'-1.#IND', b'1.#QNAN', b'1.#IND', b'-1.#QNAN',
-              b'#N/A N/A', b'n/a', b'NA', b'#NA', b'NULL', b'null', b'NaN',
-              b'nan', b'']


cdef class TextReader:
"""
@@ -1380,6 +1375,12 @@ cdef asbytes(object o):
return str(o)


+# common NA values
+# no longer excluding inf representations
+# '1.#INF','-1.#INF', '1.#INF000000',
+_NA_VALUES = _ensure_encoded(list(com._NA_VALUES))


def _is_file_like(obj):
    if PY3:
        import io
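
The new assignment in this hunk derives the parser's byte-string list from the shared text set instead of repeating the literals. A rough sketch of that idea follows. The function `ensure_encoded` here is a hypothetical stand-in written for illustration; it is not the actual `_ensure_encoded` from `parsers.pyx`, whose implementation is not shown in this diff, and the UTF-8 encoding and the abbreviated value set are assumptions.

```python
def ensure_encoded(values, encoding='utf-8'):
    """Hypothetical stand-in: return each text item of `values` as bytes.

    Items that are already bytes are passed through unchanged.
    """
    return [v.encode(encoding) if isinstance(v, str) else v for v in values]


# Abbreviated shared definition (the real set in this PR has more entries).
NA_VALUES = {'NA', 'NaN', 'n/a', ''}

# Sorting is only for a stable printout; a set has no inherent order.
encoded = ensure_encoded(sorted(NA_VALUES))
print(encoded)  # [b'', b'NA', b'NaN', b'n/a']
```

Deriving the byte list this way means a marker added to the shared set is picked up by the parser without a second edit.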
4 changes: 2 additions & 2 deletions pandas/tests/io/parser/na_values.py
@@ -8,7 +8,7 @@
import numpy as np
from numpy import nan

-import pandas.io.parsers as parsers
+import pandas.io.common as com
import pandas.util.testing as tm

from pandas import DataFrame, Index, MultiIndex
@@ -72,7 +72,7 @@ def test_default_na_values(self):
         _NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
                           '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null',
                           'NaN', 'nan', '-NaN', '-nan', '#N/A N/A', ''])
-        assert _NA_VALUES == parsers._NA_VALUES
+        assert _NA_VALUES == com._NA_VALUES
         nv = len(_NA_VALUES)

         def f(i, v):
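
As a side note on why a single definition helps: comparing the expected set pinned in this test with the literal byte list shown in the `parsers.pyx` hunk suggests the two copies had drifted apart. The snippet below is only a sketch using values copied from this diff; the variable names are illustrative.

```python
# Expected set, copied from the test in this diff.
expected = {'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
            '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null',
            'NaN', 'nan', '-NaN', '-nan', '#N/A N/A', ''}

# Literal byte list, copied from the parsers.pyx hunk in this diff.
old_parser_list = [b'-1.#IND', b'1.#QNAN', b'1.#IND', b'-1.#QNAN',
                   b'#N/A N/A', b'n/a', b'NA', b'#NA', b'NULL', b'null',
                   b'NaN', b'nan', b'']

# Decode to text so both collections can be compared as sets.
old_decoded = {v.decode('ascii') for v in old_parser_list}

missing = sorted(expected - old_decoded)
print(missing)  # ['#N/A', '-NaN', '-nan', 'N/A']
```

Comparing as sets also explains why the test wraps its literal list in `set(...)`: the order in which markers are listed does not affect the assertion.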