Inconsistent handling of nan-float64 in Series.isin() #22205
The problem at the core of this issue is that pandas'
For N>10^6, pandas' notion of unique is switched to numpy's notion of unique:
I don't understand the `in1d` algorithm well enough to tell what is going on, but here are my benchmark and timings:
results in
To me it seems that for large inputs of look-up values, the hash-map approach could be the better one. There is, in fact, an optimization for the case where the second array is very small:
which leads to
The running times of the algorithms are
pandas:
So it looks like
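The trade-off discussed above, between scanning the values once per look-up value and building a hash table first, can be sketched in pure Python. This is a simplified model, not the code of pandas or numpy: the function names and the `small_lookup_threshold` cut-off are hypothetical and chosen only for illustration.

```python
def isin_by_scan(values, lookup):
    """Compare every value against each look-up value with ==.

    Cost grows with len(values) * len(lookup), so this only pays off
    when the look-up collection is very small.
    """
    mask = [False] * len(values)
    for x in lookup:
        for i, v in enumerate(values):
            if v == x:
                mask[i] = True
    return mask


def isin_by_hash(values, lookup):
    """Build a hash set once, then probe it for every value.

    Cost grows with len(values) + len(lookup).
    """
    table = set(lookup)
    return [v in table for v in values]


def isin(values, lookup, small_lookup_threshold=10):
    # Hypothetical cut-off: scan for tiny look-ups, hash otherwise.
    if len(lookup) < small_lookup_threshold:
        return isin_by_scan(values, lookup)
    return isin_by_hash(values, lookup)


values = [1, 2, 3, 4, 5]
print(isin(values, [2, 4]))            # scan path
print(isin(values, list(range(3, 50))))  # hash path
```

For ordinary values both paths give the same mask. For NaN they can differ, because `==` never matches NaN while a hash-based probe may, so switching strategies by input size can change results unless NaN is handled explicitly.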
Code Sample, a copy-pastable example if possible
results in `True`, however
results in `False`; even more, `s.isin([np.nan]).any()` results in `False`.
Problem description
Obviously, the result should be the same in both cases.
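The inconsistency comes down to two different notions of equality for NaN. A minimal pure-Python sketch of both follows; it is a model for illustration only, not the pandas or numpy implementation, and the function names `isin_elementwise` and `isin_nan_aware` are hypothetical.

```python
import math

nan = float("nan")


def _is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def isin_elementwise(values, lookup):
    """Membership via ==. Under IEEE 754, NaN is not equal to itself."""
    return [any(v == x for x in lookup) for v in values]


def isin_nan_aware(values, lookup):
    """Membership where all NaNs count as one and the same value."""
    lookup_has_nan = any(_is_nan(x) for x in lookup)
    others = {x for x in lookup if not _is_nan(x)}
    return [lookup_has_nan if _is_nan(v) else v in others for v in values]


print(isin_elementwise([1.0, nan], [nan]))  # [False, False]
print(isin_nan_aware([1.0, nan], [nan]))    # [False, True]
```

Whichever notion is chosen, applying the same one regardless of input size would make the result independent of the length of the Series.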
Expected Output
IMO `True` is more consistent with the behavior of pandas in other functions (such as `pd.unique()`).
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.28.3
numpy: 1.13.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None