
BUG: SparseArray doesn't recalculate indices after comparison with scalar #44956

Closed
3 tasks done
bdrum opened this issue Dec 18, 2021 · 4 comments · Fixed by #45125
Labels: Bug, Sparse
Milestone: 1.4

Comments


bdrum commented Dec 18, 2021

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, 0, 0, 0], fill_value=0)
s
#[1, 2, 3, 4, 0, 0, 0]
#Fill: 0
#IntIndex
#Indices: array([0, 1, 2, 3])
s > 2
#[False, False, True, True, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0, 1, 2, 3])

s = pd.arrays.SparseArray([np.nan,2,3,4,0,0,0],fill_value=0)
pd.isna(s)
#[True, False, False, False, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0, 1, 2, 3])

Issue Description

While working on another issue, I noticed that SparseArray doesn't recalculate sp_index when it needs to, e.g. in the examples above. The same happens in a specific case where fill_value is not NA but the array contains NA values.

If you don't mind, I would like to take this issue.
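To make "recalculate sp_index" concrete: a compact index should list exactly the positions whose dense value differs from fill_value (or, with an NA fill value, the positions holding non-NA values). The sketch below checks sp_index.indices against that expectation. `expected_indices` and `has_compact_index` are hypothetical helpers written for this comment, not pandas API; they only rely on `np.asarray`, `fill_value` and `sp_index.indices`.

```python
import numpy as np
import pandas as pd


def expected_indices(sparse):
    """Hypothetical helper: positions a compact sp_index should store."""
    dense = np.asarray(sparse)
    fill = sparse.fill_value
    if pd.isna(fill):
        # With an NA fill value, every non-NA entry must be stored explicitly.
        mask = ~pd.isna(dense)
    else:
        # Otherwise store every entry that differs from the fill value;
        # NaN != fill is True, so NA entries are stored as well.
        mask = dense != fill
    return np.flatnonzero(mask)


def has_compact_index(sparse):
    """Hypothetical helper: True if sp_index stores no fill-valued entries."""
    return np.array_equal(sparse.sp_index.indices, expected_indices(sparse))
```

With the output reported above, `has_compact_index(s > 2)` would be False: the stored indices are [0, 1, 2, 3], while only positions 2 and 3 differ from the fill value False.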

Expected Behavior

I think the correct behavior is for the resulting SparseArray to have its indices recalculated automatically, e.g.:

import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, 0, 0, 0], fill_value=0)
s > 2
# should be 
#[False, False, True, True, False, False, False]
#Fill: False
#IntIndex
#Indices: array([2, 3])
# and for second case 
s = pd.arrays.SparseArray([np.nan,2,3,4,0,0,0],fill_value=0)
pd.isna(s)
#[True, False, False, False, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0])
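Until this is fixed, one possible workaround (my own assumption, not something the pandas docs prescribe) is to rebuild the result from its dense values, so the SparseArray constructor computes a fresh sp_index. `recompact` is a hypothetical helper name. Note that it materializes the full dense array, so it gives up the memory savings of sparse storage during the call.

```python
import numpy as np
import pandas as pd


def recompact(sparse):
    """Hypothetical helper: rebuild a SparseArray so that its sp_index only
    stores entries that differ from its fill_value."""
    # Densify, then let the SparseArray constructor re-sparsify the data
    # with the same fill value.
    dense = np.asarray(sparse)
    return pd.arrays.SparseArray(dense, fill_value=sparse.fill_value)


s = pd.arrays.SparseArray([1, 2, 3, 4, 0, 0, 0], fill_value=0)
fixed = recompact(s > 2)
print(fixed.sp_index.indices)
```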

Installed Versions

INSTALLED VERSIONS
------------------
commit : 47eb219
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Russian_Russia.1252

pandas : 1.4.0.dev0+1415.g47eb219889
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.3
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.10.1
fastparquet : None
gcsfs : 2021.10.1
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.10.1
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

bdrum added the Bug and Needs Triage labels on Dec 18, 2021
bdrum added a commit to bdrum/pandas that referenced this issue Dec 21, 2021
BUG: unary operators for SparseArray doesn't recalc indexes(pandas-dev#44956)

bdrum commented Dec 23, 2021

SparseArray.isna has been fixed by #44955, along with the unary operators.

s = pd.arrays.SparseArray([np.nan,2,3,4,0,0,0],fill_value=0)
s
#[nan, 2.0, 3.0, 4.0, 0, 0, 0]
#Fill: 0
#IntIndex
#Indices: array([0, 1, 2, 3])

s.isna()
#[True, False, False, False, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0])

mroeschke commented

Closing, as this appears to be fixed.


bdrum commented Dec 28, 2021

@mroeschke Sorry for the confusion.
I fixed only one part of the issue, the part concerning isna (and the unary operators); the binary operators still behave incorrectly.
I will try to fix them soon.

mroeschke added the Sparse label and removed the Needs Triage label on Dec 28, 2021
mroeschke reopened this on Dec 28, 2021
bdrum changed the title from "BUG: SparseArray doesn't recalculate indices in some cases" to "BUG: SparseArray doesn't recalculate indices after comparing with scalar" on Jan 1, 2022

bdrum commented Jan 1, 2022

PR #45125 will fix only the bug that occurs after comparing a SparseArray with a scalar (see also #45110), but the problem is deeper (see #45126).
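For the scalar case, one way to get a correct index is to evaluate the comparison once on fill_value (which covers every gap position), once on sp_values, and then re-sparsify the combined result. The sketch below does this with the standard `operator` module; `compare_with_scalar` is a hypothetical name, and this only illustrates the idea, not necessarily how #45125 implements it.

```python
import operator

import numpy as np
import pandas as pd


def compare_with_scalar(sparse, other, op=operator.gt):
    """Hypothetical helper: compare a SparseArray with a scalar and return a
    result whose sparse index is rebuilt from scratch."""
    # Every gap position compares as op(fill_value, other).
    new_fill = bool(op(sparse.fill_value, other))
    # Start from a dense boolean array filled with that value ...
    dense = np.full(len(sparse), new_fill, dtype=bool)
    # ... and overwrite only the explicitly stored positions.
    dense[sparse.sp_index.indices] = op(sparse.sp_values, other)
    # Re-sparsifying drops stored entries that now equal the new fill value.
    return pd.arrays.SparseArray(dense, fill_value=new_fill)


s = pd.arrays.SparseArray([1, 2, 3, 4, 0, 0, 0], fill_value=0)
result = compare_with_scalar(s, 2)
```

The intermediate dense boolean array costs O(len(sparse)) memory; that is the trade-off of this simple approach compared with working on sp_values alone.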

bdrum changed the title from "BUG: SparseArray doesn't recalculate indices after comparing with scalar" to "BUG: SparseArray doesn't recalculate indices after comparison with scalar" on Jan 1, 2022
jreback added this to the 1.4 milestone on Jan 8, 2022