Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Setting incompatible values into ea column raises instead of casting to object #47577

Closed
3 tasks done
phofl opened this issue Jul 1, 2022 · 5 comments
Closed
3 tasks done
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves NA - MaskedArrays Related to pd.NA and nullable extension arrays Needs Discussion Requires discussion from core team before further action

Comments

@phofl
Copy link
Member

phofl commented Jul 1, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

ser = pd.Series([1, 2], dtype="Int64")
ser[1] = "a"

This raises, while expanding casts to object

ser = pd.Series([1, 2], dtype="Int64")
ser[2] = "a"

Issue Description

Numpy dtypes are casting to object while ea dtypes raise. Was this a deliberate descision?

cc @jorisvandenbossche

Expected Behavior

I would probably expect this to work.

Installed Versions

INSTALLED VERSIONS

commit : 55d9dcf
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-40-generic
Version : #43-Ubuntu SMP Wed Jun 15 12:54:21 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0.dev0+1066.g55d9dcf47d
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.3.2
pip : 22.1.2
Cython : 0.29.30
pytest : 7.1.2
hypothesis : 6.47.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.4
brotli :
fastparquet : 0.8.1
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.5.2
numba : 0.53.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : 1.1.7
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.8.1
snappy :
sqlalchemy : 1.4.37
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None
None

@phofl phofl added Bug Needs Triage Issue that has not been reviewed by a pandas team member Indexing Related to indexing on series/frames, not to indexes themselves NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 1, 2022
@simonjayhawkins
Copy link
Member

Numpy dtypes are casting to object while ea dtypes raise. Was this a deliberate descision?

#25383 somewhat related. categorical is also inconsistent imo. see #34011 (comment).

I think assigning a value to the Series should not raise unlike assigning a value to the underlying array which would be expected to raise. So it's a bit more complex than dispatching to the underlying array (unless using at).

@simonjayhawkins simonjayhawkins added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 9, 2022
@jorisvandenbossche
Copy link
Member

Numpy dtypes are casting to object while ea dtypes raise. Was this a deliberate descision?

I would say that this was "somewhat deliberate"

For the specific example of Int64, it was a deliberate decision for the MaskedArray.__setitem__ to raise for this (so basically what Simon said: for the array it is expected to raise, but if we want to have it not raise on the Series level, then this should be handled at that level).

But that coupled with the fact that historically, we have had an ExtensionBlock.setitem (initially only used internally for Categorical dtype and for external EAs) that doesn't cast to object if the setitem doesn't fail, in contrast to the main Block.setitem implementation that does. There is an older issue about exactly this: #24020

So indeed this is related to the fact that for Categorical, we also do raise for incompatible values in setitem:

In [2]: s = pd.Series(["a", "b", "c"], dtype="category")

In [3]: s[0] = 1
...
ValueError: Cannot setitem on a Categorical with a new category, set the categories first

(although with a ValueError, not a TypeError)

And so because we started using the ExtensionBlock also for the new nullable dtypes, as a result we got into this inconsistency for numeric data types that the default numpy ones don't raise and the nullable ones do raise.

This wasn't explicitly deliberate for the nullable dtypes (AFAIR), but it was for ExtensionBlock (although it is a valid question if we actually want this for all (external) EAs, and not only special case our own Categorical EA -> #24020)

(personally, because I think we should make the raising the default (#39584), and at some point I was thinking it would be good that the nullable dtypes already implement the future behaviour, I also didn't care about "fixing" this inconsistency)

@jorisvandenbossche
Copy link
Member

And of course, the fact that setitem with enlargement does cast to object (because it goes through concat code, I think), is another inconsistency within the extension dtypes.
But that is also an issue for Categorical dtype:

In [7]: s.loc[3] = 'd'

In [8]: s
Out[8]: 
0    a
1    b
2    c
3    d
dtype: object

@jbrockmendel
Copy link
Member

Is this closed by the acceptance of PDEP6?

@phofl
Copy link
Member Author

phofl commented Jun 25, 2023

Yep, thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves NA - MaskedArrays Related to pd.NA and nullable extension arrays Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants