BUG/API: setting behavior inserting negative numbers in np.uint/nullable UInt Series #48867

Open
3 tasks done
mroeschke opened this issue Sep 29, 2022 · 6 comments
Labels
API - Consistency (Internal Consistency of API/Behavior); Bug; Indexing (Related to indexing on series/frames, not to indexes themselves); NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

@mroeschke
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [22]: np.__version__
Out[22]: '1.23.1'

In [23]: arr = np.array([1], dtype=np.uint8)

In [24]: arr[0] = -1

In [25]: arr
Out[25]: array([255], dtype=uint8)

In [26]: ser = pd.Series([1], dtype=np.uint8)

In [27]: ser.iloc[0] = -1

In [28]: ser
Out[28]:
0   -1
dtype: int16

In [30]: ser_nullable = pd.Series([1], dtype="UInt8")

In [31]: ser_nullable.iloc[0] = -1

In [32]: ser_nullable
Out[32]:
0    255
dtype: UInt8

Issue Description

  • NumPy currently returns an overflowed value but may raise in the future with NEP 50 (cc @seberg)

  • Series with np.uint appears to upcast the type to support -1

  • Series with nullable UInt appears to return an overflowed value
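The overflowed value 255 reported above is what modular wraparound produces: a uint8 holds values modulo 2**8, so -1 maps to 255. A minimal sketch of that arithmetic follows. It deliberately uses an explicit `astype` cast (which is an unsafe cast by default) rather than item assignment, since the assignment behavior is exactly what this issue is about.

```python
import numpy as np

# uint8 stores values modulo 2**8, so -1 wraps around to 255.
bits = np.iinfo(np.uint8).bits
wrapped_by_hand = -1 % (2 ** bits)

# An explicit astype() performs the same wraparound via an unsafe cast.
wrapped_by_cast = int(np.array([-1], dtype=np.int64).astype(np.uint8)[0])

print(wrapped_by_hand, wrapped_by_cast)
```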

Here is the corresponding construction behavior:

In [33]: np.array([-1], dtype=np.uint8)
Out[33]: array([255], dtype=uint8)

In [34]: pd.Series([-1], dtype=np.uint8)
OverflowError: Trying to coerce negative values to unsigned integers

In [35]: pd.Series([-1], dtype="UInt8")
TypeError: Cannot cast array data from dtype('int64') to dtype('uint8') according to the rule 'safe'

The above exception was the direct cause of the following exception:

TypeError: cannot safely cast non-equivalent int64 to uint8

Expected Behavior

Not sure if the existing rules here are established, but given that construction raises, maybe setting should raise too?

Installed Versions

Numpy Version: '1.23.1'

@phofl
Member

phofl commented Sep 29, 2022

Isn’t the setting behavior similar to when you set a float into an integer column? E.g., we try to find a common dtype.
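To make the "common dtype" idea concrete, here is a rough sketch using NumPy's public promotion helpers. It is an illustration only, not pandas' internal code path: it assumes the common dtype is found by promoting the column dtype with the smallest dtype that can hold the new value.

```python
import numpy as np

# Smallest dtype that can hold the scalar -1.
value_dtype = np.min_scalar_type(-1)

# Promoting uint8 with that signed dtype needs a type wide enough for both.
common_for_negative = np.promote_types(np.uint8, value_dtype)

# The float-into-integer analogy: promote an integer dtype with a float dtype.
common_for_float = np.promote_types(np.int64, np.float64)

print(value_dtype, common_for_negative, common_for_float)
```

Under that assumption, promoting uint8 with a negative value lands on int16, which is the dtype shown for the np.uint8 Series in the report.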

@mroeschke
Member Author

Ah yeah that's analogous behavior.

Here I'm pointing out the inconsistency between non-nullable and nullable uint, and what @seberg is considering in the future for NEP 50.

@seberg
Contributor

seberg commented Sep 29, 2022

At this point, I actually don't care much about what we do here in NumPy (which may be my problem). Doing it should simplify the code/logic in the long term, though ;).

If pandas typically (or even sometimes) has the behavior of promoting columns on insertion then it would seem to me that an error in NumPy is preferable over doing the unsafe cast (from a pandas perspective at least).

@phofl
Member

phofl commented Sep 29, 2022

The inconsistency on our side is a bit related to #47577 too.

@pllim

pllim commented Oct 25, 2023

I think this is biting us downstream now that NEP 50 is implemented in numpy 2.0.dev.

Example log: https://github.com/spacetelescope/jdaviz/actions/runs/6643820714/job/18051683584 (that is a lot to read through so I repeated the relevant bits below)

Numpy: 2.0.0.dev0+git20231025.9f6789c
Pandas: 2.2.0.dev0+447.gaae33c036c
...
.../pandas/core/dtypes/cast.py:594: in maybe_promote
    dtype, fill_value = _maybe_promote(dtype, fill_value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dtype = dtype('int64'), fill_value = -1
...
            elif issubclass(dtype.type, np.integer):
>               if not np.can_cast(fill_value, dtype):
E               TypeError: can_cast() does not support Python ints, floats, and complex because the result used to depend on the value.
E               This change was part of adopting NEP 50, we may explicitly allow them again in the future.

.../pandas/core/dtypes/cast.py:702: TypeError

def _maybe_promote(dtype: np.dtype, fill_value=np.nan):
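The traceback shows `np.can_cast` rejecting a Python int under NEP 50. One value-based way to ask the same question without passing a Python scalar to `np.can_cast` is to compare against the dtype's range from `np.iinfo`. The helper name `int_fits_dtype` below is hypothetical, and this is only a sketch of the idea, not necessarily the change made in pandas.

```python
import numpy as np


def int_fits_dtype(value: int, dtype) -> bool:
    """Return True if a Python int is representable in an integer dtype.

    Hypothetical helper for illustration: it compares against the
    np.iinfo bounds instead of calling np.can_cast on a Python scalar.
    """
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


print(int_fits_dtype(-1, np.int64))   # the dtype/fill_value pair from the traceback
print(int_fits_dtype(-1, np.uint8))   # the case from the original report
print(int_fits_dtype(255, np.uint8))
```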

@mroeschke mroeschke mentioned this issue Oct 25, 2023
@mroeschke
Member Author

Thanks for the report. Addressing this in #55707 and will hopefully get this in today
