ENH: ExtensionDtype._dtype_with_na/_maybe_promote #45349

Open
jbrockmendel opened this issue Jan 13, 2022 · 4 comments
Labels
Dtype Conversions, Enhancement, ExtensionArray

Comments

@jbrockmendel
Member

jbrockmendel commented Jan 13, 2022

import numpy as np
import pandas as pd

ii = pd.interval_range(1, 10)
ser = pd.Series(ii).copy()
mask = np.zeros(ser.shape, dtype=bool)
mask[1] = True

ii.insert(1, np.nan)  # <- IntervalDtype[float64]
ii.putmask(mask, np.nan)  # <- IntervalDtype[float64]

ser.where(mask, np.nan)  # <- ValueError
ser[1] = np.nan  # <- coerces to object

ATM dtypes.cast.ensure_dtype_can_hold_na is incorrect for EA dtypes that cannot hold NA values (the IntervalDtype[int] example is the only one that comes to mind).

We have special-casing for IntervalDtype in Index._find_common_type_compat which is why the Index methods above behave well. Short-term, we can move that special-casing into ensure_dtype_can_hold_na. Longer-term, we need a way for EADtype subclasses to specify this behavior.
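As a rough illustration of the longer-term idea, here is a minimal sketch of a dtype-level hook. It uses toy stand-in classes rather than pandas itself; `ToyDtype`, `ToyIntervalDtype`, `dtype_with_na`, and `ensure_can_hold_na` are all hypothetical names chosen for this example, and nothing here reflects an agreed-upon pandas API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyDtype:
    """Toy stand-in for an extension dtype (not a pandas class)."""

    name: str

    def dtype_with_na(self) -> "ToyDtype":
        # Hypothetical hook: by default, assume the dtype can already hold NA.
        return self


@dataclass(frozen=True)
class ToyIntervalDtype(ToyDtype):
    """Toy interval dtype whose NA support depends on its subtype."""

    subtype: str = "int64"

    def dtype_with_na(self) -> "ToyIntervalDtype":
        # Integer endpoints cannot represent NaN, so promote to float64.
        if self.subtype.startswith(("int", "uint")):
            return replace(self, subtype="float64")
        return self


def ensure_can_hold_na(dtype: ToyDtype) -> ToyDtype:
    # Generic code delegates to the dtype instead of special-casing each type.
    return dtype.dtype_with_na()
```

With this shape, the generic helper needs no knowledge of intervals; a dtype that cannot hold NA only has to override the hook and name its closest NA-capable relative.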

This could be seen as a special case of xref #24246

cc @jorisvandenbossche @TomAugspurger

jbrockmendel added the Bug and Needs Triage labels on Jan 13, 2022
mroeschke added the Dtype Conversions, Enhancement, and ExtensionArray labels and removed the Bug and Needs Triage labels on Jan 14, 2022
@jorisvandenbossche
Member

There is already a _can_hold_na attribute on the ExtensionDtype. It's of course not fully the same (it doesn't distinguish between different types of NAs, just whether it can hold any type of NA), but for the specific use case of IntervalDtype[int] not being able to store NAs that could be sufficient?

@jbrockmendel
Member Author

> There is already a _can_hold_na attribute on the ExtensionDtype

The question isn't "can this store an NA", it's "given that it can't, what's the closest dtype that can?"

@jbrockmendel
Member Author

Just discovered there are a couple of places in core.indexing where we call maybe_promote (which is annotated as only taking numpy dtypes) with potentially-EA dtypes. Inside maybe_promote we have special handling for CategoricalDtype (which is reached in the tests). And outside maybe_promote, in the indexing code, we have curr_dtype = getattr(self.obj.dtype, "numpy_dtype", self.obj.dtype), which looks like a workaround to support MaskedDtypes here.
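To make the getattr fallback concrete, here is a small sketch with toy classes (not the real pandas dtypes; `ToyMaskedDtype`, `ToyOtherEADtype`, and `unwrap_for_promote` are hypothetical names). It shows why the pattern helps a masked dtype that exposes a `numpy_dtype` attribute but lets any other extension dtype pass through unchanged:

```python
class ToyMaskedDtype:
    """Toy masked dtype that exposes its underlying numpy-style dtype."""

    def __init__(self, numpy_dtype: str) -> None:
        self.numpy_dtype = numpy_dtype


class ToyOtherEADtype:
    """Toy extension dtype with no numpy_dtype attribute."""


def unwrap_for_promote(dtype):
    # Same shape as the line quoted above: use numpy_dtype when present,
    # otherwise fall back to the dtype object itself.
    return getattr(dtype, "numpy_dtype", dtype)


masked = ToyMaskedDtype("int64")
other = ToyOtherEADtype()

unwrapped_masked = unwrap_for_promote(masked)  # the plain "int64" stand-in
unwrapped_other = unwrap_for_promote(other)    # still the extension dtype
```

In this toy setup the second call hands an extension dtype to whatever runs next, which mirrors the concern that a numpy-only function can end up receiving EA dtypes.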

@jorisvandenbossche
Member

> Longer-term, we need a way for EADtype subclasses to specify this behavior.

Or alternatively: longer term, every EA can hold NAs, or otherwise raise an error if we try to put NAs in it (this also relates to the proposal to preserve dtypes in setitem-like operations).

(I know this doesn't solve the specific issue for Interval[int], but if that's currently the only one that gives a wrong result in ensure_dtype_can_hold_na, it might not be worth making a generic property for it on the EA class/dtype)
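For comparison, the "raise instead of promote" alternative could look roughly like the sketch below. `ToyIntArray` is a hypothetical toy container, not a pandas class, and the choice of `TypeError` is an assumption made only for this example.

```python
import math


class ToyIntArray:
    """Toy integer-only container that rejects NA instead of changing dtype."""

    def __init__(self, values) -> None:
        self.values = [int(v) for v in values]

    @staticmethod
    def _is_na(value) -> bool:
        # Treat None and float NaN as missing values.
        return value is None or (isinstance(value, float) and math.isnan(value))

    def __setitem__(self, index: int, value) -> None:
        if self._is_na(value):
            # Preserve the dtype: refuse the assignment rather than upcast.
            raise TypeError("ToyIntArray cannot hold NA values")
        self.values[index] = int(value)
```

Under this design an assignment such as `arr[1] = float("nan")` fails loudly and leaves the data untouched, so no "closest dtype that can hold NA" ever has to be computed.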

jbrockmendel changed the title from "ENH: ExtensionDtpye._dtype_with_na/_maybe_promote" to "ENH: ExtensionDtype._dtype_with_na/_maybe_promote" on Apr 21, 2023