Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EA interface: add maybe_promote? #24246

Open
h-vetinari opened this issue Dec 12, 2018 · 6 comments
Open

EA interface: add maybe_promote? #24246

h-vetinari opened this issue Dec 12, 2018 · 6 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@h-vetinari
Copy link
Contributor

h-vetinari commented Dec 12, 2018

I'm preparing a PR that addresses the short-comings of core.dtypes.cast.maybe_promote uncovered in #23982, and there's currently only a branch in there along the following lines:

    def maybe_promote(dtype, fill_value)
    [...]
    elif is_extension_array_dtype(dtype) and isna(fill_value):
        fill_value = dtype.na_value
    [...]
    return dtype, fill_value

This does not describe any promotion rules, and (like the rest of maybe_promote) is very inconsistent about the returned missing value (see #23982).

My suggestion would be to to something like the following:

if is_extension_array_dtype(dtype):
    # dispatch to EA
    dtype.maybe_promote(fill_value)

which would let EA authors decide how to handle dtype promotions (this is relevant e.g. for #24020).
The default could be to always upcast to object, but in many cases, there would be clear conditions for what doesn't upcast (e.g. all integers in the new 'Int' dtypes).

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Dec 17, 2018

Yeah, I think something like this will eventually be necessary.

I don't think it would be part of the user facing API though. Rather, we would call some kind of pandas._maybe_promote(array, dtype). For ExtesnsionArray array, that would dispatch to something like ExtensionArray._maybe_promote(dtype), which would return the desired dtype.

@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 17, 2018
@h-vetinari
Copy link
Contributor Author

h-vetinari commented Dec 17, 2018

@TomAugspurger
I'm not saying this needs to be user-facing at all, but it should be something that EA authors implement (or fall back to always upcasting to object otherwise).

Not sure if you were missing a dot in there somewhere, but ExtensionDType._maybe_promote(fill_value=np.nan) would sound good to me.

EDIT: got confused myself about the signature...

@TomAugspurger
Copy link
Contributor

Not sure if you were missing a dot

Yep, fixed.

ExtensionDType._maybe_promote

Do you mean ExtensionArray? Or do you think this can be done at the dtype level? I was assuming ExtensionArray, but it'd be great if we could do this without looking at the actual values.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Dec 17, 2018

@TomAugspurger

Do you mean ExtensionArray? Or do you think this can be done at the dtype level? I was assuming ExtensionArray, but it'd be great if we could do this without looking at the actual values.

There are two parameters at play:

  • a dtype, asking if it needs to be upcast to accommodate
  • fill_value (which can be scalar, np.ndarray, or an EA itself)

core.dtype.cast.maybe_promote is not a class method, hence the signature is
maybe_promote(dtype, fill_value=np.nan). So, assuming dtype is an ExtensionDType, we would indeed only have to call ExtensionDType._maybe_promote(fill_value=np.nan). The value-inspection then has to happen on the side of fill_value, at least if it's an array.

This is another (separate) aspect of the whole thing, where

maybe_promote(some_dtype, fill_value=SomeExtensionArray)

could dispatch to SomeExtensionArray._maybe_promote(some_dtype) (or a method with a different name, due to the effectively inverted signature) to see if the EA can upcast some_dtype to itself, or if it needs to go to object.

@jbrockmendel
Copy link
Member

I think this is now (or at least in 2.0 will be) unnecessary. Once deprecations are enforced IIRC we'll be able to rewrite maybe_promote as something like:

def maybe_promote(left:ArrayLike, right):
    dtype = infer_dtype_from(right)
    new_dtype = find_common_dtype([left, right])
    return left.astype(new_dtype)

So the EA authors implement the appropriate behavior in Dtype._get_common_type

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Dec 21, 2021
@jbrockmendel jbrockmendel removed the Closing Candidate May be closeable, needs more eyeballs label Jan 5, 2022
@jbrockmendel
Copy link
Member

On second thought, I'm coming around on this. The infer_dtype_from+find_common_type pattern leaves something to be desired in a few cases:

  1. right is a small python int and right is ndarray[uint8], infer_dtype_from will infer to int64 and lead us to upcast more than necessary.
  2. right = "D" and left is a Categorical with categories=["A", "B", "C"]. The current pattern will lead us to object dtype, where we may want to end up with a Categorical with categories=["A", "B", "C", "D"] (e.g. we do something like this in Categorical._replace, and ideally the upcasting logic would not live there)
  3. I think this would allow us to avoid more Categorical shenanigans in dtypes.concat.cast_to_common_type and Index._find_common_type_compat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants