
ENH: EADtype._find_compatible_dtype #53106

Closed
wants to merge 18 commits

Conversation


@jbrockmendel jbrockmendel commented May 5, 2023

Tests copied from #52833, so this may close #52235.

Needs a lot of cleanup, in particular docs. The ArrowDtype._find_compatible_dtype implementation is quite kludgy, would appreciate help from someone more knowledgeable.

I think _find_compatible_dtype is the "correct" long-term non-kludge solution for setitem-with-expansion.

@jbrockmendel jbrockmendel added setitem-with-expansion ExtensionArray Extending pandas with custom dtypes or arrays. labels May 5, 2023

```python
def _maybe_promote(self, item: Any) -> tuple[DtypeObj, Any]:
    if isinstance(item, pa.Scalar):
        if not item.is_valid:
```
Member:

Shouldn't NA always be able to be inserted into ArrowExtensionArray?

Member Author:

I'm not sure. pyarrow nulls are typed, so we could plausibly want to disallow e.g. `<pyarrow.TimestampScalar: None>` in a pyarrow integer dtype.

```python
elif isna(value):
    new_dtype = None

new_dtype = None
if is_list_like(value):
```
Member:

Note that for ArrowDtype with a pa.list_ type, we would want to treat value like a scalar, e.g.:

```python
ser = pd.Series([[1, 1]], dtype=pd.ArrowDtype(pa.list_(pa.int64())))
ser[0] = [1, 2]
```

Member Author:

Yeah. Getting rid of this is_list_like check causes us to incorrectly raise in numpy non-object cases when using a list value (for which we don't have any tests). We can fix that in this PR or separately, as it is a bit more invasive.
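For context, the `is_list_like` check referenced here is pandas' standard helper, which counts lists and tuples as list-like but excludes strings and scalars:

```python
from pandas.api.types import is_list_like

assert is_list_like([1, 2])      # lists are list-like
assert is_list_like((1, 2))      # so are tuples
assert not is_list_like("ab")    # strings are deliberately excluded
assert not is_list_like(4)       # scalars are not list-like
```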

```python
    item = item.as_py()

elif item is None or item is libmissing.NA:
    # TODO: np.nan? use is_valid_na_for_dtype
```
Member:

Since pyarrow distinguishes nan from NA, we possibly want to allow nan if `pa.types.is_floating(self.pyarrow_dtype)`.

Member Author:

What to do here depends on making a decision about when/how to distinguish between np.nan and pd.NA (which I hope to finally nail down at the sprint). Doing this The Right Way would involve something like implementing EA._is_valid_na_for_dtype.

```python
# TODO: ask joris for help making these checks more robust
if item.type == self.pyarrow_dtype:
    return self, item.as_py()
if item.type.to_pandas_dtype() == np.int64 and self.kind == "i":
```
Member:

Why is this needed specifically?

Member Author:

This was just to get the tests copied from #52833 passing.


github-actions bot commented Jun 9, 2023

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 9, 2023
```python
def _maybe_promote(self, item: Any) -> tuple[DtypeObj, Any]:
    if isinstance(item, pa.Scalar):
        if not item.is_valid:
            # TODO: ask joris for help making these checks more robust
```
Member Author:

@jorisvandenbossche any thoughts here? (not time-sensitive)

@jbrockmendel (Member Author)

Updated with some added tests, and renamed EADtype._maybe_promote to _find_compatible_dtype.

There are some TODOs mostly for pyarrow and related to nan-vs-NA, but I don't think any of them are blockers.

@jbrockmendel jbrockmendel changed the title ENH: EADtype._maybe_promote ENH: EADtype._find_compatible_dtype Jul 28, 2023
@jbrockmendel (Member Author)

_find_compatible_dtype can be used to provide a default implementation of _is_valid_scalar_for, which in turn can be used to implement _from_scalars (both discussed on this week's dev call).
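A hypothetical sketch (not pandas' actual implementation) of how `_is_valid_scalar_for` could be layered on `_find_compatible_dtype`; `ToyDtype` and its behavior are illustrative assumptions:

```python
class ToyDtype:
    """Stand-in for an EADtype that holds plain ints."""

    def _find_compatible_dtype(self, item):
        # Return (dtype to use, coerced item); a different dtype (here None,
        # standing in for object) signals that the scalar does not fit.
        if isinstance(item, int):
            return self, item
        return None, item

    def _is_valid_scalar_for(self, item) -> bool:
        dtype, _ = self._find_compatible_dtype(item)
        return dtype is self

dt = ToyDtype()
assert dt._is_valid_scalar_for(3)
assert not dt._is_valid_scalar_for("a")
```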

@jbrockmendel jbrockmendel mentioned this pull request Jul 31, 2023
@@ -2326,3 +2347,29 @@

```python
def __from_arrow__(self, array: pa.Array | pa.ChunkedArray):
    array_class = self.construct_array_type()
    arr = array.cast(self.pyarrow_dtype, safe=True)
    return array_class(arr)

def _find_compatible_dtype(self, item: Any) -> tuple[DtypeObj, Any]:
    if isinstance(item, pa.Scalar):
```
Member:

What happens if `item` is null, i.e. the pyarrow null?

Member Author:

IIUC pyarrow nulls are now typed. I'd prefer to be strict about making these match, but I don't care that much. I'm hoping @jorisvandenbossche will weigh in.

```python
[
    (pa.scalar(4, type="int32"), 4, "int32[pyarrow]"),
    (pa.scalar(4, type="int64"), 4, "int32[pyarrow]"),
    # (pa.scalar(4.5, type="float64"), 4, "int32[pyarrow]"),
```
Member:

What happens here?

Also, what happens with an int64 scalar and an int32 dtype?

Member Author:

I'd want to follow the same logic we do for numpy dtypes, but I was punting here in expectation of doing it in a follow-up (likely involving Joris expressing an opinion).
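The numpy behavior being referenced can be seen with `np.result_type`: an integer-valued Python scalar combined with int32 keeps int32, while a fractional float forces an upcast to float64.

```python
import numpy as np

assert np.result_type(np.int32, 4) == np.dtype("int32")
assert np.result_type(np.int32, 4.5) == np.dtype("float64")
```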
