API: tighten DTA/TDA _from_sequence signature #37179

jbrockmendel · 2020-10-16T22:56:42Z

cc @jorisvandenbossche

need to decide how to handle test_from_2d_ndarray_with_dtype which this currently xfails

jreback

this looks like a substantial (but good) api change, or is this just on master as _from_sequence is actually strict? (in 1.1.x)

jbrockmendel · 2020-10-20T16:11:37Z

> this looks like a substantial (but good) api change, or is this just on master as _from_sequence is actually strict? (in 1.1.x)

not just on master, this makes _from_sequence substantially stricter than 1.1.x

jorisvandenbossche · 2020-10-20T20:30:08Z

Thanks for looking at this!

Now, for the actual behavioural change, _from_sequence is still being used for e.g. pd.array(.., dtype="datetime64[ns]") as well, so we first need to decide what behaviour we want for that.
Otherwise could also keep this in a _from_sequence_strict (and rename _from_sequence_not_strict to _from_sequence)

jreback · 2020-10-23T00:21:28Z

pandas/core/arrays/datetimes.py

+            pass
+        elif scalars.dtype == object:
+            if isinstance(scalars, ABCMultiIndex):
+                raise TypeError("Cannot create a DatetimeArray from MultiIndex")


this tested?

jreback · 2020-10-23T00:22:49Z

pandas/core/arrays/datetimes.py

+            #  with inferred_type as above?
+            pass
+        else:
+            msg = f"dtype {scalars.dtype} cannot be converted to datetime64[ns]"


can you ensure we have a test that hits this

jreback · 2020-10-23T00:23:00Z

pandas/core/arrays/timedeltas.py

+            # TODO: should go through from_sequence_of_strings?
+            pass
+        else:
+            raise TypeError(data.dtype)


can you ensure we have a test that hits this

jbrockmendel · 2020-10-24T01:59:04Z

just pushed with a couple new tests that get us to full coverage on the affected methods

jreback

couple of comments on whether this added code is tested

jreback · 2020-10-31T15:40:10Z

pandas/tests/frame/test_constructors.py

@@ -2869,6 +2869,7 @@ def test_from_tzaware_mixed_object_array(self):
        ]
        assert (res.dtypes == expected_dtypes).all()

+    @pytest.mark.xfail(reason="DatetimeArray._from_sequence no longer accepts i8")


we need a note for this?

the question is if we want to disable/deprecate this or find a way to make it work

is this something we want to support or disallow?

To be clear, the question is not so much about _from_sequence, but rather about our public constructors (Series(), Index(), array()).
Currently, those support converting integers to datetime dtype. For example:

pd.Series([1, 2, 3], dtype="datetime64[ns]")

works fine (and the same for the other two).
So the question is whether we want to keep this kind of behaviour. Personally I would say there is not much harm in allowing it (also given that e.g. numpy and pyarrow have the same behaviour). It's only when timezones are involved that there could be a bit more ambiguity (and e.g. for timedelta that isn't relevant anyway).
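
For illustration, here is a minimal sketch of the behaviour being discussed. It assumes an installed pandas and NumPy in which, as described above, the public constructors still accept plain integers for a datetime dtype and read them as nanoseconds since the epoch. The variable names are chosen here only for the example.

```python
import numpy as np
import pandas as pd

ints = [1, 2, 3]

# The three public constructors named above, all given integers
# together with an explicit datetime64[ns] dtype.
ser = pd.Series(ints, dtype="datetime64[ns]")
idx = pd.Index(ints, dtype="datetime64[ns]")
arr = pd.array(ints, dtype="datetime64[ns]")

# NumPy performs the same integer -> datetime64 conversion.
np_arr = np.array(ints, dtype="datetime64[ns]")

# Each integer ends up as that many nanoseconds after 1970-01-01.
print(ser.iloc[0] == pd.Timestamp(1))
print(np_arr.astype("int64").tolist())
```

Whether `_from_sequence` or some other code path performs this cast is exactly the open question in this thread; the sketch only shows the user-visible result.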

I tend to agree. Any thoughts on where to catch this case?

> Any thoughts on where to catch this case?

Not fully sure what you mean, but (until #33254 is resolved) doing such conversions is the responsibility of _from_sequence (for pd.array), so if we want to keep this capability, at the moment it is _from_sequence that needs to accept ints.

> Not fully sure what you mean

I mean that we could make this Series construction work without loosening the _from_sequence restrictions, by instead calling [??] in this case.

> so if we want to keep this capability, at the moment it is _from_sequence that needs to accept ints.

To be clear, is this what you are suggesting we do? My impression from previous conversations is that you are unambiguously in favor of this tightening.

My earlier suggestion at #37179 (comment) was to use _from_sequence/_from_sequence_strict (instead of _from_sequence_non_strict/_from_sequence), until we decide on better naming for this. So keeping the non-strict behaviour of _from_sequence for now.
That already cleans up the code (separating both behaviours more cleanly), without being blocked on API discussions.

> My impression from previous conversations is that you are unambiguously in favor of this tightening.

Yes, for sure. But before we can implement that, we still need to come up with an API for the non-strict conversion as well, as that is still something we need to support. That's the whole discussion of #33254

jreback · 2020-11-04T02:58:18Z

if you can rebase; I think there are a couple of questions

jbrockmendel · 2020-11-05T00:11:28Z

cc @WillAyd most recent commit made small edits to json code, can you take a look

WillAyd · 2020-11-05T16:23:24Z

lgtm - I think the explicitness here is nice

jbrockmendel · 2020-11-10T00:03:49Z

@jorisvandenbossche any objection to this as a temporary workaround? I'm pretty sure it will allow us to clean up a bunch of code in .isin and .equals xref #37528

jbrockmendel · 2020-11-20T04:06:58Z

updated per discussion so that _from_sequence remains non-strict, while a new _from_sequence_strict has the hopefully-future behavior of _from_sequence.

I'm hoping to use _from_sequence_strict to simplify and de-duplicate .equals and .isin
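
To make the split concrete, here is a toy sketch of the two entry points using only the standard library. It is not the pandas implementation: `ToyDatetimeArray` is a hypothetical class, and the conversion rules are assumptions picked for illustration (integers are read as microseconds since the epoch because `datetime` has no nanosecond resolution, and strings are parsed as ISO 8601).

```python
from datetime import datetime, timedelta
from typing import Iterable, List

EPOCH = datetime(1970, 1, 1)


class ToyDatetimeArray:
    """Hypothetical stand-in for DatetimeArray; not pandas code."""

    def __init__(self, values: List[datetime]) -> None:
        self.values = values

    @classmethod
    def _from_sequence_strict(cls, scalars: Iterable[object]) -> "ToyDatetimeArray":
        # Strict path: every element must already be a datetime.
        values = list(scalars)
        for val in values:
            if not isinstance(val, datetime):
                raise TypeError(
                    f"{type(val).__name__} cannot be converted to datetime"
                )
        return cls(values)

    @classmethod
    def _from_sequence(cls, scalars: Iterable[object]) -> "ToyDatetimeArray":
        # Non-strict path: also casts ints and ISO-format strings.
        converted: List[datetime] = []
        for val in scalars:
            if isinstance(val, datetime):
                converted.append(val)
            elif isinstance(val, int) and not isinstance(val, bool):
                # Assumption for this toy: ints are microseconds since epoch.
                converted.append(EPOCH + timedelta(microseconds=val))
            elif isinstance(val, str):
                converted.append(datetime.fromisoformat(val))
            else:
                raise TypeError(
                    f"{type(val).__name__} cannot be converted to datetime"
                )
        return cls(converted)


loose = ToyDatetimeArray._from_sequence([1, "2020-10-16", datetime(2020, 1, 1)])
print(loose.values)
```

With this shape, callers that need to reject integers (the .isin and .equals cases mentioned above) can call the strict constructor, while the public constructors keep going through the permissive one.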

jorisvandenbossche

Thanks for the update!

General question: in the tslib conversion functions, those are all "generic" in the sense of accepting a mixture of string, datetime, Timestamp, int, etc (like array_to_datetime seems to do) ? We don't have one that is more strict on the input that could be used for this purpose? (instead of checking up front with infer_dtype)

pandas/tests/arrays/test_array.py

pandas/core/arrays/timedeltas.py

jbrockmendel · 2020-11-20T21:10:53Z

> We don't have one that is more strict on the input that could be used for this purpose?

Good idea, we can use tslibs.conversion.datetime_to_datetime64. AFAICT we don't have an equivalent for timedelta, but it wouldn't be hard to implement one. libperiod.extract_ordinals can also be adapted to do the same thing for PeriodArray (separate PR).

Will update.
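
For context, the up-front check that the question refers to can be sketched with the public `pandas.api.types.infer_dtype`. This is only an illustration, not the code in this PR: the helper name `ensure_datetime_like` and the set of accepted kinds are assumptions made for the example.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Assumption for this sketch: these inferred kinds count as
# "already datetime-like" and are allowed through.
STRICT_DATETIME_KINDS = {"datetime", "datetime64"}


def ensure_datetime_like(values) -> None:
    """Raise TypeError unless every value is inferred as datetime-like."""
    kind = infer_dtype(values, skipna=True)
    if kind not in STRICT_DATETIME_KINDS:
        raise TypeError(f"inferred type {kind!r} cannot be converted strictly")


# Timestamps pass the check silently.
ensure_datetime_like([pd.Timestamp("2020-10-16"), pd.Timestamp("2020-10-20")])

# Plain integers are rejected before any conversion is attempted.
try:
    ensure_datetime_like([1, 2, 3])
except TypeError as err:
    print(err)
```

The downside raised in the question is that this makes a separate inference pass over the data before the conversion pass. A conversion routine that is strict by construction would avoid that.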

jbrockmendel · 2020-11-21T17:35:04Z

mothballing to clear the queue while I address comments locally

jbrockmendel added 9 commits September 29, 2020 15:15

EA: tighten TimedeltaArray._from_sequence signature

eab1030

Merge branch 'master' of https://github.com/pandas-dev/pandas into st…

137fdf1

…rict-tda

EA: Tighten signature on DatetimeArray._from_sequence

e54ed75

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

3d2a7ab

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

96a8475

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

d49d819

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

f693ed4

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

058338a

…f-signatures-2

API: restrict DTA/TDA _from_sequence

77e7c21

jreback added the API - Consistency Internal Consistency of API/Behavior label Oct 17, 2020

jreback requested changes Oct 20, 2020

jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Oct 20, 2020

jreback requested changes Oct 23, 2020

jbrockmendel added 2 commits October 23, 2020 16:36

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

7e97d03

…f-signatures-2

test

f2a2aaf

This was referenced Oct 28, 2020

CI: 32 bit maybe_indices_to_slice #37473

Merged

BUG: isin incorrectly casting ints to datetimes #37528

Merged

jbrockmendel added 2 commits October 30, 2020 17:39

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

00d19e3

…f-signatures-2

lint fixup

bc532be

jreback requested changes Oct 31, 2020

jbrockmendel mentioned this pull request Nov 1, 2020

BUG: Improve error message when casting ExtensionDtype to datetime #37553

Closed

jbrockmendel added 4 commits November 3, 2020 19:38

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

2e612c4

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

b8db5c1

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

cad1104

…f-signatures-2

Use _from_sequence_of_strings where appropriate

8f1b25d

jbrockmendel added 4 commits November 5, 2020 10:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

fb41950

…f-signatures-2

workaround for i8 case

9f72ad8

mypy fixup

54c87da

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

d87533b

…f-signatures-2

jbrockmendel added 5 commits November 11, 2020 10:46

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

158f2b2

…f-signatures-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

b071b5f

…f-signatures-2

from_sequence -> from_sequence_strict

bffacb1

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

0153a51

…f-signatures-2

mypy fixup

5acede8

jorisvandenbossche reviewed Nov 20, 2020

pandas/tests/arrays/test_array.py (outdated, resolved)

pandas/core/arrays/timedeltas.py (outdated, resolved)

pandas/core/arrays/timedeltas.py (outdated, resolved)

jbrockmendel added 2 commits November 20, 2020 11:26

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

3b25043

…f-signatures-2

revert test edits

8c87760

jbrockmendel closed this Nov 21, 2020

jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Nov 21, 2020

jbrockmendel deleted the ref-signatures-2 branch November 20, 2021 23:21
