Incorrect dtype for concat with empty float and datetime columns #32934

Closed
TomAugspurger opened this issue Mar 23, 2020 · 6 comments · Fixed by #51816
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@TomAugspurger
Contributor

Code Sample, a copy-pastable example if possible

In [40]: a = pd.DataFrame({"A": pd.array(['2000'], dtype='datetime64[ns]')})

In [41]: b = pd.DataFrame({"A": pd.array([1.0], dtype='float64')})

In [42]: pd.concat([a.iloc[:0], b.iloc[:0]])['A'].dtype
Out[42]: dtype('<M8[ns]')

Problem description

The result of concatenating a float64 column and a datetime64[ns] column should have object dtype. That is what happens when both are non-empty:

In [45]: pd.concat([a, b])['A'].dtype
Out[45]: dtype('O')

But when either is empty, we get incorrect results:

In [42]: pd.concat([a.iloc[:0], b.iloc[:0]])['A'].dtype
Out[42]: dtype('<M8[ns]')

In [43]: pd.concat([a.iloc[:0], b])['A'].dtype
Out[43]: dtype('float64')

In [44]: pd.concat([a, b.iloc[:0]])['A'].dtype
Out[44]: dtype('<M8[ns]')

All of those should be object dtype.
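
The four cases above can be collected into one script, which makes it easier to compare pandas versions. This is only a sketch: the helper name `concat_dtypes` and the case labels are illustrative and not part of pandas, and the dtypes printed for the empty cases depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

a = pd.DataFrame({"A": pd.array(["2000"], dtype="datetime64[ns]")})
b = pd.DataFrame({"A": pd.array([1.0], dtype="float64")})


def concat_dtypes(left, right):
    """Return the dtype of column "A" for each empty/non-empty pairing.

    Hypothetical helper for illustration only; not a pandas API.
    """
    cases = {
        "both non-empty": [left, right],
        "both empty": [left.iloc[:0], right.iloc[:0]],
        "left empty": [left.iloc[:0], right],
        "right empty": [left, right.iloc[:0]],
    }
    return {name: pd.concat(frames)["A"].dtype for name, frames in cases.items()}


results = concat_dtypes(a, b)
for name, dtype in results.items():
    # Per the report, every case should print "object".
    print(f"{name}: {dtype}")
```

Any case that prints something other than `object` reproduces the bug on that version.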

@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Mar 23, 2020
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Mar 23, 2020
@mroeschke mroeschke added the Bug label Apr 2, 2020
@jbrockmendel
Member

@jorisvandenbossche is this fixed by one of your recent or upcoming PRs?

@jorisvandenbossche
Member

This is not fixed on master.

While certainly a bug, I think it is not directly related to the recent "skipping empties or not" discussion. It seems that this empty float array already gets cast to datetime64 in JoinUnit.get_reindexed_values, that is, before those values are passed as to_concat arrays to the concat machinery.

I opened a general issue at #39122, where I mentioned this case as an example bug for DataFrame.

@mroeschke
Member

These look almost fixed. We could add a few tests for the cases that work and xfail the one that doesn't:

In [23]: a = pd.DataFrame({"A": pd.array(['2000'], dtype='datetime64[ns]')})
    ...: b = pd.DataFrame({"A": pd.array([1.0], dtype='float64')})

In [24]: pd.concat([a.iloc[:0], b.iloc[:0]])['A'].dtype
Out[24]: dtype('O')

In [25]: pd.concat([a.iloc[:0], b.iloc[:0]])['A'].dtype
Out[25]: dtype('O')

In [26]: pd.concat([a.iloc[:0], b])['A'].dtype
Out[26]: dtype('O')

In [27]: pd.concat([a, b.iloc[:0]])['A'].dtype
Out[27]: dtype('<M8[ns]')

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Jul 30, 2021
@GustavoGB

Hello guys! Can I take this one?

@AwaishK

AwaishK commented Apr 5, 2022

take

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@liang3zy22
Contributor

Take. PR already submitted as above.

7 participants