Concat empty series with mixed TZ doesn't coerce to object #22186

Closed
TomAugspurger opened this issue Aug 3, 2018 · 5 comments · Fixed by #52532
Labels
Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) · Timezones (Timezone data dtype)

Comments

TomAugspurger (Contributor) commented Aug 3, 2018

Do we intentionally special-case concatenating an empty series with a non-empty series here? I don't think we should. It'd be nice to statically know the output dtype of a concat without having to look at the values.

In [2]: import pandas as pd

In [3]: pd.concat([pd.Series([], dtype='datetime64[ns, US/Central]'), pd.Series(['2018'], dtype='datetime64[ns, US/Eastern]')])
Out[3]:
0   2018-01-01 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

In [4]: pd.concat([pd.Series([], dtype='datetime64[ns, US/Central]'), pd.Series([], dtype='datetime64[ns, US/Eastern]')])
Out[4]: Series([], dtype: object)

I'd expect both of those to be object dtype.
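Spelled out as assertions, the expectation above would look roughly like this (a sketch of the desired behaviour, not what master currently does; the first assert fails today):

import pandas as pd

central = pd.Series([], dtype='datetime64[ns, US/Central]')
eastern = pd.Series(['2018'], dtype='datetime64[ns, US/Eastern]')

# Expected: mixed timezones coerce to object whether or not either side is empty.
assert pd.concat([central, eastern]).dtype == object           # currently datetime64[ns, US/Eastern]
assert pd.concat([central, eastern.iloc[:0]]).dtype == object  # already object today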

TomAugspurger added the Reshaping and Dtype Conversions labels on Aug 3, 2018
mroeschke added the Timezones label on Aug 3, 2018
TomAugspurger (Contributor, Author) commented

Taking this over as a general concat issue

Currently concat[SparseSeries] uses the fill_value of the first array.

In [12]: import numpy as np

In [13]: pd.concat([
    ...:     pd.SparseSeries([0, 0, None], fill_value=0),
    ...:     pd.SparseSeries([0, None, 0], fill_value=np.nan)
    ...: ]).fill_value
Out[13]: 0

In [14]: pd.concat([
    ...:     pd.SparseSeries([0, 0, None], fill_value=np.nan),
    ...:     pd.SparseSeries([0, None, 0], fill_value=0)
    ...: ]).fill_value
Out[14]: nan

Are we comfortable calling that a bug? What's the expected behavior, raising?

TomAugspurger (Contributor, Author) commented Aug 3, 2018

I guess this raises a general design question: what dtype coercion are we comfortable with concat doing?

  1. We're clearly OK with concatenating a mix of ints and floats and getting floats (see the snippet after this list).
  2. There's the open issue of concat[categorical] using union_categoricals.
  3. We'll need to figure out concat of Period with different freqs (unless we already do that?).
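For point 1, a quick illustration of the numeric upcasting we already accept (nothing new, just the baseline to compare the other cases against):

import pandas as pd

ints = pd.Series([1, 2], dtype='int64')
floats = pd.Series([1.5], dtype='float64')

# int64 concatenated with float64 gives float64: the coercion everyone is comfortable with.
print(pd.concat([ints, floats]).dtype)  # float64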

TomAugspurger (Contributor, Author) commented

For concat([sparse, dense]) I'd propose that the result always be sparse.

Currently, we try to keep it sparse; e.g. concat([Sparse[float64], float64]) is Sparse[float64].
But concat([Sparse[float64], object]) is object. IMO it should be Sparse[object].
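A minimal sketch of that inconsistency, using the SparseSeries API from this era (it has since been deprecated, and exact dtype reprs may vary by version):

import numpy as np
import pandas as pd

sparse = pd.SparseSeries([0.0, np.nan, 1.0])
dense_float = pd.Series([1.0, 2.0])
dense_object = pd.Series(['a', 'b'])

# Per the behaviour described above:
print(pd.concat([sparse, dense_float]).dtype)   # stays sparse (Sparse[float64])
print(pd.concat([sparse, dense_object]).dtype)  # falls back to dense object; proposal: Sparse[object]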

jreback (Contributor) commented Aug 3, 2018

prob should densify if sparse and dense and both float

TomAugspurger (Contributor, Author) commented

> prob should densify if sparse and dense and both float

Hmm, I don't think so. I think implicitly going from sparse to dense is always problematic. And on master we currently sparsify the dense series, but we're inconsistent about it:

  • Sparse[float] & float -> Sparse[float]
  • Sparse[float] & object -> object (dense)
