API: Series.values with tz-aware should return object array of Timestamps #15750

Closed
jreback opened this issue Mar 20, 2017 · 7 comments · Fixed by #24596
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype Timezones Timezone data dtype

Comments

@jreback
Contributor

jreback commented Mar 20, 2017

xref #14052
discussed a bit in the original PR: #10477

I think it was a mistake to return datetime64[ns] in UTC for a tz-aware Series. We should
simply return an object array of Timestamps, as it round-trips correctly. IOW, you don't lose the timezones.

tz-naive
In [9]: Series(pd.date_range('20130101',periods=3)).values
Out[9]: 
array(['2013-01-01T00:00:00.000000000', '2013-01-02T00:00:00.000000000',
       '2013-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

tz-aware - currently
In [10]: Series(pd.date_range('20130101',periods=3, tz='US/Eastern')).values
Out[10]: 
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
       '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')

what we should do

In [15]: np.array(Series(pd.date_range('20130101',periods=3, tz='US/Eastern')).tolist())
Out[15]: 
array([Timestamp('2013-01-01 00:00:00-0500', tz='US/Eastern'),
       Timestamp('2013-01-02 00:00:00-0500', tz='US/Eastern'),
       Timestamp('2013-01-03 00:00:00-0500', tz='US/Eastern')], dtype=object)

round trips are plainly wrong (current)

In [5]: arr = Series(pd.date_range('20130101',periods=3, tz='US/Eastern'))

In [6]: Series(arr.values)
Out[6]: 
0   2013-01-01 05:00:00
1   2013-01-02 05:00:00
2   2013-01-03 05:00:00
dtype: datetime64[ns]

round-trips are preserved (proposed)

In [3]: arr = np.array(Series(pd.date_range('20130101',periods=3, tz='US/Eastern')).tolist())

In [4]: Series(arr)
Out[4]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

I don't think there is any way to transition to this gradually; we simply have to change it.
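A minimal sketch of the two round trips above as a script, for anyone who wants to reproduce them. It assumes a pandas version in which `.values` on a tz-aware Series returns UTC `datetime64` values, as described in this issue; the object array is built by hand from `tolist()`, so the sketch does not depend on `.values` itself changing. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# Behavior described in this issue: .values drops the timezone and
# hands back the UTC instants as plain datetime64 values.
lossy = pd.Series(ser.values)

# Shape of the proposed result, built by hand: an object array of Timestamps.
obj_arr = np.array(ser.tolist(), dtype=object)
restored = pd.Series(obj_arr)

print(lossy.dt.tz)     # timezone is gone after the round trip
print(restored.dt.tz)  # timezone survives the round trip
```

The object array keeps each `Timestamp` intact, which is why the Series constructor can infer the tz-aware dtype again.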

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype Timezones Timezone data dtype labels Mar 20, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 20, 2017
@jreback
Contributor Author

jreback commented Mar 20, 2017

cc @sdementen

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Apr 8, 2017
@jreback
Contributor Author

jreback commented Apr 8, 2017

Going to put up a PR soon to fix this.

@jorisvandenbossche
Member

I am not sure this is worth changing again before pandas 2.0.

True, the current output of .values is not ideal. I fully agree with that. But outputting an object array is also not ideal.

I agree that roundtripping is not preserved, but is that so important? To quote from the original PR adding the tz datetime block: "if you are using .values then you have to be cognizant of what you are doing". This stays the same for both of those solutions.
People relying on .values for tz-aware Series should already be aware of the current output and have to work around its limitations (e.g. saving metadata to faithfully recreate the Series). They will have to do the same with the new object-array output (e.g. converting to a datetime64 array anyway for performance-sensitive code), only in a different way, and they will need to keep both workarounds if they want to support multiple pandas versions.
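For illustration, the "save metadata" workaround mentioned above could look roughly like the following. This is a sketch under the assumption that `.values` returns UTC `datetime64` values for tz-aware data, as described in this issue; `raw`, `saved_tz`, and `rebuilt` are hypothetical names.

```python
import pandas as pd

ser = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# Keep the timezone alongside the raw values, since .values drops it.
raw = ser.values      # datetime64 values in UTC, no tz attached
saved_tz = ser.dt.tz  # the metadata needed to rebuild the Series

# Rebuild: mark the naive values as UTC, then convert back to the saved tz.
rebuilt = pd.Series(raw).dt.tz_localize("UTC").dt.tz_convert(saved_tz)

print(rebuilt.dtype)
```

Localizing to UTC first matters here: localizing the raw values directly to the saved timezone would shift every timestamp by the UTC offset.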

@jorisvandenbossche
Member

I see the linked issue about multiple columns -> object array of Timestamps vs. single column/Series -> coerced to datetime64. I agree that is certainly not ideal either.

@jreback
Contributor Author

jreback commented Apr 9, 2017

This is for sure worth fixing (way before pandas 2.0). I'll move it to 0.21.0 though. This should be coordinated with the change for DataFrame (which is actually much more of a bug).

@jreback jreback modified the milestones: 0.21.0, 0.20.0 Apr 9, 2017
@jreback jreback modified the milestones: 0.21.0, Interesting Issues Sep 23, 2017
@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
@mroeschke
Member

I believe this API change would be useful for DatetimeIndex as well:

In [7]: dt_aware = pd.date_range('20130101',periods=3, tz='US/Eastern')

In [8]: dt_aware
Out[8]:
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.DatetimeIndex(dt_aware.values)
Out[9]:
DatetimeIndex(['2013-01-01 05:00:00', '2013-01-02 05:00:00',
               '2013-01-03 05:00:00'],
              dtype='datetime64[ns]', freq=None)

as this is the root cause for #19420

# Tz info is lost within stack like so...
In [11]: dt_aware.values.ravel()
Out[11]:
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
       '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')
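By way of contrast, here is a small sketch of what an object array of Timestamps does under the same `ravel()` call. It is only an illustration of the idea in this thread, not the code path used inside `stack`, and it builds the object array explicitly with `astype(object)` rather than assuming any change to `.values`.

```python
import pandas as pd

dt_aware = pd.date_range("20130101", periods=3, tz="US/Eastern")

# Object array of Timestamps: each element carries its own timezone.
obj_values = dt_aware.astype(object).values
flat = obj_values.ravel()

# Rebuilding the index from the flattened objects keeps the timezone.
rebuilt = pd.DatetimeIndex(flat)

print(flat[0])
print(rebuilt.tz)
```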

@jreback
Contributor Author

jreback commented Feb 11, 2018

@mroeschke this is not hard to fix; it is more about carefully changing tests. A PR would be welcome.
