Timestamp and DatetimeIndex construct different timezones #23815

Closed
TomAugspurger opened this issue Nov 20, 2018 · 9 comments · Fixed by #25254
Labels
Datetime, Docs, Timezones

@TomAugspurger
Contributor

Question in the form of an issue: is this a bug?

In [37]: pd.Timestamp('2000', tz='US/Central').tz == pd.DatetimeIndex(['2000'], tz='US/Central').tz
Out[37]: False

This leads to strange things like

In [38]: idx = pd.date_range('20170101', periods=4, tz='US/Pacific')

In [39]: idx[0].tz == idx.tz
Out[39]: False
TomAugspurger added the Datetime and Timezones labels on Nov 20, 2018
@TomAugspurger
Contributor Author

DatetimeIndex (via DatetimeArrayMixin) does

        tz = timezones.maybe_get_tz(tz)
        result._tz = timezones.tz_standardize(tz)

Timestamp.__new__ seems to use a tzinfo object.

So it seems like I should be using _libs.tslibs.timezones.tz_compare, rather than ==.
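A minimal sketch of that comparison, assuming the private import path `pandas._libs.tslibs.timezones.tz_compare` mentioned above is available in the installed pandas. It is internal API and may change. The `==` result is printed rather than asserted, because it depends on which timezone library backs the string `'US/Central'` in a given pandas version.

```python
import pandas as pd
# Private, internal helper: not part of the public pandas API.
from pandas._libs.tslibs.timezones import tz_compare

ts = pd.Timestamp("2000", tz="US/Central")
idx = pd.DatetimeIndex(["2000"], tz="US/Central")

# Plain equality compares the tzinfo objects themselves.
# With pytz-backed timezones, as in this report, this is False.
naive_equal = ts.tz == idx.tz

# tz_compare asks whether both objects represent the same timezone.
same_zone = tz_compare(ts.tz, idx.tz)

# A genuinely different zone should still compare as different.
other = pd.Timestamp("2000", tz="US/Pacific")
different_zone = tz_compare(ts.tz, other.tz)

print("ts.tz == idx.tz          :", naive_equal)
print("tz_compare(ts.tz, idx.tz):", same_zone)
print("tz_compare(Central, Pacific):", different_zone)
```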

@jbrockmendel
Member

Can you show the tzinfo objects? Tzinfo equality comparisons have weird semantics.

@TomAugspurger
Contributor Author

In [2]: a = pd.Timestamp('2000', tz='US/Central')

In [3]: b = pd.DatetimeIndex(['2000'], tz='US/Central')

In [4]: a.tz
Out[4]: <DstTzInfo 'US/Central' CST-1 day, 18:00:00 STD>

In [5]: b.tz
Out[5]: <DstTzInfo 'US/Central' LMT-1 day, 18:09:00 STD>

Just to be clear, this came from a base ExtensionArray test. If this comes down to "timezones are weird" I'm happy to note that in the docs and skip the test.

@mroeschke
Member

These are expected to be unequal due to pytz semantics.

For a pytz timezone with a Timestamp, we are only dealing with one date, so we can confidently know which side of the DST border the date lies on and can use the DST-informed tzinfo instance (CST-1 day).

For a pytz timezone with a DatetimeIndex, we are potentially dealing with one or more dates, and those dates can lie on either side of the DST border. Therefore we cannot easily use one pytz timezone instance to represent all of these dates, and we just default to the "first" pytz instance for the timezone (LMT-1 day).
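The mechanism can be illustrated with a toy model that uses only the standard library. This is not pytz's or pandas's actual implementation: the `PeriodTz` class, the `localize` helper, and the month-based DST rule are hypothetical simplifications, chosen only to show how one zone can own several tzinfo instances that do not compare equal.

```python
from datetime import datetime, timedelta, tzinfo


class PeriodTz(tzinfo):
    """Hypothetical stand-in for one per-period tzinfo instance of a zone."""

    def __init__(self, zone, abbrev, offset):
        self.zone = zone      # zone name shared by all instances of the zone
        self.abbrev = abbrev  # e.g. "LMT", "CST", "CDT"
        self.offset = offset  # fixed UTC offset for this period

    def utcoffset(self, dt):
        return self.offset

    def tzname(self, dt):
        return self.abbrev

    def dst(self, dt):
        return timedelta(0)

    def __repr__(self):
        return f"<PeriodTz {self.zone!r} {self.abbrev}>"


# Three instances for the same zone. The LMT offset matches the
# "-1 day, 18:09:00" shown in the report above.
LMT = PeriodTz("US/Central", "LMT", timedelta(hours=-5, minutes=-51))
CST = PeriodTz("US/Central", "CST", timedelta(hours=-6))
CDT = PeriodTz("US/Central", "CDT", timedelta(hours=-5))


def localize(naive):
    # Crude assumption for illustration only: April to October is DST.
    period = CDT if 4 <= naive.month <= 10 else CST
    return naive.replace(tzinfo=period)


# A single date can be resolved to one specific instance.
winter = localize(datetime(2000, 1, 1))
summer = localize(datetime(2000, 7, 1))

# A collection spanning both periods has no single correct instance,
# so the toy model keeps the zone's default ("first") instance.
index_tz = LMT

print(winter.tzinfo, summer.tzinfo, index_tz)
print("instances equal:", winter.tzinfo == index_tz)
print("zone names equal:", winter.tzinfo.zone == index_tz.zone)
```

In this model, comparing by zone name succeeds where instance equality fails, which is the distinction the thread goes on to discuss for tz_compare.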

@TomAugspurger
Contributor Author

Fun... I'll try to turn this into a new docs section in http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#working-with-time-zones

Should we add tz_compare to our public API, as the recommended way to compare two tz objects?

@mroeschke
Member

mroeschke commented Nov 20, 2018

Yeah clarifying this pytz idiosyncrasy in the docs would be a nice addition.

@jbrockmendel, is tz_compare specified to check whether two timezone objects represent the same timezone, or whether two timezone instances are the same object?

@jbrockmendel
Member

To see if they represent the same timezone.

@mroeschke
Member

Gotcha. My impression was that tz_compare was a utility for the internals, but I guess if users may find it useful then we can expose it as a top-level "pd.tz_compare". It would probably need a lot more tests to nail down the spec, though.

@jorisvandenbossche
Member

See also #17572 and #18595 where this has been discussed before.
