feat: convert fixed-offset timezones to respective Etc timezone from time zone database #13738

Merged (2 commits) on Jan 16, 2024

MarcoGorelli (Collaborator) commented Jan 15, 2024

closes #12893

demo with the example reported in the issue:

import datetime as dt

import pandas as pd
import polars as pl

tz = dt.timezone(dt.timedelta(hours=10))
df = pd.DataFrame.from_records(
    data=[(dt.datetime(2024, 1, 1, tzinfo=tz),)], columns=("naughty_date",)
)
print(df)
df.to_parquet("naughty.parquet")
print(pl.read_parquet("naughty.parquet"))

tz = dt.timezone(dt.timedelta(hours=-10))
df = pd.DataFrame.from_records(
    data=[(dt.datetime(2024, 1, 1, tzinfo=tz),)], columns=("naughty_date",)
)
print(df)
df.to_parquet("naughty.parquet")
print(pl.read_parquet("naughty.parquet"))

Output:

               naughty_date
0 2024-01-01 00:00:00+10:00
shape: (1, 1)
┌──────────────────────────┐
│ naughty_date             │
│ ---                      │
│ datetime[ns, Etc/GMT-10] │
╞══════════════════════════╡
│ 2024-01-01 00:00:00 +10  │
└──────────────────────────┘
               naughty_date
0 2024-01-01 00:00:00-10:00
shape: (1, 1)
┌──────────────────────────┐
│ naughty_date             │
│ ---                      │
│ datetime[ns, Etc/GMT+10] │
╞══════════════════════════╡
│ 2024-01-01 00:00:00 -10  │
└──────────────────────────┘
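
The output above shows the sign flip that comes with the `Etc` zones: in the time zone database, `Etc/GMT-10` is ten hours ahead of UTC, so a `+10:00` offset maps to `Etc/GMT-10` and `-10:00` maps to `Etc/GMT+10`. A minimal Python sketch of such a mapping follows. The function name `fixed_offset_to_etc` and the regex are hypothetical and written for illustration only; they are not the `FIXED_OFFSET_PATTERN` from this PR (which is truncated in the diff below), and returning `"UTC"` for a zero offset is a choice made for this sketch.

```python
import re

# Hypothetical pattern for this sketch (not the PR's FIXED_OFFSET_PATTERN):
# optional sign, two-digit hour, optional colon, and a zero minute part.
_OFFSET_RE = re.compile(r"^(?P<sign>[+-])?(?P<hour>\d{2}):?00$")


def fixed_offset_to_etc(offset: str) -> str:
    """Map a whole-hour offset such as "+10:00" to an Etc/GMT zone name."""
    match = _OFFSET_RE.match(offset)
    if match is None:
        raise ValueError(f"not a whole-hour fixed offset: {offset!r}")
    hour = int(match["hour"])
    negative = match["sign"] == "-"
    # The Etc zones only span UTC-12 (Etc/GMT+12) to UTC+14 (Etc/GMT-14).
    if hour > (12 if negative else 14):
        raise ValueError(f"no Etc/GMT zone for offset: {offset!r}")
    if hour == 0:
        return "UTC"  # choice made for this sketch
    # Etc names use the POSIX sign convention, so the sign is inverted.
    etc_sign = "+" if negative else "-"
    return f"Etc/GMT{etc_sign}{hour}"


print(fixed_offset_to_etc("+10:00"))  # Etc/GMT-10
print(fixed_offset_to_etc("-10:00"))  # Etc/GMT+10
```

Offsets with a non-zero minute part (for example `+05:30`) have no `Etc/GMT` counterpart, so the sketch rejects them rather than rounding.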

github-actions bot added labels on Jan 15, 2024: enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), rust (Related to Rust Polars)
@@ -26,6 +30,18 @@ pub fn unix_time() -> NaiveDateTime {
NaiveDateTime::from_timestamp_opt(0, 0).unwrap()
}

#[cfg(feature = "timezones")]
const FIXED_OFFSET_PATTERN: &str = r#"(?x)
Review comment (Member):

Maybe make this one static. Doesn't have to be inlined everywhere we use it I would say. (or aren't we going to use it more often?)

stinodego (Contributor) commented Jan 15, 2024

Does/should this also work for parsing regular time zone inputs, e.g. Datetime(time_zone="+10:00")? And csv files? Just wondering.

MarcoGorelli (Collaborator, Author) commented:

> Does/should this also work for parsing regular time zone inputs, e.g. Datetime(time_zone="+10:00")? And csv files? Just wondering.

For Datetime(time_zone=...) it doesn't at the moment, no. I'm not sure these should be encouraged too much, to be honest.
It's OK for I/O, where the user might not have control over the dtype, but if they're instantiating a Datetime object then they can directly set the equivalent timezone-database time zone (e.g. Etc/GMT-10 for +10:00).

For csv files, they don't store dtypes anyway, so it's all good (and offset-aware inputs are converted to UTC, as they are in to_datetime, and as pyarrow also does)
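
To illustrate what "converted to UTC" means for an offset-aware input, here is a small sketch using only the Python standard library. It is not how Polars or pyarrow implement the conversion; the example string is an assumption chosen for illustration, and the point is only that the instant is preserved while the offset is dropped in favour of UTC.

```python
from datetime import datetime, timezone

# Example offset-aware string (same shape as a CSV timestamp with an offset).
raw = "2024-02-28T00:00:00.000000+0800"

# %z accepts offsets written as +HHMM (and, on Python 3.7+, +HH:MM).
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Normalise to UTC: the instant is unchanged, only the wall-clock view moves.
as_utc = parsed.astimezone(timezone.utc)

print(parsed.isoformat())  # 2024-02-28T00:00:00+08:00
print(as_utc.isoformat())  # 2024-02-27T16:00:00+00:00
assert parsed == as_utc
```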

stinodego (Contributor) commented:

Right, read_csv can already read strings like 2024-02-28T00:00:00.000000+0800. Great - merging then!

stinodego merged commit d2e98ac into pola-rs:main on Jan 16, 2024 (22 checks passed)
r-brink pushed a commit to r-brink/polars that referenced this pull request Jan 24, 2024
Successfully merging this pull request may close these issues.

Unable to read parquet with a column containing a fixed tz offset