timestamps are interpreted as having a timezone when reading a file written by deltalake #38

Closed
cpcloud opened this issue Jun 27, 2024 · 2 comments · Fixed by #68

cpcloud commented Jun 27, 2024

Here's a reproducible example showing that a timestamp column written without a timezone comes back as a timestamp with a timezone when read with delta_scan.

Not sure whether this is expected behavior!

```python
import datetime

import deltalake as dl
import duckdb
import pyarrow as pa

t = pa.Table.from_pydict({"ts": [datetime.datetime(2024, 6, 27, 7, 13, 13)]})
# timezone should be None
assert t["ts"].type.tz is None

# write the deltalake file
dl.write_deltalake("test.delta", t)
ddb = duckdb.connect()

ddb.install_extension("delta")
ddb.load_extension("delta")

res = ddb.sql("DESCRIBE FROM delta_scan('test.delta')")
[(coltype,)] = res.column_type.fetchall()

# should be timestamp, but comes back as TIMESTAMP WITH TIME ZONE
assert coltype == "TIMESTAMP"
```

cpcloud commented Jul 13, 2024

@samansmink Wanted to make sure this was on your radar! Any thoughts on whether this is a bug or not?

samansmink (Collaborator) commented

@cpcloud thanks for reporting! That does look like a bug. Round-tripping the same data through Parquet does result in a timestamp without a time zone.
