fix(parquet): support read i96 timestamp from parquet file #6668

Merged (8 commits) on Jul 19, 2022

sundy-li

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support reading INT96 (i96) timestamps from Parquet files.

This is a backward-compatible change:

create table t( a timestamp(0), b timestamp(3), d timestamp(6));
insert into t select now(), now(), now() from numbers(10);

create table t2( a timestamp(0), b timestamp(3), d timestamp(6));
insert into t2 select now(), now(), now() from numbers(10);

Both the old and the new versions of databend-query can read tables t and t2.

Fixes #6627
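
For readers unfamiliar with the format, here is a minimal sketch of how a Parquet INT96 timestamp can be decoded into microseconds since the Unix epoch. It assumes the layout commonly written by Impala, Hive, and Spark: 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes holding the Julian day number. The function name `int96_to_micros` is hypothetical and is not taken from this PR's diff.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Number of microseconds in one day.
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Convert a 12-byte INT96 timestamp to microseconds since the Unix epoch.
///
/// Assumed layout (not taken from the PR):
///   bytes 0..8  = nanoseconds within the day, little-endian
///   bytes 8..12 = Julian day number, little-endian
fn int96_to_micros(bytes: [u8; 12]) -> i64 {
    let mut nanos = [0u8; 8];
    nanos.copy_from_slice(&bytes[0..8]);
    let mut day = [0u8; 4];
    day.copy_from_slice(&bytes[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos);
    let julian_day = i64::from(i32::from_le_bytes(day));

    // Whole days since the epoch, plus the sub-day part truncated to microseconds.
    (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day / 1_000
}
```

The result is in microseconds, which corresponds to the `timestamp(6)` precision used in the tables above; coarser precisions would need a further division.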

@sundy-li sundy-li requested review from dantengsky and b41sh July 18, 2022 03:59
@mergify mergify bot added the pr-bugfix this PR patches a bug in codebase label Jul 18, 2022
@dantengsky

  • Confirm that data (with a timestamp column) generated by the old version can be read by this PR

test_stateless_standalone_linux failed.

test_stateless_standalone_linux

03_0023_insert_into_array: [ FAIL ] - result differs with:
--- /workspace/tests/suites/0_stateless/03_dml/03_0023_insert_into_array.result 2022-07-18 04:15:39.394586214 +0000
+++ /workspace/tests/suites/0_stateless/03_dml/03_0023_insert_into_array.stdout 2022-07-18 04:19:11.488000857 +0000
@@ -1,3 +1,5 @@
+ERROR 1105 (HY000) at line 118: Code: 1104, displayText = called Result::unwrap() on an Err value: OutOfSpec("ListArray's child's DataType must match. However, the expected DataType is Timestamp(Microsecond, None) while it got Int64.").
+ERROR 1105 (HY000) at line 119: Code: 1054, displayText = Expected server error code: 1010 but got: 1002.
==Array(UInt8)==
1 [1, 2, 3]
2 [254, 255]
@@ -63,10 +65,6 @@
2021-01-01 2022-01-01
1990-12-01 2030-01-12
==Array(Timestamp)==
-1 ['2021-01-01 01:01:01.000000', '2022-01-01 01:01:01.000000']
-2 ['1990-12-01 10:11:12.000000', '2030-01-12 22:00:00.000000']
-2021-01-01 01:01:01.000000 2022-01-01 01:01:01.000000
-1990-12-01 10:11:12.000000 2030-01-12 22:00:00.000000
==Array(String)==
1 ['aa', 'bb']
2 ['cc', 'dd']

03_0023_insert_into_array_v2: [ FAIL ] - result differs with:
--- /workspace/tests/suites/0_stateless/03_dml/03_0023_insert_into_array_v2.result 2022-07-18 04:15:39.394586214 +0000
+++ /workspace/tests/suites/0_stateless/03_dml/03_0023_insert_into_array_v2.stdout 2022-07-18 04:19:15.588107257 +0000
@@ -1,3 +1,5 @@
+ERROR 1105 (HY000) at line 118: Code: 1104, displayText = called Result::unwrap() on an Err value: OutOfSpec("ListArray's child's DataType must match. However, the expected DataType is Timestamp(Microsecond, None) while it got Int64.").
+ERROR 1105 (HY000) at line 119: Code: 1054, displayText = Expected server error code: 1010 but got: 1002.
==Array(UInt8)==
1 [1, 2, 3]
2 [254, 255]
@@ -63,10 +65,6 @@
2021-01-01 2022-01-01
1990-12-01 2030-01-12
==Array(Timestamp)==
-1 ['2021-01-01 01:01:01.000000', '2022-01-01 01:01:01.000000']
-2 ['1990-12-01 10:11:12.000000', '2030-01-12 22:00:00.000000']
-2021-01-01 01:01:01.000000 2022-01-01 01:01:01.000000
-1990-12-01 10:11:12.000000 2030-01-12 22:00:00.000000
==Array(String)==
1 ['aa', 'bb']
2 ['cc', 'dd']
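
The `OutOfSpec` error in both failures says that a list's child array arrived as `Int64` while the list's declared element type was `Timestamp(Microsecond, None)`. The sketch below illustrates that kind of validation with a toy `DataType` enum. Every type and function here is hypothetical and only models the check; it is not the arrow2 or Databend implementation.

```rust
/// Toy model of a logical data type (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    TimestampMicros,
    List(Box<DataType>),
}

/// Check that a child array's type equals the element type declared by the list.
fn check_list_child(list_type: &DataType, child_type: &DataType) -> Result<(), String> {
    match list_type {
        DataType::List(expected) if **expected == *child_type => Ok(()),
        DataType::List(expected) => Err(format!(
            "ListArray's child's DataType must match. \
             However, the expected DataType is {:?} while it got {:?}.",
            expected, child_type
        )),
        other => Err(format!("expected a List type, got {:?}", other)),
    }
}
```

Under this model, a child built from the physical `Int64` representation fails the check even though timestamps are stored as 64-bit integers; the child has to carry the logical timestamp type.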

@BohuTANG BohuTANG requested a review from b41sh July 19, 2022 05:32
@BohuTANG

  • Confirm that data (with a timestamp column) generated by the old version can be read by this PR

I think issue #6557 needs to be a high priority.
Right now it is difficult for us to find query-layer compatibility issues; as this PR shows, we need a strong test for them.

@dantengsky

I think issue #6557 needs to be a high priority. Right now it is difficult for us to find query-layer compatibility issues; as this PR shows, we need a strong test for them.

Agreed. #6557 will be addressed right after #6639 is closed.

@dantengsky dantengsky left a comment

LGTM

@b41sh

b41sh commented Jul 19, 2022

LGTM

@mergify mergify bot merged commit 4956b5f into databendlabs:main Jul 19, 2022
Development

Successfully merging this pull request may close these issues.

bug: load parquet data to databend failed if had timestamp columns
4 participants