Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date to Timestamp cast #5140

Merged
merged 3 commits into from
Feb 2, 2023
Merged

Date to Timestamp cast #5140

merged 3 commits into from
Feb 2, 2023

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Feb 1, 2023

Which issue does this PR close?

Closes #4761
Closes #4644 .

Rationale for this change

Introduce casting between dates and timestamps datatypes. Currently DF just panics on such kind of queries

What changes are included in this PR?

This PR includes planner changes to support date to timestamp cast.

Are these changes tested?

Yes. Tests enclosed

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Feb 1, 2023
@comphead
Copy link
Contributor Author

comphead commented Feb 1, 2023

@liukun4515 @crepererum @alamb please check this one.

The downside for now is additional cast created sometimes for timestamp, to convert Timestamp(_, Some("00:00")) to Timestamp(_, None). This can be solved in separate PR if you dont mind as it might again require changes in arrow-rs

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thank you @comphead

cc @waitingkuo (we are getting quite sophisticated in DataFusion for dates/times)

@@ -1742,5 +1741,55 @@ async fn test_ts_dt_binary_ops() -> Result<()> {

assert_batches_eq!(expected, &results);

//test cast path timestamp date using literals
let sql = "select '2000-01-01'::timestamp >= '2000-01-01'::date";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to add these tests to sqllogictest if you wanted -- https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/timestamps.slt

Not required but I find the sqllogictest framework much easier to work with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb I will create a separate ticket to move all timestamp tests from timestamp.rs to sqllogictest, all that can be moved of course

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much -- that would be great

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #5165

Timestamp(TimeUnit::Nanosecond, None) => {
matches!(type_from, Null | Timestamp(_, None))
Timestamp(TimeUnit::Nanosecond, _) => {
matches!(type_from, Null | Timestamp(_, _) | Date32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 (I have a draft PR in the works to support coercing from utf8 as well #5117)

break;
}
}
assert_eq!(res, Some("Projection: CAST(now() AS Timestamp(Nanosecond, None)) = CAST(currentdate() AS Timestamp(Nanosecond, None))\n EmptyRelation".to_string()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Feb 1, 2023

cc @crepererum and @liukun4515 who filed the tickets this closes

@alamb
Copy link
Contributor

alamb commented Feb 1, 2023

The downside for now is additional cast created sometimes for timestamp, to convert Timestamp(, Some("00:00")) to Timestamp(, None). This can be solved in separate PR if you dont mind as it might again require changes in arrow-rs

I think this is correct (and needed) if the timestamps in question have different timezones. It should not really effect runtime performance as I expect both to be evaluated at plan time (and we can add a constant folding rule if it is not)

@alamb
Copy link
Contributor

alamb commented Feb 1, 2023

I plan to merge this PR tomorrow unless there are other comments

Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @comphead
left the comments bellow

I think we could consider adding this coercion as well as it works in postgrseql (perhaps in another tickets)
(Utf8, Date)

postgrseql

willy=# select '2000-01-01T00:00:00' = '2000-01-01'::date;
 ?column? 
----------
 t
(1 row)

datafusion

select '2000-01-01T00:00:00' = '2000-01-01'::date;
ArrowError(CastError("Cannot cast string '2000-01-01T00:00:00' to value of Date32 type"))

Comment on lines +559 to +561
(Timestamp(_, tz), Utf8) | (Utf8, Timestamp(_, tz)) => {
Some(Timestamp(TimeUnit::Nanosecond, tz.clone()))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried this, but it didn't work

select '2000-01-01T00:00:00'::timestamp::timestamptz = '2000-01-01T00:00:00';
Internal("The type of Timestamp(Nanosecond, Some(\"+00:00\")) Eq Utf8 of binary physical should be same")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need another cast to add Utf <-> Timestamp #5117 probably addresses this cast. If not I'll create another ticket

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after #5117 this still fails. I will file a follow on ticket

❯ select '2000-01-01T00:00:00'::timestamp::timestamptz = '2000-01-01T00:00:00';
Internal("The type of Timestamp(Nanosecond, Some("+00:00")) Eq Utf8 of binary physical should be same")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #5164

Comment on lines +562 to +567
(Timestamp(_, None), Date32) | (Date32, Timestamp(_, None)) => {
Some(Timestamp(TimeUnit::Nanosecond, None))
}
(Timestamp(_, _tz), Date32) | (Date32, Timestamp(_, _tz)) => {
Some(Timestamp(TimeUnit::Nanosecond, None))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that we convert Timestamptz to Timestamp and then compare?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct, we lose TZ for now, we need some extra efforts to make arrow-rs work with TimestampTZ,planning to do so just after this PR merged.

@alamb
Copy link
Contributor

alamb commented Feb 2, 2023

I think we could consider adding this coercion as well as it works in postgrseql (perhaps in another tickets)
(Utf8, Date)

I agree -- I can do so.

@alamb
Copy link
Contributor

alamb commented Feb 2, 2023

We need another cast to add Utf <-> Timestamp #5117 probably addresses this cast. If not I'll create another ticket

I am going to merge this PR and test using #5117 and if it doesn't work I'll file another ticket

@alamb alamb merged commit e41c4df into apache:master Feb 2, 2023
@ursabot
Copy link

ursabot commented Feb 2, 2023

Benchmark runs are scheduled for baseline = f0c0958 and contender = e41c4df. e41c4df is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Feb 2, 2023

Filed #5164

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
4 participants