Improve testing of optimizers using EXPLAIN #1118

matthewmturner · 2021-10-14T23:07:29Z

It would be very reassuring if these optimizations came with a more systematic testing routine. I agree that unit tests in this precise scenario are not very powerful. Currently we have a test here: https://github.com/apache/arrow-datafusion/blob/558d8ecfe2d1b0737de1e2548c5fbedc0ea17ada/datafusion/tests/sql.rs#L3020

that ensures that the count optimizer rule kicks in properly with EXPLAIN [ANALYZE]. What do you think about creating a separate integration test file called optimizers.rs that runs SQL queries and checks if they were optimized as expected?

Originally posted by @rdettai in #1063 (comment)
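To make the proposal concrete, here is a minimal, self-contained sketch of the testing pattern: build a plan, run an optimizer rule, render the plan as text, and assert on the rendered lines. Everything in it (`Plan`, `optimize`, `explain`, the node names) is a hypothetical toy model written only with the standard library. It is not DataFusion's `LogicalPlan`, its count rule, or its EXPLAIN output format.

```rust
// Toy stand-in for a logical plan. Hypothetical; not DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    // A table scan that may or may not know its exact row count.
    Scan { table: String, exact_rows: Option<u64> },
    // COUNT(*) over the input plan.
    CountStar { input: Box<Plan> },
    // A single constant value.
    Literal { value: u64 },
}

// Toy "count" rule: COUNT(*) over a scan with a known row count
// is replaced by a literal, so the scan never has to run.
fn optimize(plan: Plan) -> Plan {
    match plan {
        Plan::CountStar { input } => match *input {
            Plan::Scan { exact_rows: Some(n), .. } => Plan::Literal { value: n },
            other => Plan::CountStar { input: Box::new(optimize(other)) },
        },
        other => other,
    }
}

// Render the plan as indented lines, one node per line.
fn explain(plan: &Plan) -> Vec<String> {
    let mut lines = Vec::new();
    render(plan, 0, &mut lines);
    lines
}

fn render(plan: &Plan, depth: usize, out: &mut Vec<String>) {
    let indent = "  ".repeat(depth);
    match plan {
        Plan::Scan { table, .. } => out.push(format!("{}TableScan: {}", indent, table)),
        Plan::CountStar { input } => {
            out.push(format!("{}CountStar", indent));
            render(input, depth + 1, out);
        }
        Plan::Literal { value } => out.push(format!("{}Literal: {}", indent, value)),
    }
}

#[test]
fn count_rule_is_visible_in_explain_output() {
    let plan = Plan::CountStar {
        input: Box::new(Plan::Scan { table: "t".to_string(), exact_rows: Some(42) }),
    };
    // Before optimization the scan is still part of the plan.
    assert_eq!(explain(&plan), vec!["CountStar", "  TableScan: t"]);
    // After optimization the rule has replaced it with a literal.
    assert_eq!(explain(&optimize(plan)), vec!["Literal: 42"]);
}
```

The point of asserting on the rendered plan rather than on query results is that the result would be 42 either way; only the plan text shows whether the rule fired.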


matthewmturner · 2021-10-14T23:09:06Z

@rdettai @Dandandan @alamb
I am going to start working on this and will use a similar approach to the referenced test.
Let me know if you have any thoughts :)

alamb · 2021-10-15T10:22:21Z

Sounds great -- thank you @matthewmturner !

Here is another idea with additional thoughts on test cleanup if you are feeling ambitious and want to move code around: #743

In general, consolidating the SQL optimizer "end to end" tests that use EXPLAIN into datafusion/tests/optimizers.rs sounds like a great idea.

If you are inspired to do this, I suggest looking through the codebase first and consolidating existing tests into optimizers.rs.

For example, there are a bunch of "explain" tests here:

https://github.com/apache/arrow-datafusion/blob/d331fa2b87b0723eca486b1951a3f734ef6276a3/datafusion/src/optimizer/filter_push_down.rs#L699-L725

I bet you could rewrite them to actually use EXPLAIN and assert_batches_eq! and they would look a lot better.

matthewmturner · 2021-10-15T14:08:16Z

@alamb happy to do this. I'll work on consolidating into datafusion/tests/optimizers.rs first.

One quick question.

By using assert_batches_eq! isn't that comparing the final record batch output? And couldn't you have that same output regardless of whether an optimization was used? Or was the idea to just add that as an additional assertion?

UPDATE
I've had the chance to review more of the provided references (thanks for that). I'm thinking it may be better if I work on #743 first to get the right structure in place (and get myself familiar with the testing approach) and then add optimizers.rs to the tests.

Let me know if you have any thoughts. Thanks!

alamb · 2021-10-15T18:42:44Z

> By using assert_batches_eq! isn't that comparing the final record batch output? And couldn't you have that same output regardless of whether an optimization was used? Or was the idea to just add that as an additional assertion?

I was thinking of using assert_batches_eq! on the output of an EXPLAIN ... query (I have found it more convenient for test updates than the tests in filter pushdown that use a multi-line string literal)
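As an illustration of why a list of expected lines is easier to maintain than one multi-line string, here is a small standard-library sketch of a line-by-line comparison. `assert_lines_eq` and `sample_explain_output` are hypothetical names invented for this example, and the table contents are made up; the sketch only mimics the style of an `assert_batches_eq!` check and is not that macro's implementation or real EXPLAIN output.

```rust
// Hypothetical helper: compare expected lines against rendered text,
// line by line, and print both sides in full when they differ.
fn assert_lines_eq(expected: &[&str], rendered: &str) {
    let actual: Vec<&str> = rendered.trim().lines().collect();
    assert_eq!(
        expected,
        actual.as_slice(),
        "\n\nexpected:\n\n{:#?}\nactual:\n\n{:#?}\n\n",
        expected,
        actual
    );
}

// Made-up stand-in for a pretty-printed EXPLAIN result.
// The plan text here is invented purely for illustration.
fn sample_explain_output() -> String {
    [
        "+-----------+-------------+",
        "| plan_type | plan        |",
        "+-----------+-------------+",
        "| toy_plan  | Literal: 42 |",
        "+-----------+-------------+",
    ]
    .join("\n")
}

#[test]
fn explain_output_matches_expected_lines() {
    // Each expected line is its own string literal, so a failing test's
    // "actual" dump can be pasted back in with little reformatting.
    let expected = [
        "+-----------+-------------+",
        "| plan_type | plan        |",
        "+-----------+-------------+",
        "| toy_plan  | Literal: 42 |",
        "+-----------+-------------+",
    ];
    assert_lines_eq(&expected, &sample_explain_output());
}
```

Because the failure message prints the actual lines in the same shape as the expected array, updating a test after an intentional plan change is mostly a copy and paste.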

matthewmturner · 2021-10-15T20:11:40Z

> I was thinking of using assert_batches_eq! on the output of an EXPLAIN ... query (I have found it more convenient for test updates than the tests in filter pushdown that use a multi-line string literal)

OK, that makes sense now after I played around with it! For whatever reason I had thought the output of EXPLAIN [ANALYZE] wasn't a record batch, but in hindsight that doesn't make sense.

Thanks!

alamb · 2022-10-20T12:59:04Z

I believe this is now in place via https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/tests/integration-test.rs

matthewmturner mentioned this issue Oct 15, 2021

Ensure column names are equivalent with or without optimization #1123

Closed

alamb closed this as completed Oct 20, 2022

alamb mentioned this issue Jul 2, 2024

[Epic] Complete pulling out special SQL planning from the Sql Parser #11207

Closed

13 tasks
