Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing of optimizers using EXPLAIN #1118

Closed
matthewmturner opened this issue Oct 14, 2021 · 6 comments
Closed

Improve testing of optimizers using EXPLAIN #1118

matthewmturner opened this issue Oct 14, 2021 · 6 comments

Comments

@matthewmturner
Copy link
Contributor

It would be very reassuring if these optimizations came with a more systematic testing routine. I agree that unit tests in this precise scenario are not very powerful. Currently we have a test here: https://github.com/apache/arrow-datafusion/blob/558d8ecfe2d1b0737de1e2548c5fbedc0ea17ada/datafusion/tests/sql.rs#L3020

that ensures that the count optimizer rule kicks in properly with EXPLAIN [ANALYZE]. What do you think about creating a separate integration test file called optimizers.rs that runs SQL queries and checks if they were optimized as expected?

Originally posted by @rdettai in #1063 (comment)

@matthewmturner
Copy link
Contributor Author

@rdettai @Dandandan @alamb
I am going to start working on this and will use a similar approach to the referenced test.
Let me know if any thoughts :)

@alamb
Copy link
Contributor

alamb commented Oct 15, 2021

Sounds great -- thank you @matthewmturner !

Here is another idea with additional thoughts on test cleanup if you are feeling ambitious and want to move code around: #743

In general, I think the idea of consolidating SQL optimizer "end to end" tests that use explain into datafusion/tests/optimizers.rs sounds like a great idea.

If you are inspired to do this, I suggest looking through the codebase first and consolidating existing tests into optimizers.rs.

For example, there are a bunch of "explain" tests here:

https://github.com/apache/arrow-datafusion/blob/d331fa2b87b0723eca486b1951a3f734ef6276a3/datafusion/src/optimizer/filter_push_down.rs#L699-L725

I bet you could rewrite them to actually use EXPLAIN and assert_batches_eq! and they would look a lot better.

@matthewmturner
Copy link
Contributor Author

matthewmturner commented Oct 15, 2021

@alamb happy to do this. i'll work on consolidating into datafusion/tests/optimizer.rs first.

One quick question.

By using assert_batches_eq! isnt that comparing the final record batch output? And couldnt you have that same output regardless of whether an optimization was used? Or was the idea to just add that as an additional assertion?

UPDATE
I've had the chance to review more of the provided references (thanks for that). I'm thinking it may be better if I work on #743 first to get the right structure in place (and get myself familiar with the testing approach) and then add optimizers.rs to the tests.

Let me know if any thoughts. Thanks!

@alamb
Copy link
Contributor

alamb commented Oct 15, 2021

By using assert_batches_eq! isnt that comparing the final record batch output? And couldnt you have that same output regardless of whether an optimization was used? Or was the idea to just add that as an additional assertion?

I was thinking of using assesrt_batches_eq! on the output of an EXPLAIN ... query (it is more convenient for test updates, I have found, than the tests in filter pushdown that use a multi-line literal string)

@matthewmturner
Copy link
Contributor Author

I was thinking of using assesrt_batches_eq! on the output of an EXPLAIN ... query (it is more convenient for test updates, I have found, than the tests in filter pushdown that use a multi-line literal string)

Ok makes sense now after i played around with it! For whatever reason I had thought the output of EXPLAIN [ANALYZE] wasnt a record batch - but in hindsight that doesnt make sense.

Thanks!

@alamb
Copy link
Contributor

alamb commented Oct 20, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants