Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix explain plan formatting in sqllogictest #6329

Merged
merged 6 commits into from
May 15, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 10, 2023

Which issue does this PR close?

Closes #6328 (noticed by @mustafasrepo on #6234)

Rationale for this change

For some reason, cargo test --test sqllogictests -- --complete when completing tests strips the leading whitespace sometimes. Then the comparison also ignores leading whitespace

So sometimes the explain plans captured by --compete would take a plan like:

Projection 
  Sort
    Window
     Sort

And display it like:

Projection 
  Sort
    Window
   Sort

What changes are included in this PR?

  1. ~Update the normalization code in the DataFusion sqlogictest to prefix explain plan lines with | ~
  2. Update the normalization code in the DataFusion sqlogictest to replace leading spaces with -
  3. Recapture the explain plans cargo test --test sqllogictests -- --complete

Are these changes tested?

Yes

Are there any user-facing changes?

no

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 10, 2023
physical_plan
ProjectionExec: expr=[c9@0 as c9, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@4 as fv1, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as fv2, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@5 as lag1, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@2 as lag2, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@6 as lead1, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@3 as lead2]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example I think that shows the indent being properly displayed

@mustafasrepo
Copy link
Contributor

mustafasrepo commented May 11, 2023

I have reviewed the PR. This PR is LGTM!. Now the final plan is more clear, especially when the query involves executors with multiple children (such as union, join, etc.). Previously there were test cases, an executor which has seemingly multiple children (because of indentation), whereas it has indeed single children. This PR fixes these kind of confusions. @alamb thanks for this improvement.

@mustafasrepo
Copy link
Contributor

During experimenting, I have tried to mess indentation and run the tests. Tests still pass. For instance both

|ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2]
|  GlobalLimitExec: skip=0, fetch=5
|    BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1)) }], mode=[Sorted]
|      BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5)) }], mode=[Sorted]
|        SortExec: expr=[c9@0 DESC]
|          CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c9], has_header=true

and

|ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2]
| GlobalLimitExec: skip=0, fetch=5
|  BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1)) }], mode=[Sorted]
|      BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5)) }], mode=[Sorted]
|     SortExec: expr=[c9@0 DESC]
|          CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c9], has_header=true

passes from the test. I guess still, we ignore white spaces at the beginning. If one pastes the result produced at the failing tests, their formatting is right. Hence if a person copy pastes actual result of the test, we will not see this behaviour. However, in the long run, if we can enforce the indentation level for the tests, it would be great. It is not necessarily this PR's concern. We can do so as a follow up Pr.

@alamb
Copy link
Contributor Author

alamb commented May 12, 2023

During experimenting, I have tried to mess indentation and run the tests. Tests still pass. For instance both

Hmm, this seems pretty bad as if the plan changes the tests won't fail. Let me see if I can find something that looks better

@alamb
Copy link
Contributor Author

alamb commented May 12, 2023

@mustafasrepo I updated the code to replace leading spaces with - rather than prefixing with |

Now when I mess up the indent (remove a - the tests fail as I expect.

What do you think?

@mustafasrepo
Copy link
Contributor

@mustafasrepo I updated the code to replace leading spaces with - rather than prefixing with |

Now when I mess up the indent (remove a - the tests fail as I expect.

What do you think?

This is great!. Now we can make sure that indentation level is exact. Thanks for this fix.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related for this PR but rather than for sqllogic improvement.
Sqllogic tests doesn't cover cases if the column name is not expected.
The same relates to datatype, but this can be solved by extra arrow_typeof

@alamb
Copy link
Contributor Author

alamb commented May 14, 2023

Sqllogic tests doesn't cover cases if the column name is not expected.

Yes, I agree this is not currently covered by the sqllogictest -- I'll file a ticket to try to work on this (#6349)

The same relates to datatype, but this can be solved by extra arrow_typeof

SQLlogictests does verify the (sql) types of the output (or example, the II in the following means the columns should be integers). It is not as fine grained as the arrow type system, but it is better than nothing

query II
SELECT * FROM foo_schema.bar;
----
1 2

@alamb alamb merged commit 62621ee into apache:main May 15, 2023
@alamb alamb deleted the alamb/fix_explain_plans branch May 15, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sqllogictest sometimes reindents plans
3 participants