Fix explain plan formatting in sqllogictest #6329

alamb · 2023-05-10T16:38:03Z

Which issue does this PR close?

Closes #6328 (noticed by @mustafasrepo on #6234)

Rationale for this change

For some reason, cargo test --test sqllogictests -- --complete when completing tests strips the leading whitespace sometimes. Then the comparison also ignores leading whitespace

So sometimes the explain plans captured by --compete would take a plan like:

Projection 
  Sort
    Window
     Sort

And display it like:

Projection 
  Sort
    Window
   Sort

What changes are included in this PR?

~Update the normalization code in the DataFusion sqlogictest to prefix explain plan lines with | ~
Update the normalization code in the DataFusion sqlogictest to replace leading spaces with -
Recapture the explain plans cargo test --test sqllogictests -- --complete

Are these changes tested?

Yes

Are there any user-facing changes?

no

alamb · 2023-05-10T16:39:34Z

datafusion/core/tests/sqllogictests/test_files/window.slt

 physical_plan
-ProjectionExec: expr=[c9@0 as c9, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@4 as fv1, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as fv2, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@5 as lag1, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@2 as lag2, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@6 as lead1, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@3 as lead2]


Here is an example I think that shows the indent being properly displayed

mustafasrepo · 2023-05-11T06:53:49Z

I have reviewed the PR. This PR is LGTM!. Now the final plan is more clear, especially when the query involves executors with multiple children (such as union, join, etc.). Previously there were test cases, an executor which has seemingly multiple children (because of indentation), whereas it has indeed single children. This PR fixes these kind of confusions. @alamb thanks for this improvement.

mustafasrepo · 2023-05-11T15:26:07Z

During experimenting, I have tried to mess indentation and run the tests. Tests still pass. For instance both

|ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2]
|  GlobalLimitExec: skip=0, fetch=5
|    BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1)) }], mode=[Sorted]
|      BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5)) }], mode=[Sorted]
|        SortExec: expr=[c9@0 DESC]
|          CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c9], has_header=true

and

|ProjectionExec: expr=[c9@0 as c9, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@2 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as sum2]
| GlobalLimitExec: skip=0, fetch=5
|  BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(5)), end_bound: Following(UInt64(1)) }], mode=[Sorted]
|      BoundedWindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: "SUM(aggregate_test_100.c9)", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(1)), end_bound: Following(UInt64(5)) }], mode=[Sorted]
|     SortExec: expr=[c9@0 DESC]
|          CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c9], has_header=true

passes from the test. I guess still, we ignore white spaces at the beginning. If one pastes the result produced at the failing tests, their formatting is right. Hence if a person copy pastes actual result of the test, we will not see this behaviour. However, in the long run, if we can enforce the indentation level for the tests, it would be great. It is not necessarily this PR's concern. We can do so as a follow up Pr.

alamb · 2023-05-12T15:57:59Z

During experimenting, I have tried to mess indentation and run the tests. Tests still pass. For instance both

Hmm, this seems pretty bad as if the plan changes the tests won't fail. Let me see if I can find something that looks better

alamb · 2023-05-12T16:27:15Z

@mustafasrepo I updated the code to replace leading spaces with - rather than prefixing with |

Now when I mess up the indent (remove a - the tests fail as I expect.

What do you think?

mustafasrepo · 2023-05-12T21:54:51Z

@mustafasrepo I updated the code to replace leading spaces with - rather than prefixing with |

Now when I mess up the indent (remove a - the tests fail as I expect.

What do you think?

This is great!. Now we can make sure that indentation level is exact. Thanks for this fix.

comphead

Not related for this PR but rather than for sqllogic improvement.
Sqllogic tests doesn't cover cases if the column name is not expected.
The same relates to datatype, but this can be solved by extra arrow_typeof

alamb · 2023-05-14T11:08:05Z

Sqllogic tests doesn't cover cases if the column name is not expected.

Yes, I agree this is not currently covered by the sqllogictest -- I'll file a ticket to try to work on this (#6349)

The same relates to datatype, but this can be solved by extra arrow_typeof

SQLlogictests does verify the (sql) types of the output (or example, the II in the following means the columns should be integers). It is not as fine grained as the arrow type system, but it is better than nothing

query II
SELECT * FROM foo_schema.bar;
----
1 2

Fix explain plan formatting in sqllogictest

29fcac4

github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 10, 2023

alamb commented May 10, 2023

View reviewed changes

alamb mentioned this pull request May 10, 2023

sqllogictest sometimes reindents plans #6328

Closed

alamb requested a review from mustafasrepo May 10, 2023 17:49

alamb mentioned this pull request May 10, 2023

Port remainder of window.rs to sqllogictest #6234

Merged

mustafasrepo approved these changes May 11, 2023

View reviewed changes

Merge remote-tracking branch 'apache/main' into alamb/fix_explain_plans

64f1460

alamb added 2 commits May 12, 2023 11:15

Use - to ensure plan indent is maintained

706cc0f

Refine

66da7a4

comphead reviewed May 13, 2023

View reviewed changes

Merge remote-tracking branch 'apache/main' into alamb/fix_explain_plans

336740a

alamb mentioned this pull request May 14, 2023

Sqllogictests doesn't cover cases if the column name is not expected. #6349

Open

Update tets

a37d920

alamb merged commit 62621ee into apache:main May 15, 2023

alamb deleted the alamb/fix_explain_plans branch May 15, 2023 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix explain plan formatting in sqllogictest #6329

Fix explain plan formatting in sqllogictest #6329

alamb commented May 10, 2023 •

edited

Loading

alamb May 10, 2023

mustafasrepo commented May 11, 2023 •

edited

Loading

mustafasrepo commented May 11, 2023

alamb commented May 12, 2023

alamb commented May 12, 2023 •

edited

Loading

mustafasrepo commented May 12, 2023

comphead left a comment

alamb commented May 14, 2023 •

edited

Loading

		physical_plan
		ProjectionExec: expr=[c9@0 as c9, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@4 as fv1, FIRST_VALUE(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 1 PRECEDING AND 5 FOLLOWING@1 as fv2, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@5 as lag1, LAG(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@2 as lag2, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@6 as lead1, LEAD(aggregate_test_100.c9,Int64(2),Int64(10101)) ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] ROWS BETWEEN 10 PRECEDING AND 1 FOLLOWING@3 as lead2]

Fix explain plan formatting in sqllogictest #6329

Fix explain plan formatting in sqllogictest #6329

Conversation

alamb commented May 10, 2023 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb May 10, 2023

Choose a reason for hiding this comment

mustafasrepo commented May 11, 2023 • edited Loading

mustafasrepo commented May 11, 2023

alamb commented May 12, 2023

alamb commented May 12, 2023 • edited Loading

mustafasrepo commented May 12, 2023

comphead left a comment

Choose a reason for hiding this comment

alamb commented May 14, 2023 • edited Loading

alamb commented May 10, 2023 •

edited

Loading

mustafasrepo commented May 11, 2023 •

edited

Loading

alamb commented May 12, 2023 •

edited

Loading

alamb commented May 14, 2023 •

edited

Loading