Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42791][SQL] Create a new golden file test framework for analysis #40449

Closed
wants to merge 18 commits into from

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Mar 15, 2023

What changes were proposed in this pull request?

This PR updates the SQLQueryTestSuite to also consume the same input SQL queries from the input files in second pass and then perform analysis and generate the string representation of the analyzed plans, in a separate new directory. It works much the same way as the previous SQLQueryTestSuite behavior except that it now also produces analyzed plan string instead of just query results in the output golden files.

Why are the changes needed?

This framework will help us guard against bugs in future development by showing a clear signal when analyzer updates result in changes to the query plans for a body of test queries. PR authors and reviewers will be able to see the diffs for all changed plans during the review, and any unintentional plan changes will act as a signal for PR authors to adjust their code to prevent the changes from happening.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This test adds a new test suite and initial golden file. The new test suite passes as of this PR.

commit

commit

commit

commit
@github-actions github-actions bot added the SQL label Mar 15, 2023
@dtenedor
Copy link
Contributor Author

Hi @gengliangwang this should be ready for a first look!

@gengliangwang
Copy link
Member

@dtenedor Since we already have SQLQueryTestSuite which has good basic Spark SQL features coverage, shall we combine both? E.g. let SQLQueryTestSuite show analyzed plan/optimized plan/execution results for comparison, and port more tests from zetasql
What is the reason for having a new analysis test only?

@dtenedor
Copy link
Contributor Author

@gengliangwang Sure, I was thinking about this too. We can reuse the same input SQL query files if we want, and just generate and test against different analyzer test output files. Let me update the PR to do that and I can ping the thread again.

@dtenedor
Copy link
Contributor Author

@gengliangwang from past experience we will want to keep the query plans separate from the SQL results, otherwise the SQL results become hard to read. I will put the analyzer results in separate files.

@gengliangwang
Copy link
Member

I will put the analyzer results in separate files.

Sounds great! Thanks for the work!

@dtenedor
Copy link
Contributor Author

@gengliangwang alright I made this change, please look again when you are ready.

Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gengliangwang for your review!

@dtenedor dtenedor requested a review from gengliangwang March 16, 2023 21:42
@gengliangwang
Copy link
Member

LGTM except for minor comments. Thanks for the work!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, cc @sunchao

}

/** Returns all the files (not directories) in a directory, recursively. */
protected def listFilesRecursively(path: File): Seq[File] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these 3 method are just copied from the existing code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had moved these methods from their original place elsewhere in this file. I moved them back now to simplify the diff.

def numSegments: Int
}

/** A single SQL query's output. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** A single SQL query's output. */
/** A single SQL query's execution output. */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER")
}
/** A test case. */
protected trait TestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -338,6 +338,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
conf.trim -> value.substring(1).trim
})

val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we redefine the variable here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this.

protected def runSqlTestCase(
testCase: TestCase,
listTestCases: Seq[TestCase],
runQueries: (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we pass it as a parameter? We can directly invoke runQueries in runSqlTestCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I removed this and just call runQueries directly now.

Comment on lines 494 to 495
val newTestCase =
if (file.getAbsolutePath.startsWith(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val newTestCase =
if (file.getAbsolutePath.startsWith(
val newTestCase = if (file.getAbsolutePath.startsWith(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* output of running a query.
*/
def readGoldenFileAndCompareResults(
resultFile: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 4 spaces indentation for parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cloud-fan for your review! Addressed comments.

}

/** Returns all the files (not directories) in a directory, recursively. */
protected def listFilesRecursively(path: File): Seq[File] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had moved these methods from their original place elsewhere in this file. I moved them back now to simplify the diff.

protected def runSqlTestCase(
testCase: TestCase,
listTestCases: Seq[TestCase],
runQueries: (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I removed this and just call runQueries directly now.

* output of running a query.
*/
def readGoldenFileAndCompareResults(
resultFile: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def numSegments: Int
}

/** A single SQL query's output. */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 494 to 495
val newTestCase =
if (file.getAbsolutePath.startsWith(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -338,6 +338,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
conf.trim -> value.substring(1).trim
})

val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this.

newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER")
}
/** A test case. */
protected trait TestCase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6747bee Mar 20, 2023
@cloud-fan
Copy link
Contributor

I'm AFK at this time. @gengliangwang can you help to revert it if @dtenedor can't fix the failure soon?

@LuciferYang
Copy link
Contributor

Seems would be ok to regenerate the golden files

@LuciferYang
Copy link
Contributor

#40492

@dtenedor
Copy link
Contributor Author

Looks like some extra tests got added just as this was getting merged! Thanks @LuciferYang for this fix 👍

MaxGekk pushed a commit that referenced this pull request Mar 20, 2023
…end`

### What changes were proposed in this pull request?
This pr re-generates golden files for `array_prepend` functions. It seems that the newly added case in #38947 is missing from the golden files due to lack of rebase when merging #40449.

### Why are the changes needed?
 Re-generates golden files for `array_prepend` functions to Pass GitHub Actions

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manually checked with Scala 2.13

Closes #40492 from LuciferYang/SPARK-42791-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
HyukjinKwon pushed a commit that referenced this pull request Mar 22, 2023
… for all input files

### What changes were proposed in this pull request?

This PR enables the new golden file test framework for analysis for all input files.

Background:
* In #40449 we added the ability to exercise the analyzer on the SQL queries in existing golden files in the `sql/core/src/test/resources/sql-tests/inputs` directory, writing separate output test files in the new `sql/core/src/test/resources/sql-tests/analyzer-results` directory in additional to the original output directory for full end-to-end query execution results.
* That PR also added an allowlist of input files to include in this new dual-run mode.
* In this PR, we remove that allowlist exercise the new dual-run mode for all the input files. We also extend the analyzer testing to support separate test cases in ANSI-mode, TimestampNTZ, and UDFs.

### Why are the changes needed?

This improves test coverage and helps prevent against accidental regressions in the future as we edit the code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR adds testing only.

Closes #40496 from dtenedor/add-all-test-files.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants