-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42791][SQL] Create a new golden file test framework for analysis #40449
Conversation
Hi @gengliangwang this should be ready for a first look! |
@dtenedor Since we already have |
@gengliangwang Sure, I was thinking about this too. We can reuse the same input SQL query files if we want, and just generate and test against different analyzer test output files. Let me update the PR to do that and I can ping the thread again. |
@gengliangwang from past experience we will want to keep the query plans separate from the SQL results, otherwise the SQL results become hard to read. I will put the analyzer results in separate files. |
Sounds great! Thanks for the work! |
@gengliangwang alright I made this change, please look again when you are ready. |
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gengliangwang for your review!
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala
Outdated
Show resolved
Hide resolved
LGTM except for minor comments. Thanks for the work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, cc @sunchao
} | ||
|
||
/** Returns all the files (not directories) in a directory, recursively. */ | ||
protected def listFilesRecursively(path: File): Seq[File] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these 3 method are just copied from the existing code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had moved these methods from their original place elsewhere in this file. I moved them back now to simplify the diff.
def numSegments: Int | ||
} | ||
|
||
/** A single SQL query's output. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** A single SQL query's output. */ | |
/** A single SQL query's execution output. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER") | ||
} | ||
/** A test case. */ | ||
protected trait TestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move it back to the original place? https://github.com/apache/spark/pull/40449/files#diff-212b35a074b1eeb518aff4c47130c7ab9cc23856ea6ea62f083af4452c637b9aL178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -338,6 +338,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper | |||
conf.trim -> value.substring(1).trim | |||
}) | |||
|
|||
val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we redefine the variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this.
protected def runSqlTestCase( | ||
testCase: TestCase, | ||
listTestCases: Seq[TestCase], | ||
runQueries: ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we pass it as a parameter? We can directly invoke runQueries
in runSqlTestCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I removed this and just call runQueries
directly now.
val newTestCase = | ||
if (file.getAbsolutePath.startsWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val newTestCase = | |
if (file.getAbsolutePath.startsWith( | |
val newTestCase = if (file.getAbsolutePath.startsWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* output of running a query. | ||
*/ | ||
def readGoldenFileAndCompareResults( | ||
resultFile: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 4 spaces indentation for parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cloud-fan for your review! Addressed comments.
} | ||
|
||
/** Returns all the files (not directories) in a directory, recursively. */ | ||
protected def listFilesRecursively(path: File): Seq[File] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had moved these methods from their original place elsewhere in this file. I moved them back now to simplify the diff.
protected def runSqlTestCase( | ||
testCase: TestCase, | ||
listTestCases: Seq[TestCase], | ||
runQueries: ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I removed this and just call runQueries
directly now.
* output of running a query. | ||
*/ | ||
def readGoldenFileAndCompareResults( | ||
resultFile: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
def numSegments: Int | ||
} | ||
|
||
/** A single SQL query's output. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
val newTestCase = | ||
if (file.getAbsolutePath.startsWith( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -338,6 +338,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper | |||
conf.trim -> value.substring(1).trim | |||
}) | |||
|
|||
val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this.
newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER") | ||
} | ||
/** A test case. */ | ||
protected trait TestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
thanks, merging to master! |
I'm AFK at this time. @gengliangwang can you help to revert it if @dtenedor can't fix the failure soon? |
Seems would be ok to regenerate the golden files |
Looks like some extra tests got added just as this was getting merged! Thanks @LuciferYang for this fix 👍 |
…end` ### What changes were proposed in this pull request? This pr re-generates golden files for `array_prepend` functions. It seems that the newly added case in #38947 is missing from the golden files due to lack of rebase when merging #40449. ### Why are the changes needed? Re-generates golden files for `array_prepend` functions to Pass GitHub Actions ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions - Manually checked with Scala 2.13 Closes #40492 from LuciferYang/SPARK-42791-FOLLOWUP. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
… for all input files ### What changes were proposed in this pull request? This PR enables the new golden file test framework for analysis for all input files. Background: * In #40449 we added the ability to exercise the analyzer on the SQL queries in existing golden files in the `sql/core/src/test/resources/sql-tests/inputs` directory, writing separate output test files in the new `sql/core/src/test/resources/sql-tests/analyzer-results` directory in additional to the original output directory for full end-to-end query execution results. * That PR also added an allowlist of input files to include in this new dual-run mode. * In this PR, we remove that allowlist exercise the new dual-run mode for all the input files. We also extend the analyzer testing to support separate test cases in ANSI-mode, TimestampNTZ, and UDFs. ### Why are the changes needed? This improves test coverage and helps prevent against accidental regressions in the future as we edit the code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR adds testing only. Closes #40496 from dtenedor/add-all-test-files. Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR updates the
SQLQueryTestSuite
to also consume the same input SQL queries from the input files in second pass and then perform analysis and generate the string representation of the analyzed plans, in a separate new directory. It works much the same way as the previousSQLQueryTestSuite
behavior except that it now also produces analyzed plan string instead of just query results in the output golden files.Why are the changes needed?
This framework will help us guard against bugs in future development by showing a clear signal when analyzer updates result in changes to the query plans for a body of test queries. PR authors and reviewers will be able to see the diffs for all changed plans during the review, and any unintentional plan changes will act as a signal for PR authors to adjust their code to prevent the changes from happening.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This test adds a new test suite and initial golden file. The new test suite passes as of this PR.