test: Reduce end-to-end test time #109

Merged: 13 commits, Feb 29, 2024
4 changes: 2 additions & 2 deletions spark/src/test/scala/org/apache/comet/CometCastSuite.scala
@@ -90,13 +90,13 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
Range(0, len).map(_ => chars.charAt(r.nextInt(chars.length))).mkString
}

private def fuzzCastFromString(chars: String, maxLen: Int, toType: DataType) {
Member Author:

This change was added by accident (from make format); a sketch of the rewrite it applies appears after this hunk. I'll remove it later.

private def fuzzCastFromString(chars: String, maxLen: Int, toType: DataType): Unit = {
val r = new Random(0)
val inputs = Range(0, 10000).map(_ => genString(r, chars, maxLen))
castTest(inputs.toDF("a"), toType)
}

private def castTest(input: DataFrame, toType: DataType) {
private def castTest(input: DataFrame, toType: DataType): Unit = {
withTempPath { dir =>
val df = roundtripParquet(input, dir)
.withColumn("converted", col("a").cast(toType))
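
For context on the accidental `: Unit =` changes above: Scala 2's "procedure syntax" (a method defined with no result type and no `=`) is deprecated, and formatters typically rewrite it to an explicit `Unit` result type, which appears to be what make format did here. The following is a minimal sketch of that rewrite only; the object and method names are illustrative and not part of the suite.

object ProcedureSyntaxExample {
  // Before (deprecated Scala 2 "procedure syntax": no result type, no '='):
  //   def logRow(row: String) { println(row) }
  // After the rewrite to an explicit Unit result type, which is the same
  // shape the diff above shows for fuzzCastFromString and castTest:
  def logRow(row: String): Unit = {
    println(row)
  }
}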
@@ -537,23 +537,22 @@ class CometAggregateSuite extends CometTestBase with AdaptiveSparkPlanHelper {
withSQLConf(CometConf.COMET_BATCH_SIZE.key -> batchSize.toString) {

// Test all combinations of different aggregation & group-by types
(1 to 4).foreach { col =>
(1 to 14).foreach { gCol =>
(1 to 14).foreach { gCol =>
(1 to 4).foreach { col =>
Contributor:

Another unrelated question: why 1 to 4? It seems _1 to _4 are all integer types.

We probably want to test other types like float/double, decimal, etc.

But this should be addressed in another PR.

Member Author:

Right, this only covers integer types. The other types like float/double and decimal are covered by other tests in the same suite. We could add them here, but it might cause an explosion in the total test time.

withView("v") {
sql(s"CREATE TEMP VIEW v AS SELECT _g$gCol, _$col FROM tbl ORDER BY _$col")
checkSparkAnswer(s"SELECT _g$gCol, FIRST(_$col) FROM v GROUP BY _g$gCol")
checkSparkAnswer(s"SELECT _g$gCol, LAST(_$col) FROM v GROUP BY _g$gCol")
}
checkSparkAnswer(s"SELECT _g$gCol, SUM(_$col) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(
s"SELECT _g$gCol, SUM(DISTINCT _$col) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(s"SELECT _g$gCol, COUNT(_$col) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(
s"SELECT _g$gCol, COUNT(DISTINCT _$col) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(
s"SELECT _g$gCol, MIN(_$col), MAX(_$col) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(s"SELECT _g$gCol, AVG(_$col) FROM tbl GROUP BY _g$gCol")
}
checkSparkAnswer(s"SELECT _g$gCol, SUM(_1), SUM(_2) FROM tbl GROUP BY _g$gCol")
Contributor:

Similar for SUM, COUNT, MIN, AVG, and MAX.

COUNT(DISTINCT xx) and SUM(DISTINCT xx) are different, though, and might have to be iterated over all 4 columns; see the query-count sketch after this hunk.

checkSparkAnswer(s"SELECT _g$gCol, SUM(DISTINCT _3) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(
s"SELECT _g$gCol, COUNT(_3), COUNT(_4) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(
s"SELECT _g$gCol, COUNT(DISTINCT _1) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(s"SELECT _g$gCol, MIN(_1), MAX(_4) FROM tbl GROUP BY _g$gCol")
checkSparkAnswer(s"SELECT _g$gCol, AVG(_2), AVG(_4) FROM tbl GROUP BY _g$gCol")
}
}
}
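
To make the time saving concrete, here is a back-of-envelope count of checkSparkAnswer calls per COMET_BATCH_SIZE value. It is based only on the hunk shown above: the surrounding method body is truncated, and the old per-pair count of 8 queries is inferred from the removed lines, so these numbers are an approximation, not a measurement.

object QueryCountSketch extends App {
  val groupByCols = 14 // _g1 .. _g14
  val valueCols = 4 // _1 .. _4

  // Old shape: every aggregate (FIRST, LAST, SUM, SUM DISTINCT, COUNT,
  // COUNT DISTINCT, MIN/MAX, AVG) was issued once per (gCol, col) pair.
  val queriesPerPairOld = 8
  val oldTotal = groupByCols * valueCols * queriesPerPairOld // 448

  // New shape: only FIRST and LAST still iterate the value columns; the
  // remaining aggregates (including one representative SUM DISTINCT and
  // COUNT DISTINCT each) are folded into six multi-column queries per gCol.
  val newTotal = groupByCols * (valueCols * 2 + 6) // 196

  println(s"old ~ $oldTotal queries, new ~ $newTotal queries per batch size")
}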