Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQL] [WIP] Join test coverage improvements #7985

Closed

Conversation

JoshRosen
Copy link
Contributor

This WIP patch aims to improve the test coverage of Spark SQL's join operators by adding SparkPlanTest tests for the physical operators.

(Some of these changes are extracted from #7904 for easier review).

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #39980 has finished for PR 7985 at commit 7a845bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class VectorSlicer(override val uid: String)

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #39981 has finished for PR 7985 at commit 0fa69a7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class InnerJoinSuite extends SparkPlanTest

case _ =>
left.output ++ right.output
case x =>
throw new IllegalArgumentException(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BroadcastNestedLoopJoin can't handle non-outer joins, but there was nothing to guard against this (this would never end up being planned though).

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #39984 has finished for PR 7985 at commit be61fbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

My goal here is to keep adding tests until I hit 100% line coverage (and ideally 100% branch coverage) on SortMergeJoin.

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #39993 has finished for PR 7985 at commit 2048cad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class VectorSlicer(override val uid: String)

val sortMergeJoin =
execution.joins.SortMergeJoin(leftKeys, rightKeys, left, right)
val filteredJoin = boundCondition.map(Filter(_, sortMergeJoin)).getOrElse(sortMergeJoin)
EnsureRequirements(filteredJoin.sqlContext).apply(filteredJoin)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question of whether I should be using EnsureRequirements here or not. On the one hand, this allows us to test whether these operators properly declare their requirements / input preconditions in a way that the optimizer can satisfy them.

On the other hand, this makes it harder to exercise precise control over the input iterators that the physical operators see, leaving certain parts of the test orchestration a little more "up to coincidence." For instance, let's say that I want to precisely control the input: I can create left and right plans with single partitions and just trust that the Exchange is not going to unnecessarily re-shuffle before sorting, but that's brittle if someone breaks Exchange to mis-handle this case (see #7988, which specifically fixes an issue that would have broken this strategy).

I guess the way forward is to state the Exchange / EnsureRequirements behavior requirements here, add unit tests for them in PlannerSuite, then assume those requirements here and build on top of them.

@davies
Copy link
Contributor

davies commented Aug 10, 2015

@JoshRosen Should we merge this before #7904 ?

@JoshRosen JoshRosen force-pushed the join-test-coverage-improvements branch from 2048cad to 2a9165e Compare August 10, 2015 21:24
@JoshRosen
Copy link
Contributor Author

I have decided to merge these changes into the SMJ patch since it's easier to develop both patches at the same time.

@JoshRosen JoshRosen closed this Aug 10, 2015
@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #40333 has finished for PR 7985 at commit c700df8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen JoshRosen deleted the join-test-coverage-improvements branch August 29, 2016 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants