-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SQL] [WIP] Join test coverage improvements #7985
[SQL] [WIP] Join test coverage improvements #7985
Conversation
Test build #39980 has finished for PR 7985 at commit
|
Test build #39981 has finished for PR 7985 at commit
|
case _ => | ||
left.output ++ right.output | ||
case x => | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BroadcastNestedLoopJoin can't handle non-outer joins, but there was nothing to guard against this (this would never end up being planned though).
Test build #39984 has finished for PR 7985 at commit
|
My goal here is to keep adding tests until I hit 100% line coverage (and ideally 100% branch coverage) on SortMergeJoin. |
Test build #39993 has finished for PR 7985 at commit
|
val sortMergeJoin = | ||
execution.joins.SortMergeJoin(leftKeys, rightKeys, left, right) | ||
val filteredJoin = boundCondition.map(Filter(_, sortMergeJoin)).getOrElse(sortMergeJoin) | ||
EnsureRequirements(filteredJoin.sqlContext).apply(filteredJoin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question of whether I should be using EnsureRequirements
here or not. On the one hand, this allows us to test whether these operators properly declare their requirements / input preconditions in a way that the optimizer can satisfy them.
On the other hand, this makes it harder to exercise precise control over the input iterators that the physical operators see, leaving certain parts of the test orchestration a little more "up to coincidence." For instance, let's say that I want to precisely control the input: I can create left and right plans with single partitions and just trust that the Exchange is not going to unnecessarily re-shuffle before sorting, but that's brittle if someone breaks Exchange to mis-handle this case (see #7988, which specifically fixes an issue that would have broken this strategy).
I guess the way forward is to state the Exchange / EnsureRequirements behavior requirements here, add unit tests for them in PlannerSuite, then assume those requirements here and build on top of them.
@JoshRosen Should we merge this before #7904 ? |
2048cad
to
2a9165e
Compare
I have decided to merge these changes into the SMJ patch since it's easier to develop both patches at the same time. |
Test build #40333 has finished for PR 7985 at commit
|
This WIP patch aims to improve the test coverage of Spark SQL's join operators by adding
SparkPlanTest
tests for the physical operators.(Some of these changes are extracted from #7904 for easier review).