[SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join #7904

Closed
wants to merge 62 commits into from
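Here is the title's core idea under a simplified model. When both inputs are sorted on the join key, a left outer join becomes a single forward merge. Each left row is emitted once per matching right row, or once with a null right side when nothing matches. A right outer join is the same merge with the inputs swapped. The Scala sketch is illustrative only and is not Spark's `SortMergeOuterJoin`. Rows are hypothetical `(Option[Int], String)` pairs, `None` stands in for a SQL NULL join key, and `SortMergeOuterJoinSketch`, `leftOuter`, and `rightOuter` are invented names.

```scala
// Hypothetical, simplified model (not Spark's API): a row is a (key, value) pair and
// a None key stands in for a SQL NULL join key. Both inputs must be sorted by key.
object SortMergeOuterJoinSketch {
  type Row = (Option[Int], String)

  /** Left outer join: every left row appears, paired with each match or with None. */
  def leftOuter(left: Seq[Row], right: Seq[Row]): Seq[(Row, Option[Row])] = {
    // A null key never satisfies an equality join condition, so such right rows never match.
    val candidates = right.filter(_._1.isDefined).toIndexedSeq
    val out = Seq.newBuilder[(Row, Option[Row])]
    var groupStart = 0 // only moves forward, because both inputs are sorted
    for (l <- left) l._1 match {
      case None => out += ((l, None)) // null left key: emit with a null right side
      case Some(k) =>
        // Skip right rows whose key is smaller than k.
        while (groupStart < candidates.length && candidates(groupStart)._1.get < k) groupStart += 1
        var i = groupStart
        while (i < candidates.length && candidates(i)._1.contains(k)) {
          out += ((l, Some(candidates(i))))
          i += 1
        }
        if (i == groupStart) out += ((l, None)) // no match: pad the right side with null
    }
    out.result()
  }

  /** Right outer join: the same merge with the inputs (and each output pair) swapped. */
  def rightOuter(left: Seq[Row], right: Seq[Row]): Seq[(Option[Row], Row)] =
    leftOuter(right, left).map { case (r, l) => (l, r) }
}
```

`groupStart` only moves forward, so the merge needs no hash table. A group of equal right-side keys is rescanned only for left rows that share that key.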
Commits
be19a0f
[SPARK-9054] [SQL] Rename RowOrdering to InterpretedOrdering; use new…
JoshRosen Aug 5, 2015
34b8e0c
Import ordering
JoshRosen Aug 5, 2015
e610655
Add comment RE: Ascending ordering
JoshRosen Aug 5, 2015
df88548
Squash @adrian-wang's changes.
adrian-wang Jun 19, 2015
58edb2e
Remove old TODO
JoshRosen Aug 3, 2015
9faa2ee
Use withSQLConf in JoinSuite
JoshRosen Aug 4, 2015
8d83e15
Use explicit toScala conversions in ShuffledHashOuterJoin.
JoshRosen Aug 4, 2015
a471a6e
Revert changes to SortMergeJoin; add new SortMergeOuterJoin operator
JoshRosen Aug 4, 2015
cf8c042
Fix join operator selection for outer join:
JoshRosen Aug 4, 2015
a09d6e3
Rename HashOuterJoin to OuterJoin.
JoshRosen Aug 4, 2015
58b2d1c
Clean up non-obvious side-effect in JoinedRow.with[Left|Right]
JoshRosen Aug 4, 2015
07ef478
Style cleanup in flatMap; use curly braces instead of parens.
JoshRosen Aug 4, 2015
c3c7ed4
Move initialize() definition closer to usage.
JoshRosen Aug 4, 2015
78714dd
Large refactoring of SMJ internals to improve clarity.
JoshRosen Aug 5, 2015
124f4ba
Remove unnecessary row copying.
JoshRosen Aug 5, 2015
8c50c30
Support SMJ for left outer join.
JoshRosen Aug 5, 2015
8dade55
Also enable for right outer join.
JoshRosen Aug 5, 2015
8e496b2
Fix scalastyle
JoshRosen Aug 5, 2015
6587ef2
Rewrite OuterJoinSuite in preparation for adding more tests.
JoshRosen Aug 5, 2015
a8d1074
Merge branch 'SPARK-9054' into outer-join-smj
JoshRosen Aug 5, 2015
4603081
Merge remote-tracking branch 'origin/master' into outer-join-smj
JoshRosen Aug 5, 2015
681e879
Add tests for outer joins with both inputs empty
JoshRosen Aug 6, 2015
3772505
Fix two minor bugs in SMJ (regression tests pending)
JoshRosen Aug 6, 2015
716bdff
Merge remote-tracking branch 'origin/master' into outer-join-smj
JoshRosen Aug 6, 2015
82632c8
Allow UnsafeRows to be processed in SortMergeJoin
JoshRosen Aug 7, 2015
6e18bc3
Rename HashJoin to EquiJoinSelection
JoshRosen Aug 7, 2015
289e91d
Remove unnecessary requiredChildDistribution from BroadcastHashOuterJoin
JoshRosen Aug 7, 2015
e3f6d71
Use ArrayBuffer instead of CompactBuffer
JoshRosen Aug 7, 2015
075f372
Add missing row key null checks in BroadcastHashOuterJoin
JoshRosen Aug 7, 2015
df250c8
Update to reflect deferral of full outer join to followup patch
JoshRosen Aug 7, 2015
6bbde8c
Merge remote-tracking branch 'origin/master' into outer-join-smj
JoshRosen Aug 7, 2015
bdf513c
Rename build to buffered
JoshRosen Aug 7, 2015
1d8a48c
Update SortMergeJoin to output UnsafeRow in Unsafe mode
JoshRosen Aug 7, 2015
82b7e45
Try to clean up confusingly dense one-liner
JoshRosen Aug 7, 2015
4a4590f
Commment update
JoshRosen Aug 7, 2015
93723e2
Experiment towards using efficient internal iterators.
JoshRosen Aug 7, 2015
7712f7e
Merge remote-tracking branch 'origin/master' into outer-join-smj
JoshRosen Aug 7, 2015
441b89a
Back out now-unnecessary changes to other OuterJoin operators.
JoshRosen Aug 7, 2015
d16b60a
Revert another unnecessary change.
JoshRosen Aug 7, 2015
2c1253f
Add RowIterator.fromScala and use it to guarantee that copying is unn…
JoshRosen Aug 7, 2015
2b68452
Override row format methods for SortMergeOuterJoin
JoshRosen Aug 7, 2015
db80faa
Merge remote-tracking branch 'origin/master' into outer-join-smj
JoshRosen Aug 7, 2015
d41ac51
Fix bug in advancing leftIdx/rightIdx.
JoshRosen Aug 8, 2015
9f48a5c
Efficiency improvement in boundCondition.
JoshRosen Aug 8, 2015
1813a45
For left and right outer joins, streamed rows should not have null jo…
JoshRosen Aug 8, 2015
f183307
Two minor comments on output ordering
JoshRosen Aug 8, 2015
f456086
Add note RE: non-nullability of streamed side's join keys.
JoshRosen Aug 8, 2015
2e5eb2d
Fix loss of rows when removing RowIteratorToScala wrapper.
JoshRosen Aug 8, 2015
a7a24f5
Use RowIterator in SortMergeJoin as well
JoshRosen Aug 8, 2015
7910e83
Add giant comment to RowIterator
JoshRosen Aug 8, 2015
fd439cb
Move RowIterator to execution package
JoshRosen Aug 8, 2015
51ee4b2
Remove incorrect assertions; the non-join-key columns can be null
JoshRosen Aug 8, 2015
e23db3d
Experiment with removing copy
JoshRosen Aug 8, 2015
f701652
Fix incorrectly-placed null check.
JoshRosen Aug 9, 2015
7d3cc5d
It turns out that the copy is unnecessary.
JoshRosen Aug 9, 2015
f83b412
Push null check into buffered iterator next().
JoshRosen Aug 9, 2015
81956b0
Improve unit test coverage of join physical operators.
JoshRosen Aug 6, 2015
899dce2
Expand test data to cover multiple buffered rows per group.
JoshRosen Aug 10, 2015
e79909e
Fix parallelism in join operator unit tests.
JoshRosen Aug 11, 2015
5c34f75
Add regression test exposing bug with missing while loop
JoshRosen Aug 11, 2015
c188a21
Fix while loops while adding regression tests.
JoshRosen Aug 11, 2015
eabacca
comment updates
JoshRosen Aug 11, 2015
Push null check into buffered iterator next().
JoshRosen committed Aug 9, 2015
commit f83b412a08e7cbf4fe07f5bd1a266efaad78b0b8
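This commit moves the null-key test out of the callers and into the step that advances the buffered side. `advancedBufferedToRowWithNullFreeJoinKey()` keeps pulling rows until it finds one whose join key has no null columns, so `bufferedRowKey` never contains a null while `bufferedRow` is set. Skipping such rows is safe under SQL equality semantics, because a key with a null column never compares equal to any streamed key. Below is a minimal Scala sketch of that loop. It assumes join keys are modeled as `Seq[Option[Int]]`, where a `None` column plays the role of `anyNull`. `NullSkippingBufferedSide` and its members are hypothetical names, not Spark classes.

```scala
// Hypothetical sketch (not Spark code): a join key is a Seq[Option[Int]], and a None
// column marks a null, mirroring what InternalRow.anyNull checks for.
final class NullSkippingBufferedSide[R](iter: Iterator[R], keyOf: R => Seq[Option[Int]]) {
  var row: Option[R] = None       // current buffered row; None once the input is exhausted
  var key: Seq[Option[Int]] = Nil // join key of `row`; never contains None while `row` is set

  /** Advances until a row whose key has no null column is found; false at end of input. */
  def advanceToRowWithNullFreeJoinKey(): Boolean = {
    var found = false
    while (!found && iter.hasNext) {
      val r = iter.next()
      val k = keyOf(r)
      found = !k.exists(_.isEmpty) // plays the role of `!bufferedRowKey.anyNull`
      if (found) { row = Some(r); key = k }
    }
    if (!found) { row = None; key = Nil } // exhausted: clear state, like setting to null
    found
  }
}
```

With this invariant in place, the callers in the diff drop their own `bufferedRowKey.anyNull` branches. They use a plain `keyOrdering.compare(...)` instead, and one branch becomes an `assert(!bufferedRowKey.anyNull)`.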
@@ -148,6 +148,7 @@ private[joins] class SortMergeJoinScanner(
   private[this] var streamedRow: InternalRow = _
   private[this] var streamedRowKey: InternalRow = _
   private[this] var bufferedRow: InternalRow = _
+  // Note: this is guaranteed to never have any null columns:
   private[this] var bufferedRowKey: InternalRow = _
   /**
    * The join key for the rows buffered in `bufferedMatches`, or null if `bufferedMatches` is empty
@@ -157,7 +158,7 @@ private[joins] class SortMergeJoinScanner(
   private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new ArrayBuffer[InternalRow]

   // Initialization (note: do _not_ want to advance streamed here).
-  advancedBuffered()
+  advancedBufferedToRowWithNullFreeJoinKey()

   // --- Public methods ---------------------------------------------------------------------------

@@ -196,11 +197,10 @@ private[joins] class SortMergeJoinScanner(
       do {
         if (streamedRowKey.anyNull) {
           advancedStreamed()
-        } else if (bufferedRowKey.anyNull) {
-          advancedBuffered()
         } else {
+          assert(!bufferedRowKey.anyNull)
           comp = keyOrdering.compare(streamedRowKey, bufferedRowKey)
-          if (comp > 0) advancedBuffered()
+          if (comp > 0) advancedBufferedToRowWithNullFreeJoinKey()
           else if (comp < 0) advancedStreamed()
         }
       } while (streamedRow != null && bufferedRow != null && comp != 0)
@@ -242,15 +242,10 @@ private[joins] class SortMergeJoinScanner(
         if (bufferedRow != null && !streamedRowKey.anyNull) {
           // The buffered iterator could still contain matching rows, so we'll need to walk through
           // it until we either find matches or pass where they would be found.
-          var comp =
-            if (bufferedRowKey.anyNull) 1 else keyOrdering.compare(streamedRowKey, bufferedRowKey)
-          while (comp > 0 && advancedBuffered()) {
-            comp = if (bufferedRowKey.anyNull) {
-              1
-            } else {
-              keyOrdering.compare(streamedRowKey, bufferedRowKey)
-            }
-          }
+          var comp = 1
+          do {
+            comp = keyOrdering.compare(streamedRowKey, bufferedRowKey)
+          } while (comp > 0 && advancedBufferedToRowWithNullFreeJoinKey())
           if (comp == 0) {
             // We have found matches, so buffer them (this updates matchJoinKey)
             bufferMatchingRows()
@@ -283,18 +278,22 @@ private[joins] class SortMergeJoinScanner(
   }

   /**
-   * Advance the buffered iterator and compute the new row's join key.
+   * Advance the buffered iterator until we find a row with join key that does not contain nulls.
    * @return true if the buffered iterator returned a row and false otherwise.
    */
-  private def advancedBuffered(): Boolean = {
-    if (bufferedIter.advanceNext()) {
+  private def advancedBufferedToRowWithNullFreeJoinKey(): Boolean = {
+    var foundRow: Boolean = false
+    while (!foundRow && bufferedIter.advanceNext()) {
       bufferedRow = bufferedIter.getRow
       bufferedRowKey = bufferedKeyGenerator(bufferedRow)
-      true
-    } else {
+      foundRow = !bufferedRowKey.anyNull
+    }
+    if (!foundRow) {
       bufferedRow = null
       bufferedRowKey = null
       false
+    } else {
+      true
     }
   }

@@ -312,11 +311,7 @@ private[joins] class SortMergeJoinScanner(
     bufferedMatches.clear()
     do {
       bufferedMatches += bufferedRow.copy() // need to copy mutable rows before buffering them
-      advancedBuffered()
-    } while (
-      bufferedRow != null &&
-      !bufferedRowKey.anyNull &&
-      keyOrdering.compare(streamedRowKey, bufferedRowKey) == 0
-    )
+      advancedBufferedToRowWithNullFreeJoinKey()
+    } while (bufferedRow != null && keyOrdering.compare(streamedRowKey, bufferedRowKey) == 0)
   }
 }
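Taken together, the changes above give the outer-join scan a simple shape:

- Skip null-keyed buffered rows.
- Advance the buffered side past smaller keys.
- Buffer the whole group of equal keys once.
- Reuse that group while consecutive streamed rows share its key.

A streamed row whose key is null, or that has no matches, comes back with an empty group so the caller can pad it with nulls. The rough Scala sketch below makes simplifying assumptions: keys are single-column `Option[Int]` values (`None` meaning NULL), both inputs are sorted by key, and the names `OuterJoinScannerSketch` and `nextWithMatches` are hypothetical, not Spark APIs. Unlike the diff, it drops null-keyed buffered rows with a lazy `Iterator.filter` instead of a hand-written loop.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch, not Spark code: keys are single Int columns and None stands for NULL.
// Both inputs must already be sorted by key.
final class OuterJoinScannerSketch[S, B](
    streamed: Iterator[S],
    streamedKey: S => Option[Int],
    buffered: Iterator[B],
    bufferedKey: B => Option[Int]) {

  // Buffered rows with a null key can never match, so drop them lazily up front.
  private val buf = buffered.filter(b => bufferedKey(b).isDefined).buffered
  private var matchKey: Option[Int] = None // key of the group held in bufferedMatches
  private val bufferedMatches = ArrayBuffer.empty[B]

  private def keyOf(b: B): Int = bufferedKey(b).get // safe: null keys were filtered out

  /** Next streamed row paired with all buffered rows sharing its key (empty if none). */
  def nextWithMatches(): Option[(S, Seq[B])] =
    if (!streamed.hasNext) None
    else {
      val s = streamed.next()
      streamedKey(s) match {
        case None => Some((s, Nil)) // a null streamed key never matches
        case Some(k) =>
          if (!matchKey.contains(k)) {
            // Walk past smaller buffered keys, then buffer the whole group equal to k.
            while (buf.hasNext && keyOf(buf.head) < k) buf.next()
            bufferedMatches.clear()
            while (buf.hasNext && keyOf(buf.head) == k) bufferedMatches += buf.next()
            matchKey = Some(k)
          }
          Some((s, bufferedMatches.toList)) // group is reused while streamed keys repeat
      }
    }
}
```

A left outer join would emit `(s, null)` for an empty group and one joined row per element otherwise. In this sketch, a right outer join would pass the right input as the streamed side.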