-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend fast inequality join #8614
Extend fast inequality join #8614
Conversation
758a8ee
to
c292c1e
Compare
@kokosing, @sopel39 , @anusudarsan I cleaned this up as the whole feature from the start felt to messy to me. Please give it (hopefully) last read. |
c292c1e
to
0750369
Compare
0750369
to
20cc040
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@losipiuk LGTM mod comment. Also, I fixed a checkstyle error.
// check if filtering function to startingPosition | ||
if (applyLessThanFunction(startingPosition, probePosition, allProbeChannelsPage)) { | ||
if (applyLessThanFunction(filterFunction, startingPosition, probePosition, allProbeChannelsPage)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename applyLessThanFunction -> applySearchFunction as a part of the refactoring commit 9e5b4b38040a59120bd3de6ac67c832403efb97e ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added renaming commit.
8c10e51
to
4e689f5
Compare
@sopel39 last pass? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One important comment about correctness, a bit of refactor and wording. Feel free to ignore anything you find a nit-picking.
@@ -44,13 +44,14 @@ | |||
* <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- while cleanup up this javadoc, first line is:
This class assumes that lessThanFunction is a superset of the whole filtering
This is remotely related to truth. Before other changes in this PR, no ANDs
nor ORs
are supported.
After other changes, only specific ANDs
and ORs
are supported (those that compare to same build side symbol).
- Further, in the following
by passing any of the filterFunction_i to the SortedPositionLinks. We could not
s/We could not/We cannot/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the Javadoc very much.
* That allows us to define an order of the elements in positionLinks (this defining which | ||
* element is smaller) using {@code g(...)} function and to perform a binary search using | ||
* {@code f(probePosition)} value. | ||
* and buildSymbolRef is a symbol reference on the build side of join. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/buildSymbolRef/{@code buildSymbolRef}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N/A
* and buildSymbolRef is a symbol reference on the build side of join. | ||
* | ||
* That allows us to define an order of the elements in positionLinks by sorting them by | ||
* {@code buildSymbolRef} and then perform a binary search using {@code f(probePosition)}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/{@code f(probePosition)}/{@code f(probeColumn1, probeColumn2, ..., probeColumnN)}
- note about binary search is a bit misleading, we do binary search only for
>, >=
cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will think about the binary search case
* A.a < B.x | ||
* A.a < f(B.x, B.y, B.z) | ||
* <p> | ||
* where a is the build side symbol reference and x,y,z are probe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/a/{@code a}
, same for x,y,z
@@ -32,7 +32,10 @@ | |||
/** | |||
* Currently this class handles only simple expressions like: | |||
* <p> | |||
* A.a < B.x | |||
* A.a < f(B.x, B.y, B.z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In SortedPositionLinks
you said about >, <, <=, >=
. Here, <
is only for terseness or this class indeed supports only <
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N/A
} | ||
|
||
@Override | ||
public int start(int startingPosition, int probePosition, Page allProbeChannelsPage) | ||
{ | ||
for (JoinFilterFunction searchFunction : searchFunctions) { | ||
startingPosition = findStartPositionForFunction(searchFunction, startingPosition, probePosition, allProbeChannelsPage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls do not overwrite input param startingPosition
Optional<SortExpressionContext> rightProcessed = process(binaryExpression.getRight()); | ||
|
||
if (!leftProcessed.isPresent() || !rightProcessed.isPresent() || !leftProcessed.get().getSortExpression().equals(rightProcessed.get().getSortExpression())) { | ||
return Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return left (ie any) and put TODO to make it cost-based decision.
If there are two <
conditions on two different build symbols, better to have sorted position links on one of them than on none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
private SortExpressionContext merge(SortExpressionContext left, SortExpressionContext right) | ||
{ | ||
checkArgument(left.getSortExpression().equals(right.getSortExpression())); | ||
ImmutableSet.Builder<Expression> searchExpressions = ImmutableSet.builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not a list? Expression de-duplication is something optimizer should be doing, a list should suffice.
} | ||
|
||
private static void assertGetSortExpression(Expression expression) | ||
private void assertGetSortExpression(Expression expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why no longer static
?
@@ -2338,6 +2338,22 @@ public void testJoinWithGreaterThanInJoinClause() | |||
} | |||
|
|||
@Test | |||
public void testJoinWithRangePredicatesinJoinClause() | |||
{ | |||
assertQuery("SELECT COUNT(*)\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace \n
with
in the queries
@@ -131,11 +131,11 @@ protected Boolean visitSymbolReference(SymbolReference symbolReference, Void con | |||
} | |||
} | |||
|
|||
public static class SortExpression | |||
public static class RowSortExpressionContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a point in renaming & extracting the class you remove 2 commits later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed. Let's leave it.
@anusudarsan I am addressing the comments |
4e689f5
to
4229b05
Compare
@@ -554,7 +554,7 @@ public Void visitJoin(JoinNode node, Integer indent) | |||
formatOutputs(node.getOutputSymbols())); | |||
} | |||
|
|||
node.getSortExpression().ifPresent(expression -> print(indent + 2, "SortExpression[%s]", expression)); | |||
node.getSortExpressionContext().ifPresent(context -> print(indent + 2, "SortExpression[%s]", context.getSortExpression())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting what does it do in plan printer. Unconditionally to the flag enabling inequi join. (no change requested)
@@ -1591,14 +1591,17 @@ private LookupSourceFactory createLookupSourceFactory( | |||
context.getTypes(), | |||
context.getSession())); | |||
|
|||
Optional<Integer> sortChannel = node.getSortExpressionContext() | |||
Optional<SortExpressionContext> sortExpressionContext = node.getSortExpressionContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 :)
checkState(!searchFunctions.isEmpty(), "Using sortedPositionLinks with no search functions"); | ||
this.searchFunctions = requireNonNull(searchFunctions, "searchFunctions is null"); | ||
this.searchFunctions = searchFunctions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider converting it to array here.
@@ -192,53 +193,54 @@ public int next(int position, int probePosition, Page allProbeChannelsPage) | |||
@Override | |||
public int start(int startingPosition, int probePosition, Page allProbeChannelsPage) | |||
{ | |||
int[] links = sortedPositionLinks[startingPosition]; | |||
if (links == null) { | |||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't links==null
the case of no collisions?
In original code, we had the following here:
if (applyLessThanFunction(startingPosition, probePosition, allProbeChannelsPage)) {
return startingPosition;
}
and this seemingly could return startingPosition
even if sortedPositionLinks[startingPosition]==null
int left = 0; | ||
int right = sortedPositionLinks[startingPosition].length - 1; | ||
int left = startOffset; | ||
int right = links.length - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can safely inline left
& right
if (offset < 0) { | ||
return -1; | ||
} | ||
if (!applySearchFunction(searchFunction, startingPosition, offset, probePosition, allProbeChannelsPage)) { | ||
if (!applySearchFunction(searchFunction, links, offset, probePosition, allProbeChannelsPage)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now i dont understand what this if
does here. Looks like it's redundant and lowerBound
could cover it. (no change required, but you can think about it..)
List<JoinFilterFunctionFactory> searchFunctionFactories = sortExpressionContext | ||
.map(SortExpressionContext::getSearchExpressions) | ||
.map(searchExpressions -> searchExpressions | ||
.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retain .stream()
on prev line
@@ -5276,7 +5284,7 @@ public void testWithHiding() | |||
"SELECT * FROM b", | |||
"SELECT 2"); | |||
assertQueryFails( | |||
"WITH a AS (VALUES 1), " + | |||
"WITH a AS (VALUES 1), " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unel.
for (JoinFilterFunction searchFunction : searchFunctions) { | ||
if (!applySearchFunction(searchFunction, startingPosition, probePosition, allProbeChannelsPage)) { | ||
allSearchFunctionsMatchedForStartingPosition = false; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply return false
"(VALUES (1,1),(2,1)) t1(a,b), " + | ||
"(VALUES (1,1),(1,2),(2,1)) t2(x,y) " + | ||
"WHERE a=x and b>=y", | ||
"VALUES (1,1,1,1), (2,1,2,1)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test in TestPositionLinks
too
@@ -208,6 +211,18 @@ public int start(int startingPosition, int probePosition, Page allProbeChannelsP | |||
return links[currentStartOffset]; | |||
} | |||
|
|||
private boolean applyAllSearchFunctions(int startingPosition, int probePosition, Page allProbeChannelsPage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shan't this new cool method be used in next() too?
@@ -61,7 +61,7 @@ public void testArrayPositionLinks() | |||
public void testSortedPositionLinks() | |||
{ | |||
JoinFilterFunction filterFunction = (leftAddress, rightPosition, rightPage) -> | |||
BIGINT.getLong(rightPage.getBlock(0), leftAddress) > 4; | |||
BIGINT.getLong(TEST_PAGE.getBlock(0), leftAddress) > 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmt msg typo "embended"
4d9bcbf
to
d32d2ed
Compare
Squashed fixups. |
|
||
public SortExpressionContext(Expression sortExpression) | ||
{ | ||
this.sortExpression = requireNonNull(sortExpression, "sortExpression can not be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"can not" is not valid, see #8770
Optional<SortExpressionContext> rightProcessed = process(binaryExpression.getRight()); | ||
|
||
if (!leftProcessed.isPresent() || !rightProcessed.isPresent() || !leftProcessed.get().getSortExpression().equals(rightProcessed.get().getSortExpression())) { | ||
return Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
1563e88
to
7883b37
Compare
@findepi. Would you like to take a look in some spare time? ;) |
7883b37
to
1f0e036
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only minor comments
.collect(groupingBy(SortExpressionContext::getSortExpression, reducing(SortExpressionExtractor::merge))) | ||
.values() | ||
.stream() | ||
.map(Optional::get) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious: call to Optional.get
without .ifPresent
. You can avoid having those misleading Optional
-s, if you replace .collect(groupingBy(...))
with .collect(toMap(SortExpressionContext::getSortExpression, c->c, SortExpressionExtractor::merge)
.
import static java.util.Objects.requireNonNull; | ||
import static java.util.stream.Collectors.groupingBy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmt msg "Consider different sort expressions in SortExpressionExtractor" -- it is not quite clear. Maybe
"Extract sort expressions from complex join filters" ?
.map(Optional::get) | ||
.collect(toImmutableList()); | ||
|
||
// For now heuristically pick sort expression which has most search expression assigned to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/search expression/&s
|
||
// For now heuristically pick sort expression which has most search expression assigned to it. | ||
// TODO: make it cost based decision based on symbol statistics | ||
return sortExpressionCandidates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm LF
// TODO: make it cost based decision based on symbol statistics | ||
return sortExpressionCandidates | ||
.stream() | ||
.sorted(comparing(context -> -context.getSearchExpressions().size())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using .reversed()
instead of -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that before. As well as Comparator.reverseOrder
. But then I have to add explicit cast of context
to SortExpressionContext
.
Like this:
.sorted(comparing(context -> ((SortExpressionContext) context).getSearchExpressions().size()).reversed())
I prefer -
in that case unless you can help me get rid of cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas. However, rather than cast, you should rather s/context ->/(SortExpressionContext context) ->
.
You can also -1 *
instead of -
to make it more visible. Or leave as is, not a big deal.
Addressed comments. One question. |
// TODO: make it cost based decision based on symbol statistics | ||
return sortExpressionCandidates | ||
.stream() | ||
.sorted(comparing(context -> -context.getSearchExpressions().size())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas. However, rather than cast, you should rather s/context ->/(SortExpressionContext context) ->
.
You can also -1 *
instead of -
to make it more visible. Or leave as is, not a big deal.
f4deff6
to
f4e0b62
Compare
Use Row prefix to point fact that class operates in channels domain.
Add SortExpressionContext which captures logical sort expression.
Pass explicit searchExpression when doing inequality filtering for join. Previously whole filterFunction was assummed to be the search expression. With explici searchExpression we can capture more cases when we want to use subset of filter function conjunts (possibly transformed) as search function.
f4e0b62
to
318332a
Compare
The sorted position links is searched for each of the expression in the range predicate. Thus this optimization works only for predicates with AND (conjuncts). The iteration over the position links is stopped as soon as any of the filter expression evaluates to false.
TestPositionLinks used `rightPage` to simulate access to build-side data. Changing code to use TEST_PAGE which more accuratelly simulates implementation of behaviour of standard implementation of JoinFilterFunction which have build-side data embeded.
This extends functionality added in #7097, for #6922.
Internal review - Teradata#630
The PR extends the functionality to speed up query with range predicates eg: benchmarkRangePredicateJoin . But I added benchmark tests for other queries which were already addressed by the optimization. So you can see the comparison below with and without this optimization.
Complete Benchmarking results
@losipiuk