-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
planner: use SEMI_JOIN_REWRITE hint to rewrite the semi join #35325
planner: use SEMI_JOIN_REWRITE hint to rewrite the semi join #35325
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that there're also some aggregations not eliminated
"Join{DataScan(t)->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->Sel([lt(test.t.a, 1)])->Projection", | ||
"Join{Join{DataScan(t)->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->Projection", | ||
"Join{Join{DataScan(t)->DataScan(x)}(test.t.a,test.t.a)->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->Projection", | ||
"Join{Join{DataScan(t)->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->DataScan(x)->Aggr(firstrow(test.t.a))}(test.t.a,test.t.a)->Projection->Projection", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two semi-joins should all be able rewriting.
Needs further checking.
The hint cannot be captured by the SPM(since the behavior of the hint is not enabled by default. So it's okay that it's not captured currently). |
planner/core/logical_plan_builder.go
Outdated
@@ -3626,6 +3628,12 @@ func (b *PlanBuilder) pushTableHints(hints []*ast.TableOptimizerHint, currentLev | |||
leadingJoinOrder = append(leadingJoinOrder, tableNames2HintTableInfo(b.ctx, hint.HintName.L, hint.Tables, b.hintProcessor, currentLevel)...) | |||
} | |||
leadingHintCnt++ | |||
case HintSemiJoinRewrite: | |||
if !b.checkSemiJoinHint { | |||
b.ctx.GetSessionVars().StmtCtx.AppendWarning(ErrInternal.GenWithStack("The SEMI_JOIN_REWRITE hint is not used correctly, maybe it's not in a subquery or the subquery is not EXSITS clause.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b.ctx.GetSessionVars().StmtCtx.AppendWarning(ErrInternal.GenWithStack("The SEMI_JOIN_REWRITE hint is not used correctly, maybe it's not in a subquery or the subquery is not EXSITS clause.")) | |
b.ctx.GetSessionVars().StmtCtx.AppendWarning(ErrInternal.GenWithStack("The SEMI_JOIN_REWRITE hint is not used correctly, maybe it's not in a subquery or the subquery is not EXISTS clause.")) |
subAgg := LogicalAggregation{ | ||
AggFuncs: make([]*aggregation.AggFuncDesc, 0, len(join.EqualConditions)), | ||
GroupByItems: make([]expression.Expression, 0, len(join.EqualConditions)), | ||
}.Init(p.SCtx(), p.Children()[1].SelectBlockOffset()) | ||
|
||
aggOutputCols := make([]*expression.Column, 0, len(join.EqualConditions)) | ||
for i := range join.EqualConditions { | ||
innerCol := join.EqualConditions[i].GetArgs()[1].(*expression.Column) | ||
firstRow, err := aggregation.NewAggFuncDesc(join.SCtx(), ast.AggFuncFirstRow, []expression.Expression{innerCol}, false) | ||
if err != nil { | ||
return nil, err | ||
} | ||
subAgg.AggFuncs = append(subAgg.AggFuncs, firstRow) | ||
subAgg.GroupByItems = append(subAgg.GroupByItems, innerCol) | ||
aggOutputCols = append(aggOutputCols, innerCol) | ||
} | ||
subAgg.SetChildren(innerChild) | ||
subAgg.SetSchema(expression.NewSchema(aggOutputCols...)) | ||
subAgg.buildSelfKeyInfo(subAgg.Schema()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we directly use buildDistinct()
like we do in handleInSubquery()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In buildDistinct
it also deals with the hints or the optFlag which is not needed after the plan building phase.
I think we can keep the current codes since it's a very simple rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
}.Init(p.SCtx(), p.Children()[1].SelectBlockOffset()) | ||
|
||
aggOutputCols := make([]*expression.Column, 0, len(join.EqualConditions)) | ||
for i := range join.EqualConditions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can reuse the buildDistinct here like InSubquery?seems the last &columnPruner{} can reduce the unused column if needed.
…o semi-join-rewrite-with-hint
Care for the license header from Rest LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM.
And please make it pass the CI.
planner/core/expression_rewriter.go
Outdated
@@ -315,7 +315,9 @@ func (er *expressionRewriter) constructBinaryOpFunction(l expression.Expression, | |||
} | |||
} | |||
|
|||
func (er *expressionRewriter) buildSubquery(ctx context.Context, subq *ast.SubqueryExpr) (LogicalPlan, error) { | |||
// buildSubquery translates the subquery ast to plan. | |||
// Currently, only the IN/EXIST can apply the rewrite hint(rewrite the semi join to inner join with aggregation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is a bit misleading.
The hint is applicable for EXISTS
only. The rewrite for IN
is not controlled by the hint.
/run-unit-test |
/run-check_dev |
Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/ab3778347e3efbe56896d09741e4c6000f16c75a |
/run-mysql-test |
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 0397da9
|
/run-check_dev_2 |
TiDB MergeCI notify✅ Well Done! New fixed [1] after this pr merged.
|
cherry pick to release-6.1 in PR #37283 |
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
What problem does this PR solve?
Issue Number: close #35323
Problem Summary:
What is changed and how it works?
Add the new hint
SEMI_JOIN_REWRITE
which is written in the subquery scope.If the hint is specified, we rewrite the semi join to inner join with aggregatioin just like what
tidb_opt_insubq_to_join_and_agg
does.An Example
before
After
Since Exist semi-join character doesn't care how many rows from the inner side existed, adding distinct here can reduce the size of the second join result set here as we say.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.