[feature](nereids) Support inner join query rewrite by materialized view #27922

Merged: 6 commits, Dec 7, 2023

Conversation

@seawinde seawinde commented Dec 4, 2023

Proposed changes

Work in progress. Support inner join query rewrite by materialized view in some scenarios.
For example:

mv = "select lineitem.L_LINENUMBER, orders.O_CUSTKEY " +
"from orders " +
"inner join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY "
query = "select lineitem.L_LINENUMBER " +
"from lineitem " +
"inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY "

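The example can be read as a containment check: the query joins the same two tables on the same equality condition as the view (the operands of an inner join commute), and every column it selects is already output by the view. Below is a minimal Java sketch of that check. The class `InnerJoinMvMatchSketch`, its `JoinQuery` type, and the string-based representation are illustrative assumptions, not the Nereids implementation, which works on plan trees and a hyper graph.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of a two-table inner join; NOT the Nereids data structures. */
final class InnerJoinMvMatchSketch {

    /** A simplified "select outputs from tables inner join on key = key". */
    static final class JoinQuery {
        final Set<String> tables;
        final Set<String> joinKeys;   // the two sides of the equality condition
        final Set<String> outputs;

        JoinQuery(Set<String> tables, Set<String> joinKeys, Set<String> outputs) {
            this.tables = tables;
            this.joinKeys = joinKeys;
            this.outputs = outputs;
        }
    }

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    /**
     * Returns true when the query can be answered by scanning the view:
     * same tables, same equi-join keys (order-insensitive, because inner
     * join is commutative), and query outputs contained in view outputs.
     */
    static boolean canRewrite(JoinQuery view, JoinQuery query) {
        return view.tables.equals(query.tables)
                && view.joinKeys.equals(query.joinKeys)
                && view.outputs.containsAll(query.outputs);
    }

    public static void main(String[] args) {
        JoinQuery mv = new JoinQuery(
                setOf("orders", "lineitem"),
                setOf("lineitem.L_ORDERKEY", "orders.O_ORDERKEY"),
                setOf("lineitem.L_LINENUMBER", "orders.O_CUSTKEY"));
        JoinQuery query = new JoinQuery(
                setOf("lineitem", "orders"),
                setOf("lineitem.L_ORDERKEY", "orders.O_ORDERKEY"),
                setOf("lineitem.L_LINENUMBER"));
        System.out.println(canRewrite(mv, query));
    }
}
```

Sets are used so that the table order and the order of the two join keys do not matter, which mirrors the fact that the mv joins orders to lineitem while the query joins lineitem to orders.
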
Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

seawinde commented Dec 4, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.15 seconds
stream load tsv: 560 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17163626648 Bytes

seawinde commented Dec 4, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.46 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.3 seconds inserted 10000000 Rows, about 353K ops/s
storage size: 17163852703 Bytes

@seawinde seawinde force-pushed the inner_join_mv_rewrite_impl branch 2 times, most recently from e93fd55 to f0075e6 Compare December 5, 2023 06:44

seawinde commented Dec 5, 2023

run buildall

@seawinde seawinde force-pushed the inner_join_mv_rewrite_impl branch from e0ce9d1 to cdf8817 Compare December 5, 2023 08:53

seawinde commented Dec 5, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit cdf881741916036a048bd6d2414eaa7274425f9a, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4690	4409	4408	4408
q2	367	147	163	147
q3	1464	1220	1204	1204
q4	1099	908	876	876
q5	3157	3188	3170	3170
q6	243	126	129	126
q7	1004	501	495	495
q8	2195	2207	2188	2188
q9	6697	6682	6667	6667
q10	3269	3282	3259	3259
q11	328	213	201	201
q12	353	211	217	211
q13	4551	3810	3806	3806
q14	244	215	216	215
q15	579	521	516	516
q16	431	381	378	378
q17	1007	610	618	610
q18	7589	7182	7835	7182
q19	1512	1413	1405	1405
q20	538	316	298	298
q21	3084	2719	2635	2635
q22	356	288	292	288
Total cold run time: 44757 ms
Total hot run time: 40285 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4380	4418	4348	4348
q2	270	166	177	166
q3	3515	3517	3517	3517
q4	2378	2378	2360	2360
q5	5727	5749	5745	5745
q6	239	121	124	121
q7	2380	1870	1867	1867
q8	3520	3519	3526	3519
q9	9077	9039	9006	9006
q10	3924	4000	3985	3985
q11	508	388	380	380
q12	758	589	579	579
q13	4305	3577	3585	3577
q14	299	242	248	242
q15	576	518	527	518
q16	479	445	442	442
q17	1871	1889	1859	1859
q18	8585	8217	8596	8217
q19	1718	1752	1754	1752
q20	2274	1962	1942	1942
q21	6501	6175	6140	6140
q22	498	414	425	414
Total cold run time: 63782 ms
Total hot run time: 60696 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.2 seconds
stream load tsv: 562 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17163557029 Bytes

seawinde commented Dec 5, 2023

run buildall

@seawinde seawinde force-pushed the inner_join_mv_rewrite_impl branch from 9c9ba36 to d2497b3 Compare December 5, 2023 23:48

seawinde commented Dec 5, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit d2497b35c64547f7d4b7c15cb5aa17e23fb9c3f6, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4677	4400	4467	4400
q2	364	160	161	160
q3	1460	1225	1255	1225
q4	1114	922	887	887
q5	3171	3194	3181	3181
q6	247	130	128	128
q7	993	497	485	485
q8	2214	2200	2174	2174
q9	6668	6721	6625	6625
q10	3205	3275	3271	3271
q11	319	195	208	195
q12	362	214	213	213
q13	4575	3798	3826	3798
q14	248	211	218	211
q15	566	524	523	523
q16	445	383	380	380
q17	1003	610	609	609
q18	7445	6959	7207	6959
q19	1535	1371	1432	1371
q20	542	328	333	328
q21	3092	2643	2684	2643
q22	355	291	301	291
Total cold run time: 44600 ms
Total hot run time: 40057 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4349	4396	4361	4361
q2	269	160	172	160
q3	3532	3529	3519	3519
q4	2378	2372	2369	2369
q5	5737	5738	5768	5738
q6	239	122	123	122
q7	2371	1896	1856	1856
q8	3524	3531	3534	3531
q9	9070	9003	8999	8999
q10	3926	4002	4007	4002
q11	497	385	380	380
q12	767	586	584	584
q13	4316	3568	3551	3551
q14	291	246	259	246
q15	571	520	519	519
q16	491	452	462	452
q17	1874	1857	1856	1856
q18	9704	8122	8201	8122
q19	1728	1726	1767	1726
q20	2276	1952	1930	1930
q21	6538	6185	6154	6154
q22	500	427	410	410
Total cold run time: 64948 ms
Total hot run time: 60587 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.61 seconds
stream load tsv: 574 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17162538956 Bytes

Comment on lines 70 to 73
planner.plan(unboundMvPlan, PhysicalProperties.ANY, ExplainLevel.ALL_PLAN);
Plan mvAnalyzedPlan = planner.getAnalyzedPlan();
Plan mvRewrittenPlan = planner.getRewrittenPlan();
Plan mvPlan = mvRewrittenPlan instanceof LogicalResultSink

@keanji-x keanji-x Dec 6, 2023

getAnalyzedPlan is only visible for testing.
Maybe we can use only the rewritten plan, obtained like this:

Suggested change
- planner.plan(unboundMvPlan, PhysicalProperties.ANY, ExplainLevel.ALL_PLAN);
- Plan mvAnalyzedPlan = planner.getAnalyzedPlan();
- Plan mvRewrittenPlan = planner.getRewrittenPlan();
- Plan mvPlan = mvRewrittenPlan instanceof LogicalResultSink
+ Plan rewrittenPlan = planner.plan(plan, PhysicalProperties.ANY, ExplainLevel.REWRITTEN_PLAN);

Contributor Author

yeah, you are right

- // TODO Should get struct info from hyper graph and check
- return false;
+ HyperGraph hyperGraph = structInfo.getHyperGraph();
+ HashSet<JoinType> requiredJoinType = Sets.newHashSet(JoinType.INNER_JOIN, JoinType.LEFT_OUTER_JOIN);
Contributor

This can be a static member.

Contributor Author

OK, fixed.
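The static-member suggestion amounts to building the set of accepted join types once per class rather than on every call. Here is a minimal sketch of that shape, assuming a stand-in `JoinType` enum and a hypothetical `allSupported` helper. Neither is the Doris code, which builds the set with Guava's `Sets.newHashSet`.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: a local enum stands in for the Nereids JoinType. */
final class JoinTypeCheckSketch {

    enum JoinType { INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, FULL_OUTER_JOIN }

    // Built once when the class is loaded and never modified afterwards.
    private static final Set<JoinType> SUPPORTED_JOIN_TYPES =
            Collections.unmodifiableSet(EnumSet.of(JoinType.INNER_JOIN, JoinType.LEFT_OUTER_JOIN));

    /** Returns true only if every join type in the plan is in the supported set. */
    static boolean allSupported(Iterable<JoinType> joinTypes) {
        for (JoinType joinType : joinTypes) {
            if (!SUPPORTED_JOIN_TYPES.contains(joinType)) {
                return false;
            }
        }
        return true;
    }
}
```

Wrapping the set as unmodifiable keeps the shared constant safe from accidental mutation by any caller.
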

  // Can not compensate, bail out
- if (compensatePredicates == null || compensatePredicates.isEmpty()) {
+ if (compensatePredicates.isEmpty()) {
Contributor

What about when the predicates are exactly the same?

Contributor Author

If they are the same, compensatePredicates will be always true. See the method org.apache.doris.nereids.rules.exploration.mv.Predicates.SplitPredicate#isAlwaysTrue
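The distinction discussed here, between "cannot compensate" and "nothing left to compensate", can be sketched as follows. This is a simplified model under stated assumptions: predicates are plain strings compared for exact equality, `Optional.empty()` stands for the bail-out case, and the `"TRUE"` marker stands for an always-true compensation. The class and method names are hypothetical and do not reflect the actual `SplitPredicate` API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Illustrative only: predicates are plain strings compared for exact equality. */
final class PredicateCompensationSketch {

    /** Stand-in for an always-true compensation. */
    static final String ALWAYS_TRUE = "TRUE";

    /**
     * Computes what must still be applied on top of the view to answer the query.
     * Optional.empty() means the view cannot be used.
     */
    static Optional<Set<String>> compensate(Set<String> queryPredicates, Set<String> viewPredicates) {
        // The view already filtered on something the query does not ask for,
        // so rows the query needs may be missing from the view.
        if (!queryPredicates.containsAll(viewPredicates)) {
            return Optional.empty();
        }
        Set<String> residual = new LinkedHashSet<>(queryPredicates);
        residual.removeAll(viewPredicates);
        // Identical predicates: nothing left to filter, i.e. always true.
        if (residual.isEmpty()) {
            return Optional.of(Collections.singleton(ALWAYS_TRUE));
        }
        return Optional.of(residual);
    }
}
```

With this shape, identical predicates produce a non-empty always-true result, so a check for "empty means bail out" does not reject them.
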

seawinde commented Dec 6, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.64 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17161890093 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 2f8e70b49e5bfcf13aea8ca458f7b45b38efc535, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4745	4486	4497	4486
q2	373	159	170	159
q3	1467	1260	1245	1245
q4	1108	963	927	927
q5	3150	3178	3207	3178
q6	248	130	127	127
q7	1006	493	484	484
q8	2258	2189	2171	2171
q9	6684	6662	6673	6662
q10	3210	3257	3267	3257
q11	323	198	207	198
q12	357	217	216	216
q13	4569	3865	3819	3819
q14	238	214	215	214
q15	574	524	534	524
q16	451	391	401	391
q17	1016	581	549	549
q18	7588	6973	7431	6973
q19	1510	1384	1401	1384
q20	573	326	319	319
q21	3077	2741	2779	2741
q22	366	292	302	292
Total cold run time: 44891 ms
Total hot run time: 40316 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4473	4411	4421	4411
q2	272	165	176	165
q3	3541	3534	3533	3533
q4	2406	2404	2398	2398
q5	5740	5740	5718	5718
q6	243	119	121	119
q7	2384	1880	1879	1879
q8	3507	3516	3510	3510
q9	9117	9052	9058	9052
q10	3912	3986	3975	3975
q11	514	385	386	385
q12	765	599	598	598
q13	4291	3529	3566	3529
q14	280	248	245	245
q15	569	535	528	528
q16	514	450	463	450
q17	1886	1835	1872	1835
q18	8671	8203	8358	8203
q19	1717	1737	1730	1730
q20	2243	1937	1928	1928
q21	6515	6221	6201	6201
q22	506	420	425	420
Total cold run time: 64066 ms
Total hot run time: 60812 ms

@seawinde seawinde requested a review from keanji-x December 7, 2023 02:27

seawinde commented Dec 7, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 10c27d074e61de063fac2b15bbd38c973d8003c2, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4689	4450	4513	4450
q2	361	159	153	153
q3	1469	1255	1289	1255
q4	1120	912	958	912
q5	3172	3192	3165	3165
q6	251	135	128	128
q7	1005	498	495	495
q8	2221	2233	2207	2207
q9	6695	6721	6687	6687
q10	3206	3272	3268	3268
q11	318	214	215	214
q12	354	224	217	217
q13	4575	3769	3811	3769
q14	240	211	221	211
q15	573	523	523	523
q16	439	392	388	388
q17	1012	596	584	584
q18	7468	7289	7025	7025
q19	1523	1446	1413	1413
q20	507	1321	623	623
q21	3106	2683	2662	2662
q22	354	287	299	287
Total cold run time: 44658 ms
Total hot run time: 40636 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4393	4417	4397	4397
q2	271	162	183	162
q3	3560	3538	3509	3509
q4	2380	2367	2359	2359
q5	5745	5731	5734	5731
q6	241	121	122	121
q7	2371	1859	1839	1839
q8	3527	3523	3532	3523
q9	9064	9051	9011	9011
q10	3907	3971	3981	3971
q11	494	391	396	391
q12	771	598	601	598
q13	4273	3565	3572	3565
q14	281	258	252	252
q15	573	524	509	509
q16	500	482	496	482
q17	1886	1881	1867	1867
q18	8658	8304	8278	8278
q19	1748	1698	1782	1698
q20	2259	1938	1943	1938
q21	6489	6171	6152	6152
q22	494	432	409	409
Total cold run time: 63885 ms
Total hot run time: 60762 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.95 seconds
stream load tsv: 583 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17195240527 Bytes

github-actions bot commented Dec 7, 2023

PR approved by anyone and no changes requested.

@seawinde seawinde force-pushed the inner_join_mv_rewrite_impl branch from 10c27d0 to be85ee2 Compare December 7, 2023 07:29

seawinde commented Dec 7, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.96 seconds
stream load tsv: 586 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17193694728 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit be85ee2b1c5b5a5c7aa5c1c96c0f960fd54747c6, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4708	4444	4468	4444
q2	379	177	158	158
q3	1467	1277	1198	1198
q4	1104	916	881	881
q5	3162	3154	3139	3139
q6	250	132	129	129
q7	1001	504	494	494
q8	2228	2192	2185	2185
q9	6735	6681	6637	6637
q10	3240	3267	3278	3267
q11	326	209	207	207
q12	358	211	215	211
q13	4556	3858	3800	3800
q14	254	212	220	212
q15	566	522	522	522
q16	443	388	392	388
q17	1010	603	575	575
q18	7541	8122	7985	7985
q19	1516	1384	1386	1384
q20	556	304	341	304
q21	3071	2659	2658	2658
q22	358	286	294	286
Total cold run time: 44829 ms
Total hot run time: 41064 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4411	4414	4379	4379
q2	266	163	173	163
q3	3537	3514	3520	3514
q4	2393	2376	2364	2364
q5	5722	5739	5728	5728
q6	238	121	121	121
q7	2383	1847	1869	1847
q8	3507	3519	3515	3515
q9	9049	9042	9017	9017
q10	3905	3998	4020	3998
q11	510	394	399	394
q12	763	604	588	588
q13	4325	3562	3570	3562
q14	283	265	249	249
q15	560	519	524	519
q16	496	478	487	478
q17	1888	1833	1850	1833
q18	8780	8567	8402	8402
q19	1691	1724	1756	1724
q20	2290	1943	1965	1943
q21	6531	6175	6160	6160
q22	497	418	432	418
Total cold run time: 64025 ms
Total hot run time: 60916 ms

@seawinde seawinde force-pushed the inner_join_mv_rewrite_impl branch from be85ee2 to d302529 Compare December 7, 2023 09:14

seawinde commented Dec 7, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit d30252923e6aa47533fa7a1a70489fbf6f3bab1d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4702	4495	4516	4495
q2	370	148	161	148
q3	1452	1234	1213	1213
q4	1115	970	938	938
q5	3182	3141	3152	3141
q6	248	129	129	129
q7	991	495	485	485
q8	2204	2250	2201	2201
q9	6711	6754	6686	6686
q10	3222	3235	3297	3235
q11	336	209	205	205
q12	355	212	216	212
q13	4576	3810	3866	3810
q14	246	213	221	213
q15	577	519	517	517
q16	446	384	390	384
q17	1018	589	596	589
q18	8562	7144	7098	7098
q19	1534	1401	1366	1366
q20	537	335	311	311
q21	3106	2712	2697	2697
q22	359	293	297	293
Total cold run time: 45849 ms
Total hot run time: 40366 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4444	4390	4468	4390
q2	271	164	177	164
q3	3519	3659	3600	3600
q4	2405	2391	2381	2381
q5	5733	5740	5723	5723
q6	242	122	121	121
q7	2383	1856	1865	1856
q8	3536	3521	3529	3521
q9	9025	9006	9007	9006
q10	3929	4017	3986	3986
q11	500	390	388	388
q12	763	587	594	587
q13	4282	3558	3550	3550
q14	291	257	251	251
q15	570	517	527	517
q16	503	445	485	445
q17	1894	1839	1876	1839
q18	8639	8378	8242	8242
q19	1738	1725	1769	1725
q20	2257	1953	1924	1924
q21	6515	6204	6176	6176
q22	505	418	415	415
Total cold run time: 63944 ms
Total hot run time: 60807 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.54 seconds
stream load tsv: 584 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17194068592 Bytes

Comment on lines +580 to +582
if (context.getSessionVariable().isEnableMaterializedViewRewrite()) {
    planner.addHook(InitMaterializationContextHook.INSTANCE);
}
Contributor

Would it be better to do this when initializing the planner?

@@ -39,7 +39,7 @@ public class TableCollector extends DefaultPlanVisitor<Void, TableCollectorConte
     public Void visit(Plan plan, TableCollectorContext context) {
         if (plan instanceof CatalogRelation) {
             TableIf table = ((CatalogRelation) plan).getTable();
-            if (context.getTargetTableTypes().contains(table.getType())) {
+            if (context.getTargetTableTypes().isEmpty() || context.getTargetTableTypes().contains(table.getType())) {
Contributor

Why add the isEmpty check? Please add a comment explaining it.
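The thread does not record an answer to this question. One plausible reading, offered here only as an assumption, is that an empty set of target types is meant to act as "no filter", so every table is collected. A minimal sketch of that behavior follows; `TableCollectorSketch`, `TableType`, and `Table` are hypothetical stand-ins, not the Doris classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: models an "empty target set means collect everything" filter. */
final class TableCollectorSketch {

    enum TableType { OLAP, MATERIALIZED_VIEW, VIEW }

    static final class Table {
        final String name;
        final TableType type;

        Table(String name, TableType type) {
            this.name = name;
            this.type = type;
        }
    }

    /**
     * Collects the names of tables whose type is in targetTypes.
     * Assumption: an empty targetTypes set means the caller wants all tables.
     */
    static List<String> collect(List<Table> tables, Set<TableType> targetTypes) {
        List<String> collected = new ArrayList<>();
        for (Table table : tables) {
            if (targetTypes.isEmpty() || targetTypes.contains(table.type)) {
                collected.add(table.name);
            }
        }
        return collected;
    }
}
```

Without the `isEmpty` branch, a caller passing an empty set would collect nothing, which is the kind of behavior difference a code comment at the call site could spell out.
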

Comment on lines +50 to +58
@Override
public boolean equals(Object obj) {
    return super.equals(obj);
}

@Override
public int hashCode() {
    return super.hashCode();
}
Contributor

Unnecessary code

Comment on lines +49 to +57
@Override
public boolean equals(Object obj) {
    return super.equals(obj);
}

@Override
public int hashCode() {
    return super.hashCode();
}
Contributor

Unnecessary code

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 7, 2023

github-actions bot commented Dec 7, 2023

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit be81eb1 into apache:master Dec 7, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…iew (apache#27922)

morrySnow pushed a commit that referenced this pull request Dec 5, 2024
… condition has alias (#44779)

### What problem does this PR solve?

Related PR: #27922

Problem Summary:
The query and the mv definition are as follows. `partsupp.public_col as public_col`
is an alias, which causes the rewrite by materialized view to fail with the
message: the graph logic between query and view is different.

      select 
      o_custkey, 
      o_orderdate, 
      o_shippriority, 
      o_comment, 
      o_orderkey, 
      orders.public_col as col1, 
      l_orderkey, 
      l_partkey, 
      l_suppkey, 
      lineitem.public_col as col2, 
      ps_partkey, 
      ps_suppkey, 
      partsupp.public_col as col3, 
      partsupp.public_col * 2 as col4, 
      o_orderkey + l_orderkey + ps_partkey * 2, 
      sum(
        o_orderkey + l_orderkey + ps_partkey * 2
      ), 
      count() as count_all 
    from 
      (
        select 
          o_custkey, 
          o_orderdate, 
          o_shippriority, 
          o_comment, 
          o_orderkey, 
          orders.public_col as public_col 
        from 
          orders
      ) orders 
      left join (
        select 
          l_orderkey, 
          l_partkey, 
          l_suppkey, 
          lineitem.public_col as public_col 
        from 
          lineitem 
        where 
          lineitem.public_col is null 
          or lineitem.public_col <> 1
      ) lineitem on l_orderkey = o_orderkey 
      inner join (
        select 
          ps_partkey, 
          ps_suppkey, 
          partsupp.public_col as public_col 
        from 
          partsupp
      ) partsupp on ps_partkey = o_orderkey
    where 
      lineitem.public_col is null 
      or lineitem.public_col <> 1 
      and o_orderkey = 2
    group by 
      1, 
      2, 
      3, 
      4, 
      5, 
      6, 
      7, 
      8, 
      9, 
      10, 
      11, 
      12, 
      13, 
      14;

### Release note

Fix materialized view rewrite failure when a filter or join condition has
an alias
seawinde added a commit to seawinde/doris that referenced this pull request Dec 6, 2024
… condition has alias (apache#44779)

Related PR: apache#27922

seawinde added a commit to seawinde/doris that referenced this pull request Dec 6, 2024
… condition has alias (apache#44779)

### What problem does this PR solve?

Related PR: apache#27922

Labels
approved Indicates a PR has been approved by one committer. reviewed
4 participants