[feature](nereids) Support basic aggregate rewrite and function rollup using materialized view #28269

Merged: 9 commits merged into apache:master on Dec 15, 2023

seawinde
Contributor

Proposed changes

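Add aggregate materialized view rules for query rewrite.
It supports query rewrites such as the following, where the query groups by a subset of the view's group-by columns and is answered by rolling up the view's sum_alias: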

def mv = "select lineitem.L_LINENUMBER, orders.O_CUSTKEY, sum(O_TOTALPRICE) as sum_alias " +
        "from lineitem " +
        "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY " +
        "group by lineitem.L_LINENUMBER, orders.O_CUSTKEY "
def query = "select lineitem.L_LINENUMBER, sum(O_TOTALPRICE) as sum_alias " +
        "from lineitem " +
        "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY " +
        "group by lineitem.L_LINENUMBER"
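The rollup in this example can be illustrated outside the optimizer. The following Java sketch is a toy model, not Doris code: `RollupSketch`, `BaseRow`, and its methods are hypothetical names. It builds the view's (L_LINENUMBER, O_CUSTKEY) groups with partial sums, answers the coarser query by dropping O_CUSTKEY from the key and summing the partial sums again, and compares that with aggregating the base rows directly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of SUM rollup over a materialized view; hypothetical names, not Doris classes. */
class RollupSketch {

    /** One row of lineitem JOIN orders, reduced to the columns the MV and query use. */
    static final class BaseRow {
        final int lineNumber;   // L_LINENUMBER
        final int custKey;      // O_CUSTKEY
        final long totalPrice;  // O_TOTALPRICE

        BaseRow(int lineNumber, int custKey, long totalPrice) {
            this.lineNumber = lineNumber;
            this.custKey = custKey;
            this.totalPrice = totalPrice;
        }
    }

    /** MV: GROUP BY L_LINENUMBER, O_CUSTKEY with sum(O_TOTALPRICE) AS sum_alias. */
    static Map<List<Integer>, Long> buildMv(List<BaseRow> rows) {
        Map<List<Integer>, Long> mv = new HashMap<>();
        for (BaseRow r : rows) {
            mv.merge(List.of(r.lineNumber, r.custKey), r.totalPrice, Long::sum);
        }
        return mv;
    }

    /** Query evaluated on the base rows: GROUP BY L_LINENUMBER only. */
    static Map<Integer, Long> queryOnBase(List<BaseRow> rows) {
        Map<Integer, Long> result = new TreeMap<>();
        for (BaseRow r : rows) {
            result.merge(r.lineNumber, r.totalPrice, Long::sum);
        }
        return result;
    }

    /** Same query answered from the MV: drop O_CUSTKEY from the key and re-sum sum_alias. */
    static Map<Integer, Long> queryOnMv(Map<List<Integer>, Long> mv) {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<List<Integer>, Long> e : mv.entrySet()) {
            result.merge(e.getKey().get(0), e.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<BaseRow> rows = List.of(
                new BaseRow(1, 100, 10), new BaseRow(1, 100, 5), new BaseRow(1, 200, 7),
                new BaseRow(2, 100, 3), new BaseRow(2, 300, 4));
        Map<List<Integer>, Long> mv = buildMv(rows);
        System.out.println(queryOnBase(rows)); // {1=22, 2=7}
        System.out.println(queryOnMv(mv));     // {1=22, 2=7}
    }
}
```

In standard SQL semantics, COUNT rolls up by summing partial counts and MIN/MAX by taking the MIN/MAX of partial results, while AVG cannot be recovered from partial averages alone (it needs SUM and COUNT). Which functions this PR actually rolls up is not listed here.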

@seawinde
Contributor Author

run buildall

Comment on lines 39 to 40

    protected Plan rewriteQueryByView(MatchMode matchMode,
            StructInfo queryStructInfo,

Contributor

Nullable annotation
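The review comment asks for a Nullable annotation on `rewriteQueryByView`. A minimal sketch of the idea, assuming the method returns null when no rewrite is possible (the excerpt does not show its return semantics). `NullableRewriteSketch`, its toy group-by check, and the locally declared `@Nullable` are all hypothetical; real code would more likely use an existing annotation such as JSR-305's `javax.annotation.Nullable`.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Local stand-in for a nullness annotation such as JSR-305's javax.annotation.Nullable. */
@Documented
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
@interface Nullable {
}

/** Hypothetical illustration; not the real rewriteQueryByView(MatchMode, StructInfo, ...). */
class NullableRewriteSketch {

    /**
     * Returns a description of the rewritten plan, or null when the view cannot answer the query.
     * Toy rule: a rollup is possible only if the view groups by every column the query groups by.
     */
    @Nullable
    static String rewriteQueryByView(Set<String> queryGroupBy, Set<String> viewGroupBy) {
        if (!viewGroupBy.containsAll(queryGroupBy)) {
            return null; // the view is too coarse to recover the query's groups
        }
        return "rollup of view grouped by " + new TreeSet<>(queryGroupBy);
    }

    public static void main(String[] args) {
        Set<String> view = Set.of("L_LINENUMBER", "O_CUSTKEY");
        for (Set<String> query : List.of(Set.of("L_LINENUMBER"), Set.of("L_SHIPDATE"))) {
            String plan = rewriteQueryByView(query, view);
            // The annotation tells callers to handle null: here, fall back to the original plan.
            System.out.println(plan != null ? plan : "fall back to original plan");
        }
    }
}
```

The annotation changes nothing at runtime; its value is that static analyzers and reviewers can flag callers that dereference the result without a null check.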

Contributor

PR approved by anyone and no changes requested.

seawinde force-pushed the aggregate_mv_rewrite_impl branch from 6c5ba49 to 59a12ed on December 12, 2023 09:05
@seawinde
Contributor Author

run buildall

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.11 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17218465350 Bytes
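The bot does not state how its "about N MB/s" figures are derived. The figures reported in this PR are consistent with dividing bytes by seconds and by 2^20, then truncating; a small Java check under that assumption (`ThroughputCheck` is a hypothetical name, inputs copied from the result above):

```java
/**
 * Recomputes the rounded throughput figures printed by the benchmark bot.
 * Assumption: "MB" means 2^20 bytes and the bot truncates rather than rounds.
 */
class ThroughputCheck {

    /** Whole MiB per second, truncated (integer division floors for positive values). */
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1L << 20);
    }

    /** Thousands of rows per second, truncated. */
    static long kiloRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000);
    }

    public static void main(String[] args) {
        System.out.println("tsv:     " + mibPerSecond(74807831229L, 581) + " MB/s");       // 122
        System.out.println("json:    " + mibPerSecond(2358488459L, 19) + " MB/s");         // 118
        System.out.println("orc:     " + mibPerSecond(1101869774L, 66) + " MB/s");         // 15
        System.out.println("parquet: " + mibPerSecond(861443392L, 32) + " MB/s");          // 25
        System.out.println("insert:  " + kiloRowsPerSecond(10000000L, 28.9) + "K ops/s");  // 346
    }
}
```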

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 1c14bb983b12405500937754d098982e9d59e993, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4713	4463	4463	4463
q2	361	153	164	153
q3	1457	1224	1198	1198
q4	1106	907	901	901
q5	3114	3148	3141	3141
q6	249	128	131	128
q7	1012	480	483	480
q8	2192	2221	2162	2162
q9	6668	6648	6669	6648
q10	3220	3267	3242	3242
q11	329	204	207	204
q12	354	202	207	202
q13	4582	3828	3754	3754
q14	243	215	215	215
q15	563	521	527	521
q16	445	397	382	382
q17	993	641	508	508
q18	7561	6888	7428	6888
q19	1526	1457	1444	1444
q20	561	290	330	290
q21	3057	2658	2658	2658
q22	349	284	287	284
Total cold run time: 44655 ms
Total hot run time: 39866 ms
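The bot output does not label its columns. For the table above, both totals are reproduced if the first column is the cold run and the last column is the faster of the two hot runs; a Java check of that reading (`TpchTotals` is a hypothetical name, data copied from the table):

```java
/**
 * Recomputes the totals of the TPC-H table above.
 * Assumed column meaning: {cold run, hot run 1, hot run 2}; the fourth printed column is the best hot run.
 */
class TpchTotals {

    // q1..q22, in ms
    static final long[][] RUNS = {
        {4713, 4463, 4463}, {361, 153, 164}, {1457, 1224, 1198}, {1106, 907, 901},
        {3114, 3148, 3141}, {249, 128, 131}, {1012, 480, 483}, {2192, 2221, 2162},
        {6668, 6648, 6669}, {3220, 3267, 3242}, {329, 204, 207}, {354, 202, 207},
        {4582, 3828, 3754}, {243, 215, 215}, {563, 521, 527}, {445, 397, 382},
        {993, 641, 508}, {7561, 6888, 7428}, {1526, 1457, 1444}, {561, 290, 330},
        {3057, 2658, 2658}, {349, 284, 287},
    };

    /** Best hot run = the faster of the two hot runs. */
    static long bestHot(long[] run) {
        return Math.min(run[1], run[2]);
    }

    static long totalCold(long[][] runs) {
        long sum = 0;
        for (long[] run : runs) {
            sum += run[0];
        }
        return sum;
    }

    static long totalHot(long[][] runs) {
        long sum = 0;
        for (long[] run : runs) {
            sum += bestHot(run);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + totalCold(RUNS) + " ms"); // 44655
        System.out.println("Total hot run time: " + totalHot(RUNS) + " ms");   // 39866
    }
}
```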

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4412	4465	4418	4418
q2	269	163	171	163
q3	3511	3492	3497	3492
q4	2394	2373	2379	2373
q5	5728	5711	5714	5711
q6	236	121	122	121
q7	2361	1821	1856	1821
q8	3521	3524	3527	3524
q9	9015	8973	8997	8973
q10	3908	3987	3994	3987
q11	505	382	381	381
q12	772	591	617	591
q13	4289	3555	3562	3555
q14	290	260	253	253
q15	578	522	518	518
q16	494	455	455	455
q17	1894	1862	1868	1862
q18	8538	8256	8262	8256
q19	1725	1722	1749	1722
q20	2275	1966	1928	1928
q21	6458	6173	6088	6088
q22	506	420	426	420
Total cold run time: 63679 ms
Total hot run time: 60612 ms

seawinde force-pushed the aggregate_mv_rewrite_impl branch from 1c14bb9 to 3cc0628 on December 12, 2023 13:07
@seawinde
Contributor Author

run buildall

seawinde force-pushed the aggregate_mv_rewrite_impl branch from 2bf0d3f to 56e7b6c on December 12, 2023 13:11
@seawinde
Contributor Author

run buildall

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.24 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17219909872 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit c1d14fd8108548c3e80d86847792884c482195d6, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	51493	31829	31815	31815
q2	379	138	137	137
q3	1556	1257	1204	1204
q4	1113	928	923	923
q5	3220	3224	3259	3224
q6	264	141	141	141
q7	1019	496	479	479
q8	2150	2211	2185	2185
q9	6936	6933	6922	6922
q10	3250	3309	3293	3293
q11	331	216	205	205
q12	347	205	201	201
q13	4580	3839	3823	3823
q14	440	314	316	314
q15	763	1319	1317	1317
q16	440	381	391	381
q17	1025	584	551	551
q18	7532	7402	7216	7216
q19	1544	1333	1415	1333
q20	571	334	283	283
q21	3071	2623	2670	2623
q22	354	279	285	279
Total cold run time: 92378 ms
Total hot run time: 68849 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	30808	31842	31810	31810
q2	284	166	166	166
q3	3665	3662	3628	3628
q4	2393	2381	2373	2373
q5	5791	5778	5802	5778
q6	243	135	135	135
q7	3070	3401	3378	3378
q8	3538	3559	3542	3542
q9	9199	9116	9132	9116
q10	3966	4034	4076	4034
q11	804	929	1358	929
q12	766	587	585	585
q13	4276	3564	3533	3533
q14	476	431	430	430
q15	1173	1351	1317	1317
q16	493	458	485	458
q17	1898	1867	1862	1862
q18	8885	8226	8441	8226
q19	1759	1777	1771	1771
q20	2862	2585	2680	2585
q21	6514	6167	6136	6136
q22	495	415	416	415
Total cold run time: 93358 ms
Total hot run time: 92207 ms

seawinde force-pushed the aggregate_mv_rewrite_impl branch from 585e191 to 585286d on December 14, 2023 00:54
@seawinde
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 585286d019746eb30bf54cbfb9eaf1ab28373c00, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	49179	31860	31861	31860
q2	370	135	135	135
q3	1508	1238	1242	1238
q4	1115	916	887	887
q5	3186	3171	3219	3171
q6	262	140	153	140
q7	979	495	480	480
q8	2143	2221	2209	2209
q9	6931	6875	6830	6830
q10	3242	3390	3318	3318
q11	344	226	200	200
q12	354	207	210	207
q13	4586	3877	3823	3823
q14	442	314	318	314
q15	780	1303	1345	1303
q16	435	399	388	388
q17	1025	589	568	568
q18	7266	6874	7905	6874
q19	1545	1411	1331	1331
q20	613	391	336	336
q21	3094	2618	2644	2618
q22	345	280	283	280
Total cold run time: 89744 ms
Total hot run time: 68510 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	30829	31866	31839	31839
q2	280	167	175	167
q3	3623	3633	3605	3605
q4	2391	2371	2377	2371
q5	5751	5768	5767	5767
q6	242	133	132	132
q7	3051	3406	3372	3372
q8	3557	3536	3534	3534
q9	9192	9191	9189	9189
q10	3965	4078	4066	4066
q11	820	914	1357	914
q12	767	591	593	591
q13	4275	3571	3541	3541
q14	508	441	453	441
q15	1097	1355	1316	1316
q16	517	462	449	449
q17	1873	1859	1840	1840
q18	8611	8892	8258	8258
q19	1873	1823	1789	1789
q20	2875	2617	2625	2617
q21	6606	6295	6128	6128
q22	500	423	412	412
Total cold run time: 93203 ms
Total hot run time: 92338 ms

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.6 seconds
stream load tsv: 591 seconds loaded 74807831229 Bytes, about 120 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17219857833 Bytes

seawinde force-pushed the aggregate_mv_rewrite_impl branch from 585286d to a4e163a on December 14, 2023 02:50
@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit a4e163a4f35d8e0a93920dc7c4b9dd8ceb15b668, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4756	4456	4508	4456
q2	371	144	140	140
q3	1476	1306	1246	1246
q4	1111	898	884	884
q5	3187	3179	3182	3179
q6	248	130	128	128
q7	1001	491	491	491
q8	2209	2221	2205	2205
q9	6686	6702	6674	6674
q10	3223	3281	3279	3279
q11	327	205	195	195
q12	351	208	209	208
q13	4531	3783	3807	3783
q14	242	210	206	206
q15	570	523	527	523
q16	433	377	394	377
q17	1016	648	606	606
q18	7047	6799	6741	6741
q19	1575	1463	1353	1353
q20	562	306	335	306
q21	3058	2613	2640	2613
q22	343	273	282	273
Total cold run time: 44323 ms
Total hot run time: 39866 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4421	4408	4406	4406
q2	266	165	170	165
q3	3507	3499	3492	3492
q4	2395	2382	2394	2382
q5	5724	5722	5742	5722
q6	234	123	120	120
q7	2402	1870	1880	1870
q8	3519	3554	3532	3532
q9	8987	8987	9022	8987
q10	3928	4003	3993	3993
q11	507	384	372	372
q12	764	605	605	605
q13	4300	3535	3536	3535
q14	289	263	256	256
q15	578	527	524	524
q16	502	454	468	454
q17	1895	1855	1857	1855
q18	8545	8098	8153	8098
q19	1760	1762	1736	1736
q20	2243	1946	1935	1935
q21	6503	6134	6259	6134
q22	506	423	409	409
Total cold run time: 63775 ms
Total hot run time: 60582 ms

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.92 seconds
stream load tsv: 588 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17223853296 Bytes

@seawinde
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 2e0570e79182b39a6496eaa648c506e30e1f7699, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4784	4487	4538	4487
q2	361	157	177	157
q3	1451	1223	1258	1223
q4	1137	947	939	939
q5	3141	3181	3167	3167
q6	249	128	128	128
q7	1000	482	474	474
q8	2218	2206	2200	2200
q9	6728	6674	6684	6674
q10	3199	3258	3285	3258
q11	325	207	218	207
q12	354	213	210	210
q13	4560	3818	3829	3818
q14	245	218	220	218
q15	573	527	525	525
q16	444	386	390	386
q17	1017	563	564	563
q18	7206	6972	6894	6894
q19	1527	1343	1428	1343
q20	511	279	304	279
q21	3053	2622	2711	2622
q22	351	288	293	288
Total cold run time: 44434 ms
Total hot run time: 40060 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4438	4440	4441	4440
q2	267	164	175	164
q3	3549	3542	3526	3526
q4	2399	2385	2379	2379
q5	5717	5766	5735	5735
q6	243	122	122	122
q7	2367	1833	1860	1833
q8	3525	3638	3522	3522
q9	9014	8981	9027	8981
q10	3919	3976	4002	3976
q11	514	395	395	395
q12	769	603	598	598
q13	4295	3595	3580	3580
q14	283	262	254	254
q15	580	529	524	524
q16	514	460	479	460
q17	1875	1880	1834	1834
q18	8694	8302	8195	8195
q19	1744	1731	1722	1722
q20	2265	1958	1943	1943
q21	6540	6201	6202	6201
q22	523	427	421	421
Total cold run time: 64034 ms
Total hot run time: 60805 ms

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.61 seconds
stream load tsv: 585 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17221020451 Bytes

@seawinde
Contributor Author

run buildall

@doris-robot
(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.15 seconds
stream load tsv: 583 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17219866467 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 03dbc8a201cd0cebe80686de201743e32c2ba81f, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4726	4459	4486	4459
q2	363	152	160	152
q3	1466	1269	1216	1216
q4	1121	912	865	865
q5	3209	3155	3178	3155
q6	245	130	130	130
q7	986	496	476	476
q8	2190	2241	2193	2193
q9	6689	6664	6850	6664
q10	3203	3243	3247	3243
q11	330	201	198	198
q12	355	210	207	207
q13	4605	3785	3800	3785
q14	246	214	211	211
q15	579	531	525	525
q16	449	393	387	387
q17	1007	585	538	538
q18	7215	6889	7015	6889
q19	1519	1345	1405	1345
q20	545	305	299	299
q21	3043	2614	2699	2614
q22	349	280	293	280
Total cold run time: 44440 ms
Total hot run time: 39831 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4383	4401	4396	4396
q2	266	162	174	162
q3	3545	3519	3520	3519
q4	2411	2404	2392	2392
q5	5762	5717	5740	5717
q6	241	121	122	121
q7	2388	1836	1878	1836
q8	3522	3531	3527	3527
q9	9109	9056	9083	9056
q10	3900	3998	3992	3992
q11	495	378	391	378
q12	774	602	629	602
q13	4279	3604	3580	3580
q14	292	257	253	253
q15	563	518	529	518
q16	506	486	459	459
q17	1884	1843	1849	1843
q18	8770	8306	8278	8278
q19	1752	1723	1747	1723
q20	2247	1936	1921	1921
q21	6567	6196	6212	6196
q22	505	433	416	416
Total cold run time: 64161 ms
Total hot run time: 60885 ms

github-actions bot added the approved label on Dec 14, 2023
Contributor

PR approved by at least one committer and no changes requested.

morrySnow merged commit 4c51558 into apache:master on Dec 15, 2023
hello-stephen pushed a commit to hello-stephen/doris that referenced this pull request Dec 28, 2023
…p using materialized view (apache#28269)
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
…p using materialized view (apache#28269)
Labels: approved, meta-change, reviewed

4 participants