
[Fix](nereids)make agg output unchanged after normalized repeat #36369

Merged: 1 commit into apache:branch-2.0, Jun 19, 2024


@feiniaofeiafei (Contributor) commented Jun 17, 2024

cherry-pick #36207 to branch-2.0
The NormalizeRepeat rule can change the output of the aggregate (LogicalAggregate) in a GROUPING SETS query.
For example:

         SELECT
             col_int_undef_signed2 AS C1,
             col_int_undef_signed2
         FROM
             normalize_repeat_name_unchanged
         GROUP BY
         GROUPING SETS (
         (col_int_undef_signed2),
         (col_int_undef_signed2))

Before fixing the bug, the plan is:

LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
      +--LogicalProject[94] ( distinct=false, projects=[C1#7, C1#7], excepts=[] )
         +--LogicalAggregate[93] ( groupByExpr=[C1#7, GROUPING_ID#8], outputExpr=[C1#7, GROUPING_ID#8], hasRepeat=true )
            +--LogicalRepeat ( groupingSets=[[C1#7], [C1#7]], outputExpressions=[C1#7, GROUPING_ID#8] )
               +--LogicalProject[91] ( distinct=false, projects=[col_int_undef_signed2#1 AS `C1`#7], excepts=[] )
                  +--LogicalOlapScan (  )

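Because NormalizeRepeat replaces col_int_undef_signed2#1 with C1#7, LogicalProject[94] projects C1#7 twice and no longer produces col_int_undef_signed2#1. LogicalResultSink then references a column its child does not output, and planning fails with:
Input slot(s) not in childs output: col_int_undef_signed2#1 in plan: LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
child output is: [C1#7]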
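The error message indicates a slot-availability check: each slot a node consumes must appear, by ExprId, in its child's output. The following is a minimal Java sketch of such a check, using a hypothetical `Slot` record and a hypothetical `missingInputs` helper rather than the real Nereids classes; it reproduces the failure for the plan above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SlotCheckSketch {
    // Hypothetical stand-in for a Nereids slot: a column name plus an ExprId.
    // Slots are matched by ExprId, not by name.
    record Slot(String name, int exprId) {
        @Override
        public String toString() {
            return name + "#" + exprId;
        }
    }

    // Returns the parent's input slots whose ExprId the child does not output.
    static List<Slot> missingInputs(List<Slot> parentInputs, List<Slot> childOutput) {
        Set<Integer> produced = new HashSet<>();
        for (Slot s : childOutput) {
            produced.add(s.exprId());
        }
        List<Slot> missing = new ArrayList<>();
        for (Slot s : parentInputs) {
            if (!produced.contains(s.exprId())) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Slot c1 = new Slot("C1", 7);
        Slot col = new Slot("col_int_undef_signed2", 1);
        // Before the fix, LogicalProject[94] projects C1#7 twice.
        List<Slot> projectOutput = List.of(c1, c1);
        // LogicalResultSink[97] still expects C1#7 and col_int_undef_signed2#1.
        List<Slot> sinkInputs = List.of(c1, col);
        System.out.println("missing: " + missingInputs(sinkInputs, projectOutput));
        // prints: missing: [col_int_undef_signed2#1]
    }
}
```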

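This PR keeps the aggregate's output unchanged after NormalizeRepeat: in the fixed plan, LogicalProject[94] aliases C1#7 back to `col_int_undef_signed2`#1, so every slot LogicalResultSink needs is available again. After the fix, the plan is: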

LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
      +--LogicalProject[94] ( distinct=false, projects=[C1#7, C1#7 as `col_int_undef_signed2`#1], excepts=[] )
         +--LogicalAggregate[93] ( groupByExpr=[C1#7, GROUPING_ID#8], outputExpr=[C1#7, GROUPING_ID#8], hasRepeat=true )
            +--LogicalRepeat ( groupingSets=[[C1#7], [C1#7]], outputExpressions=[C1#7, GROUPING_ID#8] )
               +--LogicalProject[91] ( distinct=false, projects=[col_int_undef_signed2#1 AS `C1`#7], excepts=[] )
                  +--LogicalOlapScan (  )
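The idea visible in the fixed plan can be sketched as follows: when a slot in the original output has been replaced by a different slot, project the replacement under an alias that reuses the original name and ExprId. This is a minimal Java sketch with hypothetical `Slot`/`Alias` records and a hypothetical `rewriteKeepingOutput` helper; it is not the code merged in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeepOutputSketch {
    // Hypothetical stand-ins, not the real Nereids expression classes.
    sealed interface NamedExpr permits Slot, Alias {
        String name();

        int exprId();
    }

    record Slot(String name, int exprId) implements NamedExpr {
        @Override
        public String toString() {
            return name + "#" + exprId;
        }
    }

    // Exposes `child` under the given name and ExprId.
    record Alias(Slot child, String name, int exprId) implements NamedExpr {
        @Override
        public String toString() {
            return child + " AS `" + name + "`#" + exprId;
        }
    }

    // Rebuilds the projection above the aggregate. `replaced` maps an original
    // ExprId to the slot the normalization substituted for it. A substituted
    // slot is wrapped in an Alias so the output name and ExprId stay the same.
    static List<NamedExpr> rewriteKeepingOutput(List<Slot> originalOutput, Map<Integer, Slot> replaced) {
        List<NamedExpr> projects = new ArrayList<>();
        for (Slot original : originalOutput) {
            Slot target = replaced.getOrDefault(original.exprId(), original);
            if (target.exprId() == original.exprId()) {
                projects.add(target);
            } else {
                projects.add(new Alias(target, original.name(), original.exprId()));
            }
        }
        return projects;
    }

    public static void main(String[] args) {
        Slot c1 = new Slot("C1", 7);
        Slot col = new Slot("col_int_undef_signed2", 1);
        // The normalization maps col_int_undef_signed2#1 to C1#7.
        List<NamedExpr> projects = rewriteKeepingOutput(List.of(c1, col), Map.of(1, c1));
        System.out.println(projects);
        // prints: [C1#7, C1#7 AS `col_int_undef_signed2`#1]
    }
}
```

Aliasing instead of renaming keeps C1#7 flowing from the aggregate while the parent still sees the ExprId it was bound to.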


@feiniaofeiafei marked this pull request as draft June 17, 2024 03:26
@feiniaofeiafei changed the base branch from master to branch-2.0 June 17, 2024 03:27
@feiniaofeiafei (Contributor Author)

run buildall

@feiniaofeiafei marked this pull request as ready for review June 17, 2024 03:38
@feiniaofeiafei (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 49906 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e4e5c7ddb112ed8016117a0f8c493311031a6d59, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17609	4397	4367	4367
q2	2079	161	147	147
q3	10459	1870	1918	1870
q4	10328	1249	1335	1249
q5	8865	3887	3927	3887
q6	233	124	152	124
q7	2024	1630	1619	1619
q8	9312	2700	2700	2700
q9	10709	10345	10253	10253
q10	8634	3502	3522	3502
q11	421	241	245	241
q12	478	303	307	303
q13	18340	3972	4028	3972
q14	363	333	322	322
q15	525	459	459	459
q16	683	574	586	574
q17	1149	1008	961	961
q18	7224	6954	6910	6910
q19	1786	1643	1625	1625
q20	568	323	308	308
q21	4455	4157	4081	4081
q22	530	434	432	432
Total cold run time: 116774 ms
Total hot run time: 49906 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4342	4346	4324	4324
q2	322	222	235	222
q3	4160	4175	4150	4150
q4	2746	2739	2732	2732
q5	7204	7089	7091	7089
q6	239	119	121	119
q7	3229	2854	2805	2805
q8	4376	4484	4502	4484
q9	16803	16879	16800	16800
q10	4228	4248	4281	4248
q11	770	695	677	677
q12	1047	859	883	859
q13	7059	3771	3778	3771
q14	462	434	425	425
q15	515	463	461	461
q16	730	706	669	669
q17	3797	3839	3876	3839
q18	8794	8758	8816	8758
q19	1743	1693	1632	1632
q20	2378	2125	2097	2097
q21	8546	8492	8413	8413
q22	1046	969	979	969
Total cold run time: 84536 ms
Total hot run time: 79543 ms
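The bot does not label its columns. The totals are consistent with reading each row as query, cold run, hot run 1, hot run 2, best hot run, where "best" is the smaller of the two hot runs. A short Java sketch, assuming that reading, reproduces the Round 1 totals from the rows above (`TpchTotals` and its methods are hypothetical names).

```java
public class TpchTotals {
    // Round 1 rows from the report above, q1..q22: {cold, hot1, hot2} in ms.
    // The column meanings are inferred from the totals, not documented.
    static final long[][] ROUND1 = {
        {17609, 4397, 4367}, {2079, 161, 147}, {10459, 1870, 1918},
        {10328, 1249, 1335}, {8865, 3887, 3927}, {233, 124, 152},
        {2024, 1630, 1619}, {9312, 2700, 2700}, {10709, 10345, 10253},
        {8634, 3502, 3522}, {421, 241, 245}, {478, 303, 307},
        {18340, 3972, 4028}, {363, 333, 322}, {525, 459, 459},
        {683, 574, 586}, {1149, 1008, 961}, {7224, 6954, 6910},
        {1786, 1643, 1625}, {568, 323, 308}, {4455, 4157, 4081},
        {530, 434, 432},
    };

    // Total cold run time: sum of each query's first (cold) run.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // Total hot run time: sum of each query's faster hot run.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("cold=" + totalCold(ROUND1) + " ms, hot=" + totalHot(ROUND1) + " ms");
        // prints: cold=116774 ms, hot=49906 ms
    }
}
```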

@doris-robot

TPC-DS: Total hot run time: 204845 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4e5c7ddb112ed8016117a0f8c493311031a6d59, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	937	430	377	377
query2	6533	2819	2637	2637
query3	6928	216	203	203
query4	20241	17987	18111	17987
query5	19730	6534	6467	6467
query6	304	219	238	219
query7	4157	316	317	316
query8	448	423	384	384
query9	3123	2684	2634	2634
query10	408	286	302	286
query11	11405	10743	10744	10743
query12	125	78	77	77
query13	5590	707	689	689
query14	17571	13800	13476	13476
query15	365	241	258	241
query16	6463	283	260	260
query17	1718	1450	903	903
query18	2292	419	410	410
query19	208	163	160	160
query20	78	82	79	79
query21	188	99	103	99
query22	5230	5199	5068	5068
query23	32717	32072	31953	31953
query24	7123	6596	6640	6596
query25	521	431	419	419
query26	534	166	166	166
query27	1800	306	310	306
query28	6077	2394	2315	2315
query29	2854	2680	2861	2680
query30	240	166	168	166
query31	929	785	738	738
query32	67	64	62	62
query33	390	276	253	253
query34	862	480	490	480
query35	1127	905	928	905
query36	1267	1175	1255	1175
query37	92	60	62	60
query38	3100	2935	2917	2917
query39	1398	1318	1321	1318
query40	203	92	94	92
query41	46	43	44	43
query42	93	88	83	83
query43	801	724	690	690
query44	1211	727	724	724
query45	258	236	240	236
query46	1230	979	973	973
query47	1889	1697	1881	1697
query48	1036	743	733	733
query49	638	380	369	369
query50	867	575	618	575
query51	4809	4614	4631	4614
query52	100	77	82	77
query53	441	327	327	327
query54	2661	2473	2515	2473
query55	85	74	78	74
query56	237	232	226	226
query57	1230	1173	1036	1036
query58	225	207	201	201
query59	4150	4057	4200	4057
query60	218	221	228	221
query61	96	95	95	95
query62	855	491	457	457
query63	487	349	347	347
query64	2424	1521	1452	1452
query65	3619	3534	3545	3534
query66	777	389	376	376
query67	15441	15743	15367	15367
query68	10139	659	642	642
query69	574	342	351	342
query70	1920	1477	1494	1477
query71	418	316	315	315
query72	6513	3508	3519	3508
query73	736	313	311	311
query74	6339	5857	5915	5857
query75	5329	3624	3693	3624
query76	6497	1176	1180	1176
query77	1090	263	248	248
query78	12668	12526	12171	12171
query79	11699	643	626	626
query80	743	402	399	399
query81	494	240	233	233
query82	1055	102	100	100
query83	178	136	135	135
query84	259	71	72	71
query85	807	325	321	321
query86	337	304	329	304
query87	3251	3045	3035	3035
query88	5581	2291	2288	2288
query89	421	301	299	299
query90	2638	217	199	199
query91	170	150	138	138
query92	59	53	53	53
query93	6034	576	587	576
query94	1228	213	209	209
query95	1125	1100	1062	1062
query96	647	321	321	321
query97	6519	6348	6438	6348
query98	193	180	173	173
query99	2840	911	917	911
Total cold run time: 319768 ms
Total hot run time: 204845 ms

@doris-robot

ClickBench: Total hot run time: 30.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e4e5c7ddb112ed8016117a0f8c493311031a6d59, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.02	0.02	0.02
query2	0.08	0.03	0.02
query3	0.25	0.04	0.04
query4	1.80	0.06	0.07
query5	0.53	0.52	0.52
query6	1.24	0.63	0.62
query7	0.01	0.00	0.01
query8	0.04	0.03	0.02
query9	0.53	0.50	0.47
query10	0.53	0.52	0.53
query11	0.12	0.08	0.08
query12	0.12	0.09	0.08
query13	0.62	0.61	0.62
query14	0.79	0.78	0.77
query15	0.79	0.77	0.77
query16	0.36	0.39	0.39
query17	1.03	1.02	1.02
query18	0.21	0.24	0.26
query19	1.96	1.87	1.85
query20	0.02	0.01	0.01
query21	15.86	0.54	0.53
query22	2.26	2.07	1.37
query23	17.33	0.86	0.98
query24	7.10	1.94	0.83
query25	0.39	0.06	0.08
query26	0.79	0.15	0.17
query27	0.03	0.04	0.05
query28	5.29	0.76	0.78
query29	12.60	2.28	2.33
query30	0.61	0.54	0.47
query31	2.81	0.38	0.37
query32	3.40	0.49	0.49
query33	3.08	3.07	3.05
query34	15.24	4.81	4.79
query35	4.86	4.85	4.84
query36	1.05	1.02	1.00
query37	0.06	0.04	0.05
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.15	0.15	0.14
query41	0.07	0.01	0.02
query42	0.03	0.01	0.01
query43	0.02	0.02	0.01
Total cold run time: 104.13 s
Total hot run time: 30.24 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit e4e5c7ddb112ed8016117a0f8c493311031a6d59 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.6 seconds inserted 10000000 Rows, about 462K ops/s
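The reported rates are consistent with dividing bytes by seconds and then by 2^20 (so "MB/s" here behaves like MiB/s), truncating toward zero, and with rows per second truncated to whole thousands. This is a sketch under that assumption, not the bot's actual script; `LoadRates` and its methods are hypothetical.

```java
public class LoadRates {
    // Assumption: "MB" in the report means 2^20 bytes and rates are truncated
    // toward zero; this matches all three stream-load lines above.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1024 * 1024);
    }

    // Assumption: "K ops/s" is rows per second truncated to whole thousands.
    static long thousandRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds) / 1000;
    }

    public static void main(String[] args) {
        System.out.println("json:    " + mibPerSecond(2358488459L, 19) + " MB/s");
        System.out.println("orc:     " + mibPerSecond(1101869774L, 58) + " MB/s");
        System.out.println("parquet: " + mibPerSecond(861443392L, 31) + " MB/s");
        System.out.println("insert:  " + thousandRowsPerSecond(10_000_000L, 21.6) + "K ops/s");
    }
}
```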

@morrySnow changed the title from "[Fix](nereids)make agg output unchanged after normalized repeat (#36207)" to "[Fix](nereids)make agg output unchanged after normalized repeat" Jun 19, 2024
@morrySnow merged commit 2e2f102 into apache:branch-2.0 Jun 19, 2024
24 of 25 checks passed
hello-stephen pushed a commit that referenced this pull request Jun 19, 2024
introduced by #36369

Co-authored-by: moailing <moailing@selectdb.com>
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
…he#36369)

mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
introduced by apache#36369

Co-authored-by: moailing <moailing@selectdb.com>
3 participants