[improve](group commit) Group commit support max filter ratio when rows is less than value in config #28139

Merged (5 commits) on Dec 12, 2023

@mymeiyi mymeiyi commented Dec 7, 2023

Proposed changes

If a load has fewer rows than `group_commit_memory_rows_for_max_filter_ratio` (default 10000), group commit checks whether the load meets `max_filter_ratio`.
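
A minimal sketch of the check described above, for readers who want the rule spelled out. It is not the code in this PR: the helper `exceeds_max_filter_ratio`, its parameters, and the constant `kGroupCommitMemoryRowsForMaxFilterRatio` are hypothetical names that stand in for the config value. The description does not say what happens for loads at or above the threshold, so the sketch assumes the check is simply skipped there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for group_commit_memory_rows_for_max_filter_ratio
// (default 10000 according to the PR description).
constexpr int64_t kGroupCommitMemoryRowsForMaxFilterRatio = 10000;

// Hypothetical helper, not the Doris implementation.
// Returns true when a small load filtered out more rows than
// max_filter_ratio allows and should therefore be rejected.
bool exceeds_max_filter_ratio(int64_t total_rows, int64_t filtered_rows,
                              double max_filter_ratio) {
    assert(filtered_rows >= 0 && filtered_rows <= total_rows);
    assert(max_filter_ratio >= 0.0 && max_filter_ratio <= 1.0);

    // Assumption: loads at or above the threshold skip the check.
    if (total_rows >= kGroupCommitMemoryRowsForMaxFilterRatio) {
        return false;
    }
    // An empty load has nothing to filter.
    if (total_rows == 0) {
        return false;
    }
    const double ratio =
            static_cast<double>(filtered_rows) / static_cast<double>(total_rows);
    return ratio > max_filter_ratio;
}
```

Under these assumptions, a 100-row load with 20 filtered rows and `max_filter_ratio = 0.1` is rejected, while a 20000-row load is not checked at all.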

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

std::make_shared<doris::vectorized::FutureBlock>();
future_block->swap(*(output_block.get()));
Status GroupCommitBlockSink::_add_blocks() {
DCHECK(_is_block_appended == false);

warning: redundant boolean literal supplied to boolean operator [readability-simplify-boolean-expr]

Suggested change
- DCHECK(_is_block_appended == false);
+ DCHECK(!_is_block_appended);

mymeiyi commented Dec 8, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.23 seconds
stream load tsv: 593 seconds loaded 74807831229 Bytes, about 120 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17217527646 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 168031efe1f2e3286239430ed53918a5c41ba391, data reload: false

run tpch-sf100 query with default conf and session variables (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	21161	4440	4374	4374
q2	2577	138	137	137
q3	12318	1129	1200	1129
q4	12315	826	851	826
q5	9345	3128	3119	3119
q6	247	131	129	129
q7	980	492	488	488
q8	11150	2142	2146	2142
q9	6986	6705	6672	6672
q10	9716	3242	3225	3225
q11	371	209	206	206
q12	361	207	201	201
q13	21623	3790	3813	3790
q14	239	212	211	211
q15	573	522	533	522
q16	438	385	397	385
q17	1001	604	571	571
q18	7434	7027	7186	7027
q19	1719	1479	1377	1377
q20	641	297	319	297
q21	3007	2654	2627	2627
q22	350	279	285	279
Total cold run time: 124552 ms
Total hot run time: 39734 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	4393	4379	4404	4379
q2	263	168	172	168
q3	3535	3517	3505	3505
q4	2363	2358	2365	2358
q5	5723	5699	5673	5673
q6	237	123	122	122
q7	2354	1876	1851	1851
q8	3481	3471	3486	3471
q9	8984	8937	8981	8937
q10	3884	3959	3973	3959
q11	505	384	393	384
q12	763	590	583	583
q13	4700	3580	3564	3564
q14	278	260	266	260
q15	572	524	519	519
q16	507	454	473	454
q17	1830	1842	1865	1842
q18	8738	8361	8354	8354
q19	1696	1733	1715	1715
q20	2259	1947	1949	1947
q21	6455	6118	6138	6118
q22	505	428	423	423
Total cold run time: 64025 ms
Total hot run time: 60586 ms

mymeiyi commented Dec 11, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit fc3cd958b84ff41c53252cd2285638e00b376712, data reload: false

run tpch-sf100 query with default conf and session variables (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	4733	4447	4499	4447
q2	364	153	157	153
q3	1459	1244	1182	1182
q4	1102	888	829	829
q5	3139	4319	3165	3165
q6	254	125	126	125
q7	998	493	525	493
q8	2172	2208	2198	2198
q9	6690	6674	6668	6668
q10	3208	3239	3284	3239
q11	324	205	206	205
q12	350	201	201	201
q13	4551	3813	3761	3761
q14	246	213	215	213
q15	570	524	518	518
q16	444	391	392	391
q17	1018	555	539	539
q18	7597	7311	7839	7311
q19	1517	1371	1455	1371
q20	1160	289	513	289
q21	3080	2679	2677	2677
q22	354	280	282	280
Total cold run time: 45330 ms
Total hot run time: 40255 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	4379	4475	4389	4389
q2	268	162	175	162
q3	3557	3571	3564	3564
q4	2396	2415	2392	2392
q5	5759	5747	5748	5747
q6	239	117	123	117
q7	2410	1886	1890	1886
q8	3509	3531	3541	3531
q9	9109	9075	9051	9051
q10	3899	3977	3968	3968
q11	513	381	381	381
q12	764	601	605	601
q13	4286	3605	3582	3582
q14	287	247	242	242
q15	567	513	515	513
q16	498	435	461	435
q17	1879	1866	1859	1859
q18	8770	8936	8339	8339
q19	1741	1731	1741	1731
q20	2256	1961	1948	1948
q21	6532	6216	6209	6209
q22	506	432	421	421
Total cold run time: 64124 ms
Total hot run time: 61068 ms

mymeiyi commented Dec 11, 2023

run buildall

@dataroaring dataroaring left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved (indicates a PR has been approved by one committer) and reviewed labels Dec 11, 2023

PR approved by anyone and no changes requested.

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit f30772c5094c5e6a3393082c37235e1799f4b617, data reload: false

run tpch-sf100 query with default conf and session variables (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	4660	4455	4488	4455
q2	362	152	156	152
q3	1456	1251	1198	1198
q4	1109	916	908	908
q5	3131	3138	3123	3123
q6	256	128	126	126
q7	987	483	479	479
q8	2194	2225	2192	2192
q9	6662	6673	6690	6673
q10	3223	3253	3244	3244
q11	315	203	197	197
q12	355	206	197	197
q13	4556	3862	3829	3829
q14	241	213	211	211
q15	569	527	522	522
q16	446	379	377	377
q17	1007	559	528	528
q18	8151	9009	7621	7621
q19	1512	1263	1396	1263
q20	528	363	303	303
q21	3066	2662	2688	2662
q22	352	277	283	277
Total cold run time: 45138 ms
Total hot run time: 40537 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off (columns: query, cold run, hot run 1, hot run 2, best hot run; all times in ms)
q1	4386	4531	4411	4411
q2	267	163	171	163
q3	3551	3541	3528	3528
q4	2382	2376	2377	2376
q5	5732	5730	5732	5730
q6	243	120	119	119
q7	2398	1910	1857	1857
q8	3524	3521	3519	3519
q9	8999	8915	8948	8915
q10	3884	3962	3970	3962
q11	503	377	387	377
q12	757	598	591	591
q13	4298	3577	3522	3522
q14	286	250	267	250
q15	568	518	520	518
q16	488	458	463	458
q17	1890	1848	1840	1840
q18	8673	8274	8415	8274
q19	1712	1734	1754	1734
q20	2252	1944	1939	1939
q21	6531	6192	6135	6135
q22	508	432	420	420
Total cold run time: 63832 ms
Total hot run time: 60638 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.71 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17220750714 Bytes

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 45b2dba into apache:master Dec 12, 2023
xzj7019 pushed a commit to xzj7019/doris that referenced this pull request Dec 13, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels: approved (indicates a PR has been approved by one committer), meta-change, reviewed