[fix](arrow-flight) Modify FE Arrow version to 14.0.1 #28093

Merged: 1 commit, Dec 7, 2023

Contributor

@xinyiZzz xinyiZzz commented Dec 6, 2023

Proposed changes

Arrow was previously upgraded temporarily to the development version 15.0.0-SNAPSHOT, because jdbc:arrow-flight-sql in the latest release, Arrow 14.0.1, has a bug that prevents it from being used normally; see apache/arrow#38785.

However, Arrow 15.0.0-SNAPSHOT is not published to the Maven Central repository, and the network connection needed to fetch it sometimes failed, so this PR moves the FE back to Arrow 14.0.1. jdbc:arrow-flight-sql will be supported after upgrading to the Arrow 15.0.0 release.


@xinyiZzz
Contributor Author

xinyiZzz commented Dec 6, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 02428d5e79e8cc401baf75b7e195925943931249, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4698	4459	4482	4459
q2	361	149	158	149
q3	1470	1245	1239	1239
q4	1110	965	914	914
q5	3181	3210	3184	3184
q6	251	128	131	128
q7	1012	495	504	495
q8	2220	2218	2180	2180
q9	6693	6724	6690	6690
q10	3240	3289	3263	3263
q11	328	209	204	204
q12	356	207	210	207
q13	4577	3772	3750	3750
q14	242	215	218	215
q15	568	530	529	529
q16	443	388	399	388
q17	1028	563	581	563
q18	7820	8294	7038	7038
q19	1528	1402	1427	1402
q20	501	298	337	298
q21	3141	2677	2672	2672
q22	352	286	299	286
Total cold run time: 45120 ms
Total hot run time: 40253 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4393	4406	4416	4406
q2	268	165	173	165
q3	3530	3516	3519	3516
q4	2398	2386	2376	2376
q5	5732	5738	5750	5738
q6	240	119	121	119
q7	2411	1884	1869	1869
q8	3520	3521	3531	3521
q9	9103	9026	8988	8988
q10	3944	3980	4018	3980
q11	502	388	368	368
q12	760	601	603	601
q13	4287	3547	3543	3543
q14	292	267	250	250
q15	565	516	529	516
q16	498	469	471	469
q17	1871	1873	1884	1873
q18	8751	8824	8451	8451
q19	1720	1726	1746	1726
q20	2242	1937	1936	1936
q21	6550	6183	6144	6144
q22	501	406	415	406
Total cold run time: 64078 ms
Total hot run time: 60961 ms
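
The totals in these tables can be checked by hand. The sketch below assumes the columns are query name, cold run, two hot runs, and the best hot run, all in milliseconds; the bot output does not label them, so that reading is an inference from the numbers. Under that assumption, the cold total is the sum of the first numeric column and the hot total is the sum of the last. The rows are copied from the default-conf table for commit 02428d5e79e8cc401baf75b7e195925943931249; `summarize` is a hypothetical helper name.

```python
# Rows copied from the default-conf TPC-H table above.
# Assumed columns: query, cold run, hot run 1, hot run 2, best hot run (ms).
RAW = """
q1 4698 4459 4482 4459
q2 361 149 158 149
q3 1470 1245 1239 1239
q4 1110 965 914 914
q5 3181 3210 3184 3184
q6 251 128 131 128
q7 1012 495 504 495
q8 2220 2218 2180 2180
q9 6693 6724 6690 6690
q10 3240 3289 3263 3263
q11 328 209 204 204
q12 356 207 210 207
q13 4577 3772 3750 3750
q14 242 215 218 215
q15 568 530 529 529
q16 443 388 399 388
q17 1028 563 581 563
q18 7820 8294 7038 7038
q19 1528 1402 1427 1402
q20 501 298 337 298
q21 3141 2677 2672 2672
q22 352 286 299 286
"""


def summarize(raw):
    """Return (cold_total_ms, hot_total_ms) for a whitespace-separated table."""
    cold_total = 0
    hot_total = 0
    for line in raw.strip().splitlines():
        name, *values = line.split()
        cold, hot1, hot2, best = map(int, values)
        # The last column appears to be the faster of the two hot runs.
        assert best == min(hot1, hot2), name
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_total, hot_total = summarize(RAW)
print(f"Total cold run time: {cold_total} ms")  # 45120 ms
print(f"Total hot run time: {hot_total} ms")    # 40253 ms
```

Both results match the totals reported for that table, which supports the assumed column meaning.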

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.37 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17194570584 Bytes
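
The throughput figures can be reproduced from the byte counts and durations. This is a minimal sketch, assuming "MB/s" means mebibytes (1024 * 1024 bytes) per second rounded down and "K ops/s" means thousands of rows per second rounded down. Neither convention is stated by the bot; both are inferred because they reproduce the reported numbers. The function names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: the reported "MB" is 2**20 bytes


def load_rate_mib_s(total_bytes: int, seconds: int) -> int:
    """Whole MiB per second, rounded down (rounding mode is an assumption)."""
    return total_bytes // (seconds * MIB)


def insert_rate_k_rows_s(rows: int, seconds: float) -> int:
    """Thousands of rows per second, rounded down."""
    return int(rows / seconds / 1000)


# Figures copied from the comment above: (format, seconds, bytes loaded).
LOADS = [
    ("tsv", 577, 74807831229),
    ("json", 19, 2358488459),
    ("orc", 65, 1101869774),
    ("parquet", 32, 861443392),
]

for fmt, seconds, total_bytes in LOADS:
    print(f"stream load {fmt}: about {load_rate_mib_s(total_bytes, seconds)} MB/s")

print(f"insert into select: about {insert_rate_k_rows_s(10_000_000, 28.8)}K ops/s")
```

With decimal megabytes (10**6 bytes) the tsv rate would come out near 129 rather than 123, which is why the sketch uses mebibytes.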

@xinyiZzz
Contributor Author

xinyiZzz commented Dec 7, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 6bc5962b1be1f0153d9cefe75313ff9b194ec273, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4714	4459	4476	4459
q2	371	144	158	144
q3	1464	1234	1245	1234
q4	1111	928	901	901
q5	3151	3182	3137	3137
q6	252	129	124	124
q7	998	493	490	490
q8	2226	2221	2166	2166
q9	6680	6682	6657	6657
q10	3227	3270	3260	3260
q11	329	204	212	204
q12	358	214	208	208
q13	4573	3775	4784	3775
q14	241	218	216	216
q15	599	526	517	517
q16	443	390	390	390
q17	1015	588	585	585
q18	7394	7151	7077	7077
q19	1516	1354	1424	1354
q20	512	322	1209	322
q21	3098	2690	2727	2690
q22	354	293	305	293
Total cold run time: 44626 ms
Total hot run time: 40203 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4442	4456	4380	4380
q2	268	162	173	162
q3	3535	3536	3518	3518
q4	2383	2371	2355	2355
q5	5743	5717	5745	5717
q6	242	122	122	122
q7	2373	1872	1852	1852
q8	3510	3506	3531	3506
q9	9024	8996	8997	8996
q10	3910	4013	4001	4001
q11	500	367	359	359
q12	763	586	595	586
q13	4324	3554	3560	3554
q14	286	256	257	256
q15	565	512	519	512
q16	505	444	450	444
q17	1869	1863	1862	1862
q18	8744	8394	9024	8394
q19	1740	1771	1737	1737
q20	2260	1964	1947	1947
q21	6516	6145	6140	6140
q22	498	425	421	421
Total cold run time: 64000 ms
Total hot run time: 60821 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.64 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17194244724 Bytes

@xinyiZzz
Contributor Author

xinyiZzz commented Dec 7, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 1b23e5c02ccafe3d95743a8e24b61af79faa1048, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4701	4443	4508	4443
q2	358	156	169	156
q3	1465	1258	1279	1258
q4	1104	964	926	926
q5	3116	3138	3157	3138
q6	250	129	124	124
q7	984	499	483	483
q8	2134	2254	2186	2186
q9	6688	6700	6629	6629
q10	3225	3231	3271	3231
q11	321	207	199	199
q12	365	219	211	211
q13	4567	3790	3799	3790
q14	245	212	223	212
q15	569	517	525	517
q16	444	390	398	390
q17	1005	592	601	592
q18	7365	7076	7537	7076
q19	1515	1394	1393	1393
q20	605	342	335	335
q21	3118	2652	2698	2652
q22	355	291	295	291
Total cold run time: 44499 ms
Total hot run time: 40232 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4389	4408	4384	4384
q2	266	161	176	161
q3	3536	3514	3507	3507
q4	2367	2361	2367	2361
q5	5740	5730	5754	5730
q6	239	119	120	119
q7	2369	1883	1876	1876
q8	3524	3526	3567	3526
q9	9043	9015	8992	8992
q10	3894	3986	4005	3986
q11	504	388	388	388
q12	759	614	588	588
q13	4285	3589	3556	3556
q14	283	254	260	254
q15	563	517	526	517
q16	492	450	478	450
q17	1880	1852	1884	1852
q18	8619	8332	8161	8161
q19	1706	1765	1763	1763
q20	2267	1949	1926	1926
q21	6502	6097	6195	6097
q22	504	419	406	406
Total cold run time: 63731 ms
Total hot run time: 60600 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.41 seconds
stream load tsv: 585 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17194475952 Bytes

Contributor

@yiguolei yiguolei left a comment

lgtm

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Dec 7, 2023
Contributor

github-actions bot commented Dec 7, 2023

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Dec 7, 2023

PR approved by anyone and no changes requested.

Member

@mrhhsg mrhhsg left a comment

lgtm

@yiguolei yiguolei merged commit 397a401 into apache:master Dec 7, 2023
26 of 28 checks passed
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
hello-stephen pushed a commit to hello-stephen/doris that referenced this pull request Dec 28, 2023
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels: approved, reviewed