[opt](arm) Optimize the BlockBloomFilter::bucket_find on ARM platforms using NEON instructions. #38888

Merged: 1 commit merged into apache:master on Aug 19, 2024

@Mryange (Contributor) commented Aug 5, 2024

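Proposed changes

Optimizes `BlockBloomFilter::bucket_find` on ARM platforms by using NEON instructions. In the micro-benchmark below, the NEON version takes 8.14 ns per call versus 17.5 ns for the native (non-NEON) implementation, roughly a 2.1x speedup.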

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_BucketFindNeon         8.14 ns         8.14 ns    344002441
BM_BucketFindNative       17.5 ns         17.5 ns    160152491
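The PR text does not include the implementation, so the following is only a minimal sketch of what a NEON `bucket_find` can look like, under stated assumptions. It assumes an Impala/Kudu-style split-block Bloom filter in which each bucket is eight 32-bit words and each hash sets one bit per word, chosen by the top 5 bits of `salt[i] * hash`. The salt constants and the names `bucket_insert`, `bucket_find_scalar`, `bucket_find_neon` and `bucket_find` are hypothetical and are not the Doris code. The scalar loop tests the eight words one at a time with an early exit; the NEON version builds all eight bit masks in two 4-lane vectors and tests them without branches.

```cpp
#include <cstdint>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Assumption: 8 x 32-bit words (256 bits) per bucket, split-block Bloom filter layout.
constexpr int kBucketWords = 8;

// Assumption: illustrative per-word salt constants, not necessarily those used by Doris.
constexpr uint32_t kSalt[kBucketWords] = {
    0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
    0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Sets one bit in each word; the bit index is the top 5 bits of salt * hash.
void bucket_insert(uint32_t* bucket, uint32_t hash) {
    for (int i = 0; i < kBucketWords; ++i) {
        uint32_t bit = (kSalt[i] * hash) >> 27;  // unsigned multiply wraps mod 2^32
        bucket[i] |= 1U << bit;
    }
}

// Scalar reference: the key may be present only if all 8 bits are set.
bool bucket_find_scalar(const uint32_t* bucket, uint32_t hash) {
    for (int i = 0; i < kBucketWords; ++i) {
        uint32_t bit = (kSalt[i] * hash) >> 27;
        if ((bucket[i] & (1U << bit)) == 0) return false;  // early exit on first miss
    }
    return true;
}

#if defined(__aarch64__) && defined(__ARM_NEON)
// NEON sketch: the 8 words are handled as two 4-lane vectors, with no branches.
bool bucket_find_neon(const uint32_t* bucket, uint32_t hash) {
    const uint32x4_t h = vdupq_n_u32(hash);
    const uint32x4_t one = vdupq_n_u32(1);
    // Per-lane bit index: (salt * hash) >> 27, always in [0, 31].
    const uint32x4_t idx_lo = vshrq_n_u32(vmulq_u32(vld1q_u32(kSalt), h), 27);
    const uint32x4_t idx_hi = vshrq_n_u32(vmulq_u32(vld1q_u32(kSalt + 4), h), 27);
    // Per-lane masks 1 << idx; vshlq_u32 takes a signed shift count.
    const uint32x4_t mask_lo = vshlq_u32(one, vreinterpretq_s32_u32(idx_lo));
    const uint32x4_t mask_hi = vshlq_u32(one, vreinterpretq_s32_u32(idx_hi));
    // vbicq_u32(a, b) computes a & ~b: nonzero in lanes where a required bit is missing.
    const uint32x4_t missing = vorrq_u32(vbicq_u32(mask_lo, vld1q_u32(bucket)),
                                         vbicq_u32(mask_hi, vld1q_u32(bucket + 4)));
    // Horizontal max across lanes (AArch64 only): 0 means every required bit was set.
    return vmaxvq_u32(missing) == 0;
}
#endif

// Dispatch: NEON on AArch64, scalar loop everywhere else.
bool bucket_find(const uint32_t* bucket, uint32_t hash) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    return bucket_find_neon(bucket, hash);
#else
    return bucket_find_scalar(bucket, hash);
#endif
}
```

Both paths must return the same answer for every bucket and hash. The design trades the scalar loop's early exit for a fixed, branch-free sequence of vector operations; this sketch itself has not been benchmarked.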

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot added the doing label Aug 5, 2024
@Mryange (Contributor Author) commented Aug 5, 2024

run buildall

github-actions bot commented Aug 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 42004 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f266a4a9833097da6b3bf125770d6d3d6510216f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	17759	4299	4092	4092
q2	2027	204	219	204
q3	10565	1355	1388	1355
q4	10247	866	1011	866
q5	7675	3001	3001	3001
q6	225	141	143	141
q7	1070	621	609	609
q8	9463	1925	1973	1925
q9	8544	6629	6633	6629
q10	8737	3862	3862	3862
q11	436	253	257	253
q12	409	227	225	225
q13	17765	2942	2969	2942
q14	271	245	241	241
q15	527	489	491	489
q16	490	399	389	389
q17	979	929	929	929
q18	8167	7400	7297	7297
q19	1395	1223	1217	1217
q20	568	324	359	324
q21	5376	4839	4731	4731
q22	356	283	284	283
Total cold run time: 113051 ms
Total hot run time: 42004 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	4067	4015	4033	4015
q2	334	221	217	217
q3	3006	3051	3011	3011
q4	1899	1880	1904	1880
q5	5278	5253	5242	5242
q6	217	128	133	128
q7	2060	1706	1723	1706
q8	3212	3293	3282	3282
q9	8367	8340	8346	8340
q10	3759	3822	3846	3822
q11	564	453	468	453
q12	745	595	583	583
q13	13751	2962	2965	2962
q14	288	262	261	261
q15	517	487	478	478
q16	449	405	401	401
q17	1729	1728	1729	1728
q18	7801	7442	7189	7189
q19	1692	1675	1672	1672
q20	1969	1774	1764	1764
q21	5481	5228	5225	5225
q22	536	459	445	445
Total cold run time: 67721 ms
Total hot run time: 54804 ms

@doris-robot

TPC-DS: Total hot run time: 168984 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f266a4a9833097da6b3bf125770d6d3d6510216f, data reload: false

query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
query1	919	394	382	382
query2	6489	1736	1721	1721
query3	6669	212	219	212
query4	20382	17542	17576	17542
query5	4308	514	526	514
query6	295	176	168	168
query7	4603	301	304	301
query8	259	214	199	199
query9	8503	2397	2385	2385
query10	465	274	283	274
query11	10413	9945	9955	9945
query12	146	88	91	88
query13	1639	386	373	373
query14	9602	6945	7038	6945
query15	218	169	159	159
query16	6993	448	437	437
query17	921	560	550	550
query18	1837	277	277	277
query19	193	148	153	148
query20	94	87	86	86
query21	206	104	105	104
query22	4169	4135	4235	4135
query23	33869	32913	32813	32813
query24	10348	3054	2983	2983
query25	669	380	390	380
query26	1726	174	158	158
query27	2887	274	294	274
query28	6883	1991	1970	1970
query29	1193	425	414	414
query30	282	149	150	149
query31	942	726	759	726
query32	102	53	54	53
query33	699	315	321	315
query34	933	503	500	500
query35	854	721	722	721
query36	1023	845	879	845
query37	267	79	78	78
query38	2858	2771	2760	2760
query39	856	799	809	799
query40	290	114	119	114
query41	47	46	44	44
query42	123	98	107	98
query43	461	411	427	411
query44	1196	735	730	730
query45	207	177	179	177
query46	1081	794	782	782
query47	1781	1702	1733	1702
query48	386	286	288	286
query49	1176	437	441	437
query50	912	448	440	440
query51	6702	6683	6655	6655
query52	102	93	89	89
query53	265	186	182	182
query54	626	466	461	461
query55	79	76	80	76
query56	287	257	271	257
query57	1131	1056	1037	1037
query58	283	263	268	263
query59	2610	2387	2302	2302
query60	300	287	285	285
query61	94	107	94	94
query62	926	655	668	655
query63	221	189	186	186
query64	5942	1924	1878	1878
query65	3149	3090	3106	3090
query66	1438	334	333	333
query67	15350	14804	14888	14804
query68	4301	574	597	574
query69	455	296	304	296
query70	1121	1020	1044	1020
query71	365	280	285	280
query72	7163	2656	2494	2494
query73	763	334	332	332
query74	5986	5658	5664	5658
query75	3357	2732	2761	2732
query76	2223	1224	1268	1224
query77	428	312	329	312
query78	9404	9046	8956	8956
query79	1938	540	543	540
query80	1204	534	532	532
query81	560	227	223	223
query82	1064	135	131	131
query83	244	181	173	173
query84	277	82	86	82
query85	1337	325	298	298
query86	420	290	276	276
query87	3260	3140	3091	3091
query88	2967	2424	2401	2401
query89	379	294	295	294
query90	1815	199	192	192
query91	132	102	100	100
query92	63	51	52	51
query93	1593	625	622	622
query94	956	307	293	293
query95	385	276	269	269
query96	602	279	274	274
query97	3210	3052	3044	3044
query98	235	200	196	196
query99	1625	1319	1273	1273
Total cold run time: 262711 ms
Total hot run time: 168984 ms

@doris-robot

ClickBench: Total hot run time: 30.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f266a4a9833097da6b3bf125770d6d3d6510216f, data reload: false

query	cold (s)	hot1 (s)	hot2 (s)
query1	0.04	0.05	0.04
query2	0.07	0.04	0.04
query3	0.22	0.04	0.05
query4	1.68	0.06	0.07
query5	0.50	0.48	0.49
query6	1.14	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.57	0.51	0.52
query10	0.56	0.57	0.56
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.61	0.61	0.60
query14	0.77	0.80	0.79
query15	0.94	0.86	0.87
query16	0.35	0.37	0.35
query17	1.01	0.99	1.01
query18	0.23	0.22	0.21
query19	1.86	1.72	1.72
query20	0.01	0.00	0.01
query21	15.39	0.76	0.66
query22	3.97	7.85	1.42
query23	18.02	1.33	1.28
query24	2.27	0.22	0.22
query25	0.18	0.08	0.07
query26	0.33	0.22	0.21
query27	0.45	0.23	0.23
query28	13.16	1.02	0.97
query29	12.53	3.32	3.28
query30	0.25	0.06	0.06
query31	2.87	0.42	0.40
query32	3.24	0.50	0.48
query33	2.97	2.98	2.94
query34	15.43	4.27	4.23
query35	4.28	4.25	4.29
query36	0.67	0.48	0.49
query37	0.18	0.16	0.16
query38	0.16	0.16	0.15
query39	0.04	0.04	0.03
query40	0.16	0.13	0.14
query41	0.10	0.05	0.05
query42	0.05	0.04	0.05
query43	0.05	0.05	0.05
Total cold run time: 107.69 s
Total hot run time: 30.04 s

@hello-stephen (Contributor) commented:

run arm

@HappenLee (Contributor) left a comment:

LGTM

@github-actions github-actions bot added the approved label Aug 19, 2024

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@zhangstar333 zhangstar333 merged commit d3d3584 into apache:master Aug 19, 2024
29 of 33 checks passed
dataroaring pushed a commit that referenced this pull request Oct 9, 2024
…s using NEON instructions. (#38888)

Mryange added a commit to Mryange/doris that referenced this pull request Nov 8, 2024
…s using NEON instructions. (apache#38888)

yiguolei pushed a commit that referenced this pull request Nov 10, 2024
#43508)

…s using NEON instructions. (#38888)
#38888
Labels: approved, dev/2.1.8-merged, dev/3.0.3-merged, reviewed
8 participants