[fix](inverted index) Fix match_regexp to correctly handle empty string patterns #39503

Merged
1 commit merged into apache:master on Aug 22, 2024

Conversation

zzzxl1993
Contributor

Proposed changes

  1. Handle empty string patterns consistently for match_regexp, with and without an inverted index.
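
For illustration only, here is a minimal sketch of what "consistent with and without an index" can mean. It is not the Doris code: it uses std::regex rather than the engine Doris uses, and every name in it (any_token_matches, match_without_index, match_with_index) is hypothetical. The thread does not state which result Doris returns for an empty pattern. The sketch only assumes that both paths should go through the same matching step, with no early return for an empty pattern, so they cannot disagree.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical shared step: true when any token fully matches the pattern.
// There is deliberately no special case for an empty pattern; the regex
// engine decides, so every caller gets the same answer.
bool any_token_matches(const std::vector<std::string>& tokens,
                       const std::string& pattern) {
    const std::regex re(pattern);
    for (const auto& token : tokens) {
        if (std::regex_match(token, re)) {
            return true;
        }
    }
    return false;
}

// Hypothetical "no index" path: split the row value on spaces at query time.
bool match_without_index(const std::string& value, const std::string& pattern) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : value) {
        if (c == ' ') {
            if (!current.empty()) {
                tokens.push_back(current);
            }
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return any_token_matches(tokens, pattern);
}

// Hypothetical "with index" path: the terms were produced at index time.
bool match_with_index(const std::vector<std::string>& indexed_terms,
                      const std::string& pattern) {
    return any_token_matches(indexed_terms, pattern);
}
```

In this sketch an empty pattern matches only an empty token, so a row such as "hello world" gives the same result on both paths. An early return in just one path is the kind of divergence the change appears to remove.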

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zzzxl1993
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38297 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b72af0f85e942bc392a2e078290ec43cd1591448, data reload: false

------ Round 1 ----------------------------------
q1	17873	4374	4321	4321
q2	2068	205	203	203
q3	10431	1134	1094	1094
q4	10166	798	772	772
q5	7766	2845	2815	2815
q6	265	162	164	162
q7	1024	686	652	652
q8	9383	2119	2066	2066
q9	7058	6589	6568	6568
q10	7074	2289	2235	2235
q11	482	271	271	271
q12	433	263	265	263
q13	17786	3022	3002	3002
q14	294	253	251	251
q15	538	525	519	519
q16	541	401	400	400
q17	992	681	683	681
q18	7382	6866	6854	6854
q19	6077	1008	962	962
q20	686	339	348	339
q21	4021	2994	2840	2840
q22	1135	1029	1027	1027
Total cold run time: 113475 ms
Total hot run time: 38297 ms
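
The Round 1 figures above are consistent with reading each row as one cold run, two hot runs, and the faster of the two hot runs. The first column sums to the reported cold total (113475 ms) and the fourth column to the reported hot total (38297 ms). A minimal sketch of that aggregation follows. The names QueryTiming, Totals and summarize are hypothetical and do not come from the Doris benchmark scripts.

```cpp
#include <algorithm>
#include <vector>

// One benchmark row: a cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

struct Totals {
    long cold_ms;
    long hot_ms;
};

// Sums the cold runs and, per query, the faster (minimum) of the two hot runs.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals totals{0, 0};
    for (const auto& row : rows) {
        totals.cold_ms += row.cold_ms;
        totals.hot_ms += std::min(row.hot1_ms, row.hot2_ms);
    }
    return totals;
}
```

Applied to all 22 rows, this should give the two totals the bot reports for the round.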

----- Round 2, with runtime_filter_mode=off -----
q1	4444	4275	4332	4275
q2	409	295	293	293
q3	3059	2614	2625	2614
q4	1934	1652	1733	1652
q5	5640	5681	5690	5681
q6	245	160	153	153
q7	2195	1800	1789	1789
q8	3352	3449	3418	3418
q9	8810	8525	8698	8525
q10	3551	3353	3307	3307
q11	650	513	566	513
q12	838	652	663	652
q13	16690	3216	3187	3187
q14	327	304	292	292
q15	551	527	519	519
q16	506	458	474	458
q17	1813	1519	1508	1508
q18	8258	7831	7819	7819
q19	1990	1545	1589	1545
q20	2739	1889	1829	1829
q21	11166	5246	5296	5246
q22	1442	1086	1113	1086
Total cold run time: 80609 ms
Total hot run time: 56361 ms

@@ -406,15 +406,6 @@ Status FunctionMatchRegexp::execute_match(FunctionContext* context, const std::s
     VLOG_DEBUG << "begin to run FunctionMatchRegexp::execute_match, parser_type: "
                << inverted_index_parser_type_to_string(inverted_index_ctx->parser_type);

     if (match_query_str.empty()) {
Contributor

The other query types should be considered here as well.

@doris-robot

TPC-H: Total hot run time: 38275 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b72af0f85e942bc392a2e078290ec43cd1591448, data reload: false

------ Round 1 ----------------------------------
q1	18348	4433	4328	4328
q2	2059	209	211	209
q3	10808	986	1087	986
q4	10526	779	808	779
q5	7781	2891	2784	2784
q6	262	164	161	161
q7	1030	682	664	664
q8	9382	2134	2089	2089
q9	7257	6575	6607	6575
q10	7051	2207	2185	2185
q11	495	265	263	263
q12	419	248	258	248
q13	18906	2991	3025	2991
q14	296	248	246	246
q15	545	514	519	514
q16	515	402	401	401
q17	1010	746	623	623
q18	7628	6865	6877	6865
q19	7054	1088	1037	1037
q20	707	337	349	337
q21	3890	3081	2954	2954
q22	1127	1036	1053	1036
Total cold run time: 117096 ms
Total hot run time: 38275 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4492	4333	4230	4230
q2	414	298	295	295
q3	2852	2655	2618	2618
q4	1969	1729	1728	1728
q5	5655	5706	5678	5678
q6	232	143	146	143
q7	2203	1756	1719	1719
q8	3280	3506	3508	3506
q9	8772	8777	8789	8777
q10	3570	3336	3326	3326
q11	614	535	531	531
q12	850	645	656	645
q13	16513	3215	3206	3206
q14	320	294	289	289
q15	574	535	528	528
q16	515	463	468	463
q17	1837	1547	1557	1547
q18	8213	8057	7738	7738
q19	9293	1623	1554	1554
q20	2183	1924	1892	1892
q21	13817	5274	5422	5274
q22	1193	1085	1079	1079
Total cold run time: 89361 ms
Total hot run time: 56766 ms

@zzzxl1993
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38137 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4d9d5b9a22ec77563170f778cf2f503adceef87d, data reload: false

------ Round 1 ----------------------------------
q1	18183	4643	4376	4376
q2	2074	204	216	204
q3	11578	1002	1127	1002
q4	10524	849	770	770
q5	7786	2828	2849	2828
q6	266	158	157	157
q7	1022	655	660	655
q8	9412	2116	2132	2116
q9	7332	6547	6580	6547
q10	7076	2221	2205	2205
q11	524	270	262	262
q12	434	262	264	262
q13	18924	2990	3011	2990
q14	308	268	261	261
q15	567	523	533	523
q16	545	403	401	401
q17	1025	689	693	689
q18	7585	6996	6704	6704
q19	7568	1037	1141	1037
q20	721	345	352	345
q21	3865	2796	2984	2796
q22	1118	1007	1030	1007
Total cold run time: 118437 ms
Total hot run time: 38137 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4550	4282	4340	4282
q2	405	311	304	304
q3	2841	2629	2766	2629
q4	1973	1725	1743	1725
q5	5610	5751	5606	5606
q6	254	146	146	146
q7	2152	1770	1780	1770
q8	3313	3547	3514	3514
q9	8783	8798	8779	8779
q10	3641	3374	3306	3306
q11	634	517	512	512
q12	814	695	660	660
q13	16342	3154	3047	3047
q14	335	299	300	299
q15	580	539	518	518
q16	508	458	462	458
q17	1858	1550	1558	1550
q18	8311	7886	7642	7642
q19	5938	1630	1592	1592
q20	2183	1906	1871	1871
q21	14711	5203	5258	5203
q22	1138	1088	1069	1069
Total cold run time: 86874 ms
Total hot run time: 56482 ms

@doris-robot

TPC-DS: Total hot run time: 196443 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4d9d5b9a22ec77563170f778cf2f503adceef87d, data reload: false

query1	1317	917	891	891
query2	6632	1990	1987	1987
query3	10687	3780	3919	3780
query4	59034	24145	23374	23374
query5	5787	719	704	704
query6	549	209	208	208
query7	6423	327	333	327
query8	549	450	445	445
query9	9219	2563	2546	2546
query10	595	347	332	332
query11	18054	15071	15428	15071
query12	237	131	126	126
query13	1678	441	454	441
query14	11725	7494	7419	7419
query15	251	202	194	194
query16	7389	525	578	525
query17	1155	622	623	622
query18	2051	351	343	343
query19	317	175	178	175
query20	153	140	140	140
query21	255	146	148	146
query22	4568	4425	4356	4356
query23	34225	33670	33942	33670
query24	5694	3038	2965	2965
query25	580	431	426	426
query26	710	194	198	194
query27	1804	314	307	307
query28	4030	2198	2154	2154
query29	704	450	444	444
query30	239	189	189	189
query31	1071	854	831	831
query32	104	76	78	76
query33	550	354	334	334
query34	894	489	509	489
query35	862	761	765	761
query36	1110	981	966	966
query37	150	102	101	101
query38	4002	3910	3907	3907
query39	1529	1518	1460	1460
query40	238	153	151	151
query41	142	139	137	137
query42	138	117	116	116
query43	542	518	503	503
query44	1135	778	787	778
query45	233	199	195	195
query46	1121	738	757	738
query47	1905	1838	1828	1828
query48	408	337	334	334
query49	917	584	581	581
query50	862	477	463	463
query51	6940	6881	6801	6801
query52	120	105	121	105
query53	295	228	225	225
query54	614	503	494	494
query55	89	87	88	87
query56	340	321	304	304
query57	1176	1153	1116	1116
query58	308	307	303	303
query59	2981	2843	2815	2815
query60	345	323	325	323
query61	148	148	145	145
query62	812	692	704	692
query63	263	232	226	226
query64	3236	1855	1887	1855
query65	3230	3212	3214	3212
query66	1059	679	687	679
query67	15139	14864	14951	14864
query68	8333	588	579	579
query69	660	438	343	343
query70	1524	1153	1081	1081
query71	543	317	312	312
query72	6804	2326	2079	2079
query73	2494	351	352	351
query74	9341	8909	9084	8909
query75	4170	2740	2756	2740
query76	4595	1128	1018	1018
query77	859	436	442	436
query78	11407	9698	9109	9109
query79	10293	548	546	546
query80	1339	606	603	603
query81	615	261	262	261
query82	709	160	159	159
query83	351	210	211	210
query84	283	94	97	94
query85	781	360	356	356
query86	401	320	288	288
query87	4591	4279	4212	4212
query88	4513	2513	2490	2490
query89	534	330	325	325
query90	2288	227	228	227
query91	180	128	129	128
query92	91	75	76	75
query93	4255	540	544	540
query94	1073	328	319	319
query95	394	297	289	289
query96	633	282	279	279
query97	3262	3094	3066	3066
query98	245	234	235	234
query99	1762	1328	1289	1289
Total cold run time: 340942 ms
Total hot run time: 196443 ms

@doris-robot

ClickBench: Total hot run time: 30.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4d9d5b9a22ec77563170f778cf2f503adceef87d, data reload: false

query1	0.05	0.05	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.66	0.08	0.08
query5	0.52	0.49	0.48
query6	1.13	0.73	0.74
query7	0.02	0.02	0.02
query8	0.05	0.05	0.04
query9	0.55	0.50	0.51
query10	0.55	0.55	0.54
query11	0.16	0.13	0.12
query12	0.16	0.13	0.13
query13	0.63	0.59	0.59
query14	0.76	0.78	0.82
query15	0.86	0.83	0.84
query16	0.37	0.36	0.37
query17	1.05	1.00	1.06
query18	0.21	0.20	0.21
query19	1.80	1.75	1.74
query20	0.02	0.01	0.01
query21	15.43	0.68	0.69
query22	4.50	7.42	1.39
query23	18.36	1.40	1.33
query24	2.11	0.23	0.22
query25	0.15	0.08	0.09
query26	0.28	0.18	0.19
query27	0.09	0.08	0.08
query28	13.22	1.03	1.01
query29	12.64	3.30	3.29
query30	0.42	0.19	0.20
query31	2.80	0.39	0.40
query32	3.26	0.49	0.48
query33	2.96	2.99	3.01
query34	16.89	4.38	4.40
query35	4.46	4.36	4.51
query36	0.67	0.49	0.52
query37	0.23	0.17	0.18
query38	0.17	0.16	0.17
query39	0.08	0.05	0.06
query40	0.18	0.15	0.14
query41	0.12	0.09	0.08
query42	0.09	0.07	0.07
query43	0.06	0.06	0.06
Total cold run time: 110.03 s
Total hot run time: 30.55 s

Contributor

@csun5285 csun5285 left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Member

@airborne12 airborne12 left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 22, 2024
@qidaye qidaye merged commit 43f0050 into apache:master Aug 22, 2024
29 of 32 checks passed
zzzxl1993 added a commit to zzzxl1993/doris that referenced this pull request Sep 11, 2024
…ng patterns (apache#39503)

1. Handle empty strings consistently for match_phrase with and without an index.
dataroaring pushed a commit that referenced this pull request Oct 9, 2024
…ng patterns (#39503)

1. Handle empty strings consistently for match_phrase with and without an index.
@yiguolei yiguolei mentioned this pull request Nov 6, 2024
@yiguolei yiguolei mentioned this pull request Jan 19, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged reviewed
8 participants