[feature](invert index) match_phrase_prefix feature added #27404

Merged (1 commit) on Dec 5, 2023

zzzxl1993
Contributor

Proposed changes

select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
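
For readers unfamiliar with the operator, the sketch below illustrates one plausible reading of match_phrase_prefix: every query term except the last must equal a token exactly, at consecutive positions, and the last term only has to be a prefix of the following token. This is an assumption modeled on how phrase-prefix queries commonly behave in full-text engines; the PR description does not spell out the semantics, and the real implementation works against the inverted index rather than scanning text. The names tokenize and matches_phrase_prefix are hypothetical, and lower-casing plus whitespace splitting is only a stand-in for whatever analyzer the index uses.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lower-case the text and split it on whitespace.
// A real analyzer configured on the index may tokenize differently.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string tok;
    while (in >> tok) {
        for (char& c : tok) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        tokens.push_back(tok);
    }
    return tokens;
}

// Assumed semantics: the query terms must appear at consecutive token
// positions; all terms but the last match exactly, and the last term
// only needs to be a prefix of the document token at that position.
bool matches_phrase_prefix(const std::string& doc, const std::string& query) {
    const std::vector<std::string> d = tokenize(doc);
    const std::vector<std::string> q = tokenize(query);
    if (q.empty() || q.size() > d.size()) {
        return false;
    }
    for (std::size_t start = 0; start + q.size() <= d.size(); ++start) {
        bool exact_ok = true;
        for (std::size_t i = 0; i + 1 < q.size() && exact_ok; ++i) {
            exact_ok = (d[start + i] == q[i]);
        }
        if (!exact_ok) {
            continue;
        }
        const std::string& last = q.back();
        const std::string& candidate = d[start + q.size() - 1];
        // compare() over the first last.size() characters is a prefix test.
        if (candidate.compare(0, last.size(), last) == 0) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, matches_phrase_prefix("GET /images/logo.png HTTP/1.0", "get /ima") is true, while the query "quick fo" does not match "quick brown fox" because "brown" sits between the two terms.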


github-actions bot (Contributor) left a comment

clang-tidy made some suggestions


#pragma once

#include <CLucene.h>

warning: 'CLucene.h' file not found [clang-diagnostic-error]

#include <CLucene.h>
         ^


@@ -479,6 +489,24 @@ Status FullTextIndexReader::match_all_index_search(
return Status::OK();
}

Status FullTextIndexReader::match_phrase_prefix_index_search(

warning: method 'match_phrase_prefix_index_search' can be made static [readability-convert-member-functions-to-static]

Suggested change
- Status FullTextIndexReader::match_phrase_prefix_index_search(
+ static Status FullTextIndexReader::match_phrase_prefix_index_search(


String get_name() const override { return name; }

Status execute_match(const std::string& column_name, const std::string& match_query_str,

warning: method 'execute_match' can be made static [readability-convert-member-functions-to-static]

Suggested change
- Status execute_match(const std::string& column_name, const std::string& match_query_str,
+ static Status execute_match(const std::string& column_name, const std::string& match_query_str,

be/src/vec/functions/match.h:141:

-                          ColumnUInt8::Container& result) const override {
+                          ColumnUInt8::Container& result) override {
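
The execute_match suggestion cannot be applied as written: the diff keeps override, and in C++ a static member function cannot override a virtual one. Likewise, the static keyword may not appear on an out-of-class member definition. The following is a minimal sketch of the pattern that has to stay non-static; MatchFunctionBase, MatchPhrasePrefix, and run are hypothetical stand-ins, not the classes from this PR, and the string return type is used only to keep the example short.

```cpp
#include <string>

// Hypothetical base interface; the real base class in Doris differs.
struct MatchFunctionBase {
    virtual ~MatchFunctionBase() = default;
    virtual std::string execute_match(const std::string& query) const = 0;
};

struct MatchPhrasePrefix : MatchFunctionBase {
    // Must remain a non-static const member because it overrides a virtual.
    // Marking it static, or dropping const while keeping override,
    // would fail to compile.
    std::string execute_match(const std::string& query) const override {
        return "phrase_prefix:" + query;
    }
};

// Calls through the base interface, relying on virtual dispatch.
std::string run(const MatchFunctionBase& f, const std::string& query) {
    return f.execute_match(query);
}
```

Calling run with a MatchPhrasePrefix instance reaches the derived implementation through the base reference, which is the behavior a static function could not provide.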

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.54% (8449/23121)
Line Coverage: 28.83% (68687/238226)
Region Coverage: 27.79% (35515/127790)
Branch Coverage: 24.54% (18113/73804)
Coverage Report: http://coverage.selectdb-in.cc/coverage/086b102e80002980858e0938d1c596de85950ad4_086b102e80002980858e0938d1c596de85950ad4/report/index.html

@zzzxl1993 force-pushed the match_phrase_prefix branch 3 times, most recently from 91370c9 to aea4ecf, on November 23, 2023 08:21
@zzzxl1993
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.56% (8450/23110)
Line Coverage: 28.85% (68675/238032)
Region Coverage: 27.82% (35517/127687)
Branch Coverage: 24.55% (18121/73808)
Coverage Report: http://coverage.selectdb-in.cc/coverage/aea4ecf37d53d3ab98dbc4fdf2bb9aa0ff0d2160_aea4ecf37d53d3ab98dbc4fdf2bb9aa0ff0d2160/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.77 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.4 seconds inserted 10000000 Rows, about 352K ops/s
storage size: 17099192326 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit aea4ecf37d53d3ab98dbc4fdf2bb9aa0ff0d2160, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4927	4659	4666	4659
q2	363	155	157	155
q3	2040	1979	1925	1925
q4	1401	1257	1243	1243
q5	3948	3956	4021	3956
q6	251	127	131	127
q7	1413	881	884	881
q8	2774	2783	2771	2771
q9	9910	9568	9503	9503
q10	3470	3528	3500	3500
q11	380	250	245	245
q12	440	294	292	292
q13	4578	3798	3798	3798
q14	317	291	287	287
q15	596	536	518	518
q16	666	581	587	581
q17	1137	978	937	937
q18	7846	7390	7513	7390
q19	1661	1659	1673	1659
q20	578	307	310	307
q21	4458	3986	4011	3986
q22	478	386	386	386
Total cold run time: 53632 ms
Total hot run time: 49106 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4566	4570	4567	4567
q2	330	230	259	230
q3	4015	4001	4008	4001
q4	2709	2698	2704	2698
q5	9664	9609	9689	9609
q6	246	121	124	121
q7	3044	2504	2451	2451
q8	4469	4467	4439	4439
q9	12932	12771	12852	12771
q10	4051	4176	4184	4176
q11	738	642	647	642
q12	975	825	817	817
q13	4318	3573	3553	3553
q14	377	336	336	336
q15	577	524	516	516
q16	733	667	671	667
q17	3862	3908	3831	3831
q18	9644	9110	9092	9092
q19	1826	1760	1782	1760
q20	2420	2083	2045	2045
q21	8812	8665	8586	8586
q22	922	768	774	768
Total cold run time: 81230 ms
Total hot run time: 77676 ms

InvertedIndexCtx* inverted_index_ctx,
const ColumnArray::Offsets64* array_offsets,
ColumnUInt8::Container& result) const override {
return Status::Error<ErrorCode::INVERTED_INDEX_NOT_SUPPORTED>(

Member review comment:

Maybe we need to support this.

@zzzxl1993 force-pushed the match_phrase_prefix branch from aea4ecf to 310062b on December 4, 2023 10:10
@zzzxl1993
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.33 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17163725835 Bytes

@airborne12 (Member) left a comment

LGTM

github-actions bot (Contributor) commented Dec 5, 2023

PR approved by anyone and no changes requested.

@zzzxl1993 force-pushed the match_phrase_prefix branch from 310062b to ca62633 on December 5, 2023 06:45
@zzzxl1993 force-pushed the match_phrase_prefix branch from ca62633 to dc1fd8d on December 5, 2023 07:56
@zzzxl1993
Contributor Author

run buildall

@qidaye (Contributor) left a comment

LGTM

github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Dec 5, 2023
github-actions bot (Contributor) commented Dec 5, 2023

PR approved by at least one committer and no changes requested.

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.09 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17163589089 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit dc1fd8ddc6ab995436af492463a1477bcc258133, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4718	4508	4450	4450
q2	364	151	162	151
q3	1469	1265	1241	1241
q4	1120	934	958	934
q5	3219	3178	3161	3161
q6	251	130	127	127
q7	988	498	476	476
q8	2221	2250	2171	2171
q9	6671	6667	6662	6662
q10	3228	3281	3254	3254
q11	320	201	196	196
q12	349	212	209	209
q13	4550	3823	3771	3771
q14	241	213	222	213
q15	567	533	520	520
q16	440	385	379	379
q17	993	601	580	580
q18	7780	7347	6969	6969
q19	1518	1414	1421	1414
q20	537	343	311	311
q21	3097	2649	2686	2649
q22	357	281	285	281
Total cold run time: 44998 ms
Total hot run time: 40119 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4374	4341	4384	4341
q2	266	163	177	163
q3	3523	3524	3523	3523
q4	2380	2368	2378	2368
q5	5761	5737	5739	5737
q6	239	119	120	119
q7	2360	1859	1849	1849
q8	3511	3518	3524	3518
q9	9091	9061	9053	9053
q10	3928	3968	3979	3968
q11	504	373	380	373
q12	767	594	603	594
q13	4305	3603	3562	3562
q14	272	241	245	241
q15	573	510	529	510
q16	487	446	463	446
q17	1886	1852	1839	1839
q18	8648	8067	8086	8067
q19	1737	1768	1743	1743
q20	2255	1967	1940	1940
q21	6540	6190	6168	6168
q22	518	424	428	424
Total cold run time: 63925 ms
Total hot run time: 60546 ms

@qidaye qidaye merged commit 05adbfd into apache:master Dec 5, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…7404)

select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
Labels: approved, dev/2.0.4, meta-change, reviewed