
feat(search): supporting Chinese glossaryterm full text retrieval (#3914) #3956

Merged: 3 commits from Huyueeer:chinese_support into datahub-project:master on Feb 25, 2022

Conversation

@Huyueeer (Contributor) commented on Jan 24, 2022

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Change

Source of the problem: #3914.
This change makes the analyzer configurable, so that main_tokenizer can be replaced with a word-segmentation tokenizer for other languages. The tokenizers tested are smartcn_tokenizer and ik_smart, both provided by Elasticsearch analysis plugins.
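
To make the configuration concrete, here is a minimal sketch of Elasticsearch index settings in which the main tokenizer is swapped for a Chinese word-segmentation tokenizer. The index name, analyzer name, and mapped field are illustrative only, and it assumes the analysis-smartcn plugin (which provides smartcn_tokenizer) is installed; ik_smart from the analysis-ik plugin could be substituted the same way. DataHub's actual settings builder may differ.

```json
PUT /glossary_term_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "main_tokenizer": {
          "type": "custom",
          "tokenizer": "smartcn_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "main_tokenizer" }
    }
  }
}
```

With this in place, a glossary term name such as 数据质量 is segmented into Chinese words at index time instead of being treated as a single opaque token, which is what enables full-text retrieval on individual words.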

@jjoyce0510 (Collaborator) left a comment


Wow - awesome PR!

This looks great to me. Want another pair of eyes on it, then we can ship. (cc. @dexter-mh-lee)

Thank you @Huyueeer!

@github-actions bot commented on Feb 1, 2022

Unit Test Results (build & test)

70 files ±0, 70 suites ±0, 18m 58s ⏱️ (−47s)
611 tests ±0: 552 passed ✔️ ±0, 59 skipped 💤 ±0, 0 failed ±0

Results for commit 9b1f151. ± Comparison against base commit 0fd4cb5.

♻️ This comment has been updated with latest results.

@shirshanka (Contributor) left a comment


LGTM!

@shirshanka shirshanka merged commit 3a0fe44 into datahub-project:master Feb 25, 2022
@Huyueeer Huyueeer deleted the chinese_support branch March 4, 2022 02:42
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
feat(search): supporting chinese glossaryterm full text retrieval (datahub-project#3914) (datahub-project#3956)

* feat(search): supporting chinese glossaryterm full text retrieval(datahub-project#3914)

* refactor(search): modify mainTokenizer to appropriate position(datahub-project#3914)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
@xiangqiao123 commented on Mar 14, 2023

@Huyueeer In Chinese, two-character words are very common. Could you help make MIN_LENGTH configurable? It would be very helpful for Chinese word segmentation.
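
To illustrate the request (an assumption about where MIN_LENGTH takes effect; the actual DataHub constant may sit elsewhere in the query or analyzer code): if the minimum length is enforced as an n-gram lower bound of 3, a two-character Chinese word never produces a token and therefore can never match. A hedged sketch of such a filter, with the value that would need to become configurable:

```json
PUT /example_index
{
  "settings": {
    "analysis": {
      "filter": {
        "partial_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "partial_analyzer": {
          "type": "custom",
          "tokenizer": "smartcn_tokenizer",
          "filter": ["partial_filter"]
        }
      }
    }
  }
}
```

Lowering min_gram to 2, or exposing it as configuration, would let two-character terms such as 数据 be indexed and matched.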

@Huyueeer (Contributor, Author) replied:

@xiangqiao123 Sorry, this part has since been rebuilt. It seems you should reach out to whoever implements this part now.
