Update autocomplete ES filters and use whitespace tokenizer #9

kynetiv · 2016-09-15T16:49:23Z

This PR updates the Elasticsearch config for the autocomplete type, changing both the index and search analyzers. The main differences are:

Lower the edge_ngram min_gram to 1.
- This picks up terms that are significant but very short (El Paso, La Verne) or that resemble stop words (A & M).
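To show why min_gram matters, here is a rough Python model of how an edge n-gram filter expands a token into prefixes. This is a sketch, not Elasticsearch code. It assumes, as we understand the edge_ngram filter to behave, that a token shorter than min_gram produces no grams at all. The max_gram of 20 is also an assumption, since the PR only mentions min_gram.

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Return the prefixes of `token` whose lengths fall in [min_gram, max_gram].

    A token shorter than min_gram yields an empty list (it is not indexed).
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# max_gram=20 is a hypothetical value used only for illustration.
for word in ["el", "paso", "a"]:
    print(word, edge_ngrams(word, 2, 20), edge_ngrams(word, 1, 20))
```

With min_gram 2, the single letter "a" from "A & M" produces nothing, and typing just "e" cannot match "El Paso"; with min_gram 1, both are covered.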
Remove the stop words filter.
- The existing autocomplete query uses the common terms query, which already drops common words (including stop words) dynamically, depending on the parameters supplied. See this post for more on the common query.
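For reference, a common terms query of the kind described might look like the request body below, built as a Python dict. The field name `name.autocomplete` and the cutoff_frequency of 0.001 are hypothetical placeholders; the PR does not show the actual query.

```python
import json


def autocomplete_query(text: str, field: str = "name.autocomplete",
                       cutoff_frequency: float = 0.001) -> dict:
    """Build a `common` terms query body (field and cutoff are hypothetical)."""
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    # Terms with a document frequency above this cutoff go into an
                    # optional subquery that only affects the score of documents
                    # already matching the low-frequency terms.
                    "cutoff_frequency": cutoff_frequency,
                    # Every low-frequency term must match.
                    "low_freq_operator": "and",
                }
            }
        }
    }


print(json.dumps(autocomplete_query("university of california-berkeley"), indent=2))
```

Because frequent words such as "of" end up in the optional high-frequency subquery, a separate stop words filter adds little on top of this.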
Switch to the whitespace tokenizer.
- The current standard tokenizer drops some special characters. When such a character is supplied in a query, it will not match (our common query requires all low-frequency terms to match, using an `and` operator), because the character is missing from every indexed document. By using the whitespace tokenizer for both indexing and searching, we split only on whitespace and keep terms joined with -, & and similar characters.
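The difference between the two tokenizers can be approximated in a few lines of Python. The `standard_like` function is a crude stand-in: the real standard tokenizer follows Unicode text segmentation (UAX #29) and handles many more cases, so this only mimics the behavior at issue here, namely dropping characters like & and -.

```python
import re


def standard_like(text: str) -> list[str]:
    # Crude approximation of the standard tokenizer: keep runs of letters and
    # digits, discard everything else (including "&" and "-").
    return re.findall(r"[^\W_]+", text)


def whitespace(text: str) -> list[str]:
    # Like the whitespace tokenizer: split on whitespace only.
    return text.split()


print(standard_like("Texas A&M"))           # ['Texas', 'A', 'M']
print(whitespace("Texas A&M"))              # ['Texas', 'A&M']
print(standard_like("California-Berkeley")) # ['California', 'Berkeley']
print(whitespace("California-Berkeley"))    # ['California-Berkeley']
```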
Add the word_delimiter filter.
- To recover subwords that the whitespace tokenizer leaves joined, for example in California-Berkeley, we add the word_delimiter filter, which splits on special characters again. This is only useful with the preserve_original flag set, so that the original joined term is indexed as well and hyphenated or otherwise joined words are kept. I think this helps in cases like the Berkeley example, where each word can be searched on its own as well as in the joined form that appears in the name. It also helps when one of the two words is a common (high-frequency) word.
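Putting the pieces together, below is a hypothetical sketch of the analysis settings (a Python dict mirroring the JSON body) plus a rough pure-Python model of the resulting token stream. The filter names, max_gram of 20 and filter order are assumptions, since the PR diff is not reproduced here. The word_delimiter model only splits on non-alphanumeric characters and ignores the real filter's case-change and letter/number splitting.

```python
import json
import re

# Hypothetical analysis settings reflecting the changes described above.
# Filter names, max_gram and filter order are assumptions, not the PR's diff.
SETTINGS = {
    "analysis": {
        "filter": {
            "autocomplete_word_delimiter": {
                "type": "word_delimiter",
                "preserve_original": True,
            },
            "autocomplete_edge_ngram": {
                "type": "edge_ngram",
                "min_gram": 1,
                "max_gram": 20,
            },
        },
        "analyzer": {
            "autocomplete": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": [
                    "autocomplete_word_delimiter",
                    "lowercase",
                    "autocomplete_edge_ngram",
                ],
            }
        },
    }
}


def word_delimiter(token: str) -> list[str]:
    """Rough model of word_delimiter with preserve_original enabled."""
    parts = re.findall(r"[^\W_]+", token)
    if parts == [token]:
        return [token]  # nothing to split, emit the token once
    return [token] + parts  # keep the joined original alongside its parts


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text: str) -> list[str]:
    """Approximate the token stream of the hypothetical `autocomplete` analyzer."""
    ngram = SETTINGS["analysis"]["filter"]["autocomplete_edge_ngram"]
    tokens = []
    for tok in text.split():             # whitespace tokenizer
        for sub in word_delimiter(tok):  # word_delimiter + preserve_original
            tokens.extend(edge_ngrams(sub.lower(), ngram["min_gram"], ngram["max_gram"]))
    return tokens


print(json.dumps(SETTINGS, indent=2))
print(sorted(set(analyze("A&M"))))  # ['a', 'a&', 'a&m', 'm']
```

In this model, "California-Berkeley" yields prefixes of the joined form as well as of "california" and "berkeley", so a user typing either word, or the hyphenated name, gets a match.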

kynetiv · 2016-09-30T19:35:13Z

This is good to go.

update autocomplete ES filters and use whitespace tokenizer (ee84d75)

kynetiv changed the title ~~[WIP] update autocomplete ES filters and use whitespace tokenizer~~ Update autocomplete ES filters and use whitespace tokenizer Sep 30, 2016

kynetiv merged commit cec09af into dev Sep 30, 2016

kynetiv deleted the update-autocomplete-analyzers branch September 30, 2016 19:35
