Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index phrases #30450

Merged
merged 24 commits into from
Jun 4, 2018
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
e39d396
Store offsets in index prefix fields when stored in the parent field
romseygeek Mar 14, 2018
e931107
Merge branch 'master' into index-phrases
romseygeek Apr 12, 2018
3719e30
Merge branch 'master' into index-phrases
romseygeek Apr 24, 2018
4cea300
Upgrade to lucene 7.4.0 snapshot
romseygeek Apr 24, 2018
efd612a
Merge branch 'master' into index-phrases
romseygeek May 2, 2018
b591fc4
Add the ability to index two-term shingles for faster phrase queries
romseygeek May 8, 2018
aca2b7e
docs
romseygeek May 8, 2018
910f0c1
Merge branch 'master' into index-phrases
romseygeek May 8, 2018
e0fe29d
Placate checkstyle
romseygeek May 8, 2018
c156cbd
changelog
romseygeek May 8, 2018
007ee3d
Merge branch 'master' into index-phrases
romseygeek May 18, 2018
66b1e48
MappedFieldType.matchQuery -> .analyzePhrase
romseygeek May 22, 2018
4d6cb66
Merge remote-tracking branch 'origin/master' into index-phrases
romseygeek May 22, 2018
9eb8a6d
iter
romseygeek May 24, 2018
69cf210
Merge remote-tracking branch 'origin/master' into index-phrases
romseygeek May 24, 2018
0deebb6
iter
romseygeek May 24, 2018
6cfa4b1
iter
romseygeek May 29, 2018
6208c31
Merge branch 'master' into index-phrases
romseygeek May 29, 2018
1d7852e
Check for positions
romseygeek May 29, 2018
289186e
Merge branch 'master' into index-phrases
romseygeek May 30, 2018
518e280
Merge branch 'master' into index-phrases
romseygeek May 30, 2018
8dd5cd5
index_phrases isn't an updateable setting
romseygeek May 30, 2018
dab97ad
Merge remote-tracking branch 'origin/master' into index-phrases
romseygeek Jun 1, 2018
b2e732a
Merge remote-tracking branch 'origin/master' into index-phrases
romseygeek Jun 1, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/reference/mapping/types/text.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,12 @@ The following parameters are accepted by `text` fields:
the expense of a larger index. Accepts an
<<index-prefix-config,`index-prefix configuration block`>>

<<index-phrases,`index_phrases`>>::

If enabled, two-term word combinations ('shingles') are indexed into a separate
field. This allows phrase queries to run more efficiently, at the expense
of a larger index. Accepts `true` or `false` (default).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should document that this feature works better when stop words are not removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


<<norms,`norms`>>::

Whether field-length should be taken into account when scoring queries.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ private ICUTokenizerConfig getIcuConfig(Environment env, Settings settings) {
if (tailored.isEmpty()) {
return null;
} else {
final BreakIterator breakers[] = new BreakIterator[UScript.CODE_LIMIT];
final RuleBasedBreakIterator breakers[] = new RuleBasedBreakIterator[UScript.CODE_LIMIT];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is already in master

for (Map.Entry<Integer, String> entry : tailored.entrySet()) {
int code = entry.getKey();
String resourcePath = entry.getValue();
Expand All @@ -105,7 +105,7 @@ public RuleBasedBreakIterator getBreakIterator(int script) {
}

//parse a single RBBi rule file
private BreakIterator parseRules(String filename, Environment env) throws IOException {
private RuleBasedBreakIterator parseRules(String filename, Environment env) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


final Path path = env.configFile().resolve(filename);
String rules = Files.readAllLines(path)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
---
"search with indexed phrases":
- skip:
version: " - 6.99.99"
reason: index_phrase is only available as of 7.0.0
- do:
indices.create:
index: test
body:
mappings:
test:
properties:
text:
type: text
index_phrases: true

- do:
index:
index: test
type: test
id: 1
body: { text: "peter piper picked a peck of pickled peppers" }

- do:
indices.refresh:
index: [test]

- do:
search:
index: test
body:
query:
match_phrase:
text:
query: "peter piper"

- match: {hits.total: 1}

- do:
search:
index: test
q: '"peter piper"~1'
df: text

- match: {hits.total: 1}

- do:
search:
index: test
body:
query:
match_phrase:
text: "peter piper picked"

- match: {hits.total: 1}

- do:
search:
index: test
body:
query:
match_phrase:
text: "piper"

- match: {hits.total: 1}


Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@

package org.elasticsearch.index.mapper;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;
Expand All @@ -43,6 +44,7 @@
import org.elasticsearch.index.query.QueryRewriteContext;
import org.elasticsearch.index.query.QueryShardContext;
import org.elasticsearch.index.query.QueryShardException;
import org.elasticsearch.index.search.MatchQuery;
import org.elasticsearch.index.similarity.SimilarityProvider;
import org.elasticsearch.search.DocValueFormat;
import org.joda.time.DateTimeZone;
Expand Down Expand Up @@ -360,6 +362,10 @@ public Query nullValueQuery() {

public abstract Query existsQuery(QueryShardContext context);

public Query analyzePhrase(String field, TokenStream stream, int slop) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe phraseQuery would be more consistent with other methods here (existsQuery, termQuery, rangeQuery)?

throw new IllegalArgumentException("Can only use phrase queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only mention text fields in the error message, even if we don't fail phrase queries on keywords, I'm not sure this is very useful?

}

/**
* An enum used to describe the relation between the range of terms in a
* shard when compared with a query range
Expand Down
Loading