Updated Tantivy version for slop queries #1722

Merged
merged 5 commits into from
Jul 11, 2022
Changes from 4 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- Support for boolean field
- Support for slop in phrase queries

### Fixed

30 changes: 30 additions & 0 deletions docs/reference/query-language.md
@@ -35,6 +35,36 @@ Quickwit supports parenthesis to group multiple clauses:
(color:red OR color:green) AND size:large
```

### Slop Operator

Quickwit also supports phrase queries with a slop parameter, written as the slop operator `~` followed by the slop value. For instance, the query `body:"small bike"~2` matches documents containing the word `small` followed by the word `bike`, with at most two other words in between.

:::caution
Slop queries can only be used on fields indexed with the [record option](./../configuration/index-config.md#text-type) set to `position`.
:::
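
To make the syntax concrete, here is a minimal Rust sketch that splits a query of the form `field:"phrase"~N` into its parts. It is not Quickwit's or Tantivy's parser: `PhraseQuery` and `parse_phrase_query` are hypothetical names introduced for illustration, and the sketch assumes a single quoted phrase with an optional field prefix and an optional slop suffix.

```rust
/// Hypothetical container for the parts of a phrase query (illustration only).
#[derive(Debug, PartialEq)]
struct PhraseQuery {
    field: Option<String>,
    phrase: String,
    slop: u32,
}

/// Parses `field:"some phrase"~N`, where `field:` and `~N` are optional.
/// Returns `None` for anything that does not follow this shape.
fn parse_phrase_query(query: &str) -> Option<PhraseQuery> {
    let query = query.trim();
    let open = query.find('"')?;
    let close = query.rfind('"')?;
    if open == close {
        return None; // only one quote: not a phrase
    }
    // Text before the opening quote is either empty or `field:`.
    let prefix = &query[..open];
    let field = if prefix.is_empty() {
        None
    } else {
        Some(prefix.strip_suffix(':')?.to_string())
    };
    let phrase = query[open + 1..close].to_string();
    // Text after the closing quote is either empty (slop 0) or `~N`.
    let suffix = &query[close + 1..];
    let slop: u32 = if suffix.is_empty() {
        0
    } else {
        suffix.strip_prefix('~')?.parse::<u32>().ok()?
    };
    Some(PhraseQuery { field, phrase, slop })
}
```

Under these assumptions, `body:"small bike"~2` yields the field `body`, the phrase `small bike`, and a slop of 2, while a phrase without `~N` is treated as slop 0.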

#### Examples:

With the following corpus:
```json
[
{"id": 1, "body": "a red bike"},
{"id": 2, "body": "a small blue bike"},
{"id": 3, "body": "a small, rusty, and yellow bike"},
{"id": 4, "body": "fred's small bike"},
{"id": 5, "body": "a tiny shelter"}
]
```
The following queries will output:

- `body:"small bird"~2`: no match []
- `body:"red bike"~2`: matches [1]
- `body:"small blue bike"~3`: matches [2]
- `body:"small bike"`: matches [4]
- `body:"small bike"~1`: matches [2, 4]
- `body:"small bike"~2`: matches [2, 4]
- `body:"small bike"~3`: matches [2, 3, 4]
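
The results above can be reproduced with a simplified model of slop matching, sketched below in Rust. This is an assumption-laden illustration, not Tantivy's implementation: it assumes text is lowercased and split on non-alphanumeric characters, that phrase terms must appear in order, and that the slop is the total number of extra tokens allowed between the first and last term. The names `tokenize`, `place_rest`, `matches_with_slop`, `matching_ids`, and `CORPUS` are hypothetical.

```rust
/// The example corpus as (id, body) pairs.
const CORPUS: [(u32, &str); 5] = [
    (1, "a red bike"),
    (2, "a small blue bike"),
    (3, "a small, rusty, and yellow bike"),
    (4, "fred's small bike"),
    (5, "a tiny shelter"),
];

/// Assumed tokenizer: lowercase, split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Tries to place the remaining `terms` starting at token index `next`,
/// skipping at most `budget` tokens in total.
fn place_rest(tokens: &[String], terms: &[String], next: usize, budget: usize) -> bool {
    let (term, rest) = match terms.split_first() {
        Some(pair) => pair,
        None => return true, // every term has been placed
    };
    (0..=budget).any(|skipped| {
        tokens.get(next + skipped) == Some(term)
            && place_rest(tokens, rest, next + skipped + 1, budget - skipped)
    })
}

/// Returns true if `phrase` occurs in `text` in order with at most `slop`
/// extra tokens between its terms.
fn matches_with_slop(text: &str, phrase: &str, slop: usize) -> bool {
    let tokens = tokenize(text);
    let terms = tokenize(phrase);
    let (first, rest) = match terms.split_first() {
        Some(pair) => pair,
        None => return false, // empty phrase matches nothing
    };
    tokens
        .iter()
        .enumerate()
        .any(|(i, tok)| tok == first && place_rest(&tokens, rest, i + 1, slop))
}

/// Ids of the documents in `corpus` whose body matches the phrase.
fn matching_ids(corpus: &[(u32, &str)], phrase: &str, slop: usize) -> Vec<u32> {
    corpus
        .iter()
        .filter(|entry| matches_with_slop(entry.1, phrase, slop))
        .map(|entry| entry.0)
        .collect()
}
```

For two-term phrases such as `"small bike"`, this model only counts the words between the two terms, which is why document 3 (three words in between) first matches at `~3`.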

### Escaping Special Characters

Special reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. Such characters can still appear in query terms, but they need to be escaped with a backslash `\`.
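
As an illustration of the escaping rule, the Rust sketch below prefixes every reserved character in a term with a backslash. `escape_query_term` and `RESERVED` are hypothetical names, not part of Quickwit, and the sketch assumes that the `\\` entry in the list above denotes the backslash character itself.

```rust
/// Reserved characters taken from the list above.
/// Assumption: the `\\` entry stands for a single backslash.
const RESERVED: &[char] = &[
    '+', '^', '`', ':', '{', '}', '"', '[', ']', '(', ')', '~', '!', '\\', '*', ' ',
];

/// Returns `term` with each reserved character preceded by a backslash.
fn escape_query_term(term: &str) -> String {
    let mut escaped = String::with_capacity(term.len());
    for c in term.chars() {
        if RESERVED.contains(&c) {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    escaped
}
```

For example, the term `a:b` becomes `a\:b`, and a term without reserved characters is returned unchanged.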
70 changes: 70 additions & 0 deletions quickwit-search/src/tests.rs
@@ -72,6 +72,76 @@ async fn test_single_node_simple() -> anyhow::Result<()> {
    Ok(())
}

async fn slop_search_and_check(
    test_sandbox: &TestSandbox,
    index_id: &str,
    query: &str,
    expected_num_match: u64,
) -> anyhow::Result<()> {
    let search_request = SearchRequest {
        index_id: index_id.to_string(),
        query: query.to_string(),
        search_fields: vec!["body".to_string()],
        start_timestamp: None,
        end_timestamp: None,
        max_hits: 5,
        start_offset: 0,
        ..Default::default()
    };
    let single_node_result = single_node_search(
        &search_request,
        &*test_sandbox.metastore(),
        test_sandbox.storage_uri_resolver(),
    )
    .await?;
    assert_eq!(
        single_node_result.num_hits, expected_num_match,
        "query: {}",
        query
    );
    assert_eq!(
        single_node_result.hits.len(),
        expected_num_match as usize,
        "query: {}",
        query
    );
    Ok(())
}

#[tokio::test]
async fn test_slop_queries() -> anyhow::Result<()> {
    let index_id = "slop-query";
    let doc_mapping_yaml = r#"
        field_mappings:
          - name: title
            type: text
          - name: body
            type: text
            record: position
    "#;

    let test_sandbox = TestSandbox::create(index_id, doc_mapping_yaml, "{}", &["body"]).await?;
    let docs = vec![
        json!({"title": "one", "body": "a red bike"}),
        json!({"title": "two", "body": "a small blue bike"}),
        json!({"title": "three", "body": "a small, rusty, and yellow bike"}),
        json!({"title": "four", "body": "fred's small bike"}),
        json!({"title": "five", "body": "a tiny shelter"}),
    ];
    test_sandbox.add_documents(docs.clone()).await?;

    slop_search_and_check(&test_sandbox, index_id, "\"small bird\"~2", 0).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"red bike\"~2", 1).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"small blue bike\"~3", 1).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"small bike\"", 1).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"small bike\"~1", 2).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"small bike\"~2", 2).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"small bike\"~3", 3).await?;
    slop_search_and_check(&test_sandbox, index_id, "\"tiny shelter\"~3", 1).await?;

    Ok(())
}

// TODO remove me once `Iterator::is_sorted_by_key` is stabilized.
fn is_sorted<E, I: Iterator<Item = E>>(mut it: I) -> bool
where E: Ord {