Use TermSetQuery when building the Query from the user input. #1568

Open
fmassot opened this issue Sep 28, 2022 · 1 comment

fmassot commented Sep 28, 2022

#1539 introduces TermSetQuery, which is a nice addition to the already supported query types.

In Quickwit, we rely on the tantivy query grammar and then use the tantivy query parser to build a Box<dyn Query>. The query parser does not currently produce a TermSetQuery, so we can't benefit from it in Quickwit.

I'm not sure how to handle that, either in Quickwit or in tantivy. In tantivy I see 2 possible solutions:

  • add a specific operator in the query grammar to activate the TermSetQuery, something like field_name IN [1,2,3]
  • when parsing the query, find term queries on the same field and merge them into a `TermSetQuery`

I like the first solution as it would provide a nice way for the user to express this type of query. Any thoughts on this @fulmicoton?
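To make the first option concrete, here is a minimal sketch of recognizing a `field_name IN [1,2,3]` clause. It is a standalone toy, not the tantivy query grammar: `TermSetClause` and `parse_in_clause` are hypothetical names, the `IN` keyword is treated as case-sensitive, and terms are assumed to contain no commas or brackets.

```rust
/// Hypothetical result of parsing `field IN [a, b, c]`; not a tantivy type.
#[derive(Debug, PartialEq)]
struct TermSetClause {
    field: String,
    terms: Vec<String>,
}

/// Parses a single `field IN [t1, t2, ...]` clause.
/// Returns `None` when the input does not have this shape.
fn parse_in_clause(input: &str) -> Option<TermSetClause> {
    // Split on the first " IN " keyword (case-sensitive in this sketch).
    let (field, rest) = input.trim().split_once(" IN ")?;
    let field = field.trim();
    // The field name must be a single non-empty token.
    if field.is_empty() || field.contains(char::is_whitespace) {
        return None;
    }
    // The term list must be wrapped in square brackets.
    let inner = rest.trim().strip_prefix('[')?.strip_suffix(']')?;
    let terms: Vec<String> = inner
        .split(',')
        .map(str::trim)
        .filter(|term| !term.is_empty())
        .map(String::from)
        .collect();
    // An empty list is rejected rather than mapped to an empty term set.
    if terms.is_empty() {
        return None;
    }
    Some(TermSetClause {
        field: field.to_string(),
        terms,
    })
}
```

Calling `parse_in_clause("field_name IN [1,2,3]")` yields the field `field_name` with the terms `1`, `2` and `3`, while a plain `field_name:1` returns `None` and would fall through to the existing grammar. Converting the terms to typed values for the field's schema is left out of this sketch.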

PSeitz commented Sep 29, 2022

We could consider supporting more complex expressions on fields by parsing a subtree.

my_field_name:(a OR b OR c)

Currently this is parsed as a phrase "(a OR b OR c)". For now, the way to express this is:

my_field_name:a OR my_field_name:b OR my_field_name:c

Both would require post-processing though. We should check that the parser can handle thousands of terms.
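The post-processing step could look roughly like the sketch below: one pass over a flat OR that groups term clauses by field and replaces each group with a term set. The `Ast` enum and `merge_terms` are hypothetical stand-ins for illustration, not tantivy's AST or `TermSetQuery`, and the sketch only handles a single, non-nested OR.

```rust
use std::collections::BTreeMap;

/// Toy query AST for illustration only; these are not tantivy's types.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Term { field: String, value: String },
    TermSet { field: String, values: Vec<String> },
    Or(Vec<Ast>),
}

/// Rewrites a flat `Or` so that term clauses on the same field are grouped
/// into a single `TermSet`. Any other clause is kept unchanged.
fn merge_terms(ast: Ast) -> Ast {
    let clauses = match ast {
        Ast::Or(clauses) => clauses,
        other => return other,
    };
    // Group term values by field, preserving the order of values per field.
    let mut by_field: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut rest: Vec<Ast> = Vec::new();
    for clause in clauses {
        match clause {
            Ast::Term { field, value } => by_field.entry(field).or_default().push(value),
            other => rest.push(other),
        }
    }
    let mut merged: Vec<Ast> = Vec::new();
    for (field, mut values) in by_field {
        if values.len() == 1 {
            // A lone term stays a plain term clause.
            let value = values.remove(0);
            merged.push(Ast::Term { field, value });
        } else {
            merged.push(Ast::TermSet { field, values });
        }
    }
    merged.extend(rest);
    // Unwrap the `Or` when only one clause remains.
    if merged.len() == 1 {
        merged.remove(0)
    } else {
        Ast::Or(merged)
    }
}
```

With this, the AST for `my_field_name:a OR my_field_name:b OR my_field_name:c` collapses to one term set on `my_field_name` with values `a`, `b`, `c`. The rewrite reorders clauses (grouped fields first, sorted by field name), which does not change which documents match an OR; whether scoring stays the same is a separate question this sketch does not address.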

The parser seems to be fast enough, with parse time growing linearly in the number of terms for a query of this form:

field:term1 OR field:term2 OR field:term3 ... 

running 3 tests
test tests::bench_100_000_terms ... bench: 127,676,714 ns/iter (+/- 1,687,208)
test tests::bench_10_000_terms  ... bench:  12,545,798 ns/iter (+/- 232,466)
test tests::bench_1_000_terms   ... bench:   1,202,724 ns/iter (+/- 16,644)
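
As a quick check of the linear-complexity reading, dividing each reported time by its term count gives about 1.20, 1.25 and 1.28 µs per term for 1,000, 10,000 and 100,000 terms. The small sketch below does that arithmetic; the numbers are copied from the output above, and the function names are only illustrative.

```rust
/// (number of terms, mean ns/iter) copied from the benchmark output above.
const RESULTS: [(u64, u64); 3] = [
    (1_000, 1_202_724),
    (10_000, 12_545_798),
    (100_000, 127_676_714),
];

/// Mean parse time per term, in nanoseconds.
fn ns_per_term(terms: u64, total_ns: u64) -> f64 {
    total_ns as f64 / terms as f64
}

/// Ratio between the largest and smallest per-term cost.
/// A value close to 1.0 is consistent with linear scaling.
fn per_term_spread(results: &[(u64, u64)]) -> f64 {
    let costs: Vec<f64> = results
        .iter()
        .map(|&(terms, total_ns)| ns_per_term(terms, total_ns))
        .collect();
    let max = costs.iter().cloned().fold(f64::MIN, f64::max);
    let min = costs.iter().cloned().fold(f64::MAX, f64::min);
    max / min
}
```

The per-term cost rises by only a few percent across a 100x increase in input size, which fits linear growth for these three data points.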
