Support latest vectorsearch (dev branch) and hybrid queries #1980
The `carray` extension is part of the SQLite source tree but is not built in. The primary impetus for adding it is that the vector-search extension now uses it. Because an extension can't bundle that code, the owner of the SQLite handle has to load it. It's a great optimization for passing lots of values into a query in an `IN(...)` clause without having to encode every single value into the SQL string. I added it to the SQLiteCpp library, along with a C++ API for using it when binding parameters. The one place we can immediately use it in LiteCore is `SQLiteKeyStore::withDocBodies()`, so I updated that method. It might give us a tiny boost in replication performance...
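To make the `IN(...)` motivation concrete, here is a minimal C++ sketch contrasting the two ways of building such a query. It is not the SQLiteCpp or LiteCore code. The table `docs`, its `key`/`body` columns and all three helper functions are hypothetical. With `carray`, the statement text stays constant and the array itself is bound to one parameter. Based on SQLite's `carray` documentation, this binding would use `sqlite3_carray_bind()`, which is left out here so the sketch runs without SQLite.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table "docs" with columns "key" (TEXT) and "body" (BLOB).

// Escape a string as an SQL literal: wrap it in single quotes and double any
// embedded single quote, per SQL's quoting rules.
std::string quoteSqlString(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'')
            out += "''";
        else
            out += c;
    }
    out += '\'';
    return out;
}

// Without carray: every value is spliced into the SQL text. The statement grows
// with the list, every value must be escaped correctly, and a different list
// means different SQL text that has to be prepared again.
std::string inClauseWithLiterals(const std::vector<std::string>& keys) {
    std::string sql = "SELECT key, body FROM docs WHERE key IN (";
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += quoteSqlString(keys[i]);
    }
    sql += ")";
    return sql;
}

// With carray: the SQL text never changes. The whole array would be bound to
// parameter ?1 (e.g. via sqlite3_carray_bind()), and carray() exposes it to the
// IN operator as a one-column table.
std::string inClauseWithCarray() {
    return "SELECT key, body FROM docs WHERE key IN carray(?1)";
}
```

For example, `inClauseWithLiterals({"a", "b'c"})` produces `... IN ('a', 'b''c')`, while the `carray` form is the same string no matter how many keys are bound.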
- When `vector_match()` is the only criterion in the WHERE clause, OR if an explicit `max_results` arg is given, it's a "plain" query like already existed.
- Otherwise it's a "hybrid" query, which invokes the vectorsearch extension differently (with a JOIN constraint on its rowid column). This is less efficient, but it computes distances for all the rows selected by the other WHERE tests instead of just finding the closest docs in the whole collection, so it gives more accurate results.
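The accuracy difference between the two modes shows up in a toy model. This C++ sketch is a brute-force simulation, not how the vectorsearch extension works internally. The real extension uses a vector index rather than sorting every row. `Row`, `plainQuery` and `hybridQuery` are hypothetical names, and a string `category` stands in for the other WHERE tests. `plainQuery` models the case where an explicit `max_results` forces a plain query alongside other conditions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy row: an ID, a category (standing in for the other WHERE tests),
// and the row's precomputed distance from the query vector.
struct Row {
    int id;
    std::string category;
    double distance;
};

static bool closer(const Row& a, const Row& b) { return a.distance < b.distance; }

// "Plain": find the k nearest rows in the whole collection, then apply the
// other WHERE test. Matching rows outside the global top k are never seen.
std::vector<int> plainQuery(std::vector<Row> rows, const std::string& category, std::size_t k) {
    std::sort(rows.begin(), rows.end(), closer);
    if (rows.size() > k) rows.resize(k);
    std::vector<int> ids;
    for (const Row& r : rows)
        if (r.category == category) ids.push_back(r.id);
    return ids;
}

// "Hybrid": apply the other WHERE test first, then rank every surviving row
// by distance and keep at most `limit` of them.
std::vector<int> hybridQuery(const std::vector<Row>& rows, const std::string& category, std::size_t limit) {
    std::vector<Row> matches;
    for (const Row& r : rows)
        if (r.category == category) matches.push_back(r);
    std::sort(matches.begin(), matches.end(), closer);
    if (matches.size() > limit) matches.resize(limit);
    std::vector<int> ids;
    for (const Row& r : matches) ids.push_back(r.id);
    return ids;
}
```

In this example, rows 1 to 5 sit at distances 0.1 to 0.5, and only rows 1, 4 and 5 are in category "news". `plainQuery(rows, "news", 3)` returns just `{1}` because rows 4 and 5 fall outside the global top 3. `hybridQuery(rows, "news", 3)` returns `{1, 4, 5}`.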
@snej Can you fix the Windows build issue so the PR can be reviewed?
In beta 1 and 2, the default limit of `vector_match()` is 3.
For a hybrid search that doesn't specify a limit, the tests suggest that the default limit of 3 is no longer applied. If that's correct, this is a behavior change that needs to be documented, and we should check whether it will cause confusion about when the default limit applies.
I have chatted with @jianminzhao to confirm my understanding. Since the default limit is not applied to hybrid queries, we will need to explain this in the documentation (maybe with some examples). The behavior is intuitive, so I hope users will not find it hard to understand.
- CBL-5629: Update zlib to 1.3.1 (#2032)
- CBL-5627: Update min MacOS version to 12.0 (#2033)
- CBL-5539: Add an API to check if a vector index is trained or not (#2035)
- CBL-5628: Update mbedtls to 2.28.8 (#2027)
- 374d485 Support latest vectorsearch (dev branch) and hybrid queries (#1980)
- 5c3c854 Lazy vector index updating (#1949)
- CBL-5522: Port - N1QL Parser has exponential slowdown for redundant parentheses (#1984)
- ab19634 Part of CBL 5579 in order to facilitate VS on .NET Android (#1993)
- CBL-5507: Fix index-past-end in CookieStore (#1982)
- CBL-5591: Binary Decoder to account for the new Logging object path (#1995)
- 294c3f8 Define _LIBCPP_REMOVE_TRANSITIVE_INCLUDES (#1987)
- CBL-5438: DateTime standard format parser (#1977)
- CBL-5498: Util changes for ConnectedClient (#1978)
- CBL-5450: Remote rev KeepBody flag could be cleared accidentally
- f8a8de2 Remove UWP builds from build scripts (#1954)
- CBL-5425: Binary Encoder to encode the (Logging) object path (#1986)
- CBL-4661: Fix ROUND_EVEN. (#1981)
- Added the `carray` extension, because the latest `vectorsearch` library requires it.
- When `vector_match()` is the only criterion in the WHERE clause, OR if an explicit `max_results` arg is given, it's a "plain" query like already existed.
- If no `max_results` is given, but the query itself has a LIMIT, use the LIMIT as the `max_results` for the vector query. This is intuitive, and it means you only need to use `max_results` if you want to force a plain vector query in combination with other conditions.
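The selection rules in these bullets, together with the review discussion about the default limit, can be summarized as a small decision function. This is a hypothetical C++ sketch, not LiteCore's query translator. `planVectorQuery`, `VectorQueryPlan` and its fields are invented names. Two points are assumptions. First, falling back to 3 when neither `max_results` nor a LIMIT is given is carried over from the beta 1/2 behavior, and the text doesn't say whether that default still applies to plain queries after this change. Second, this sketch applies the LIMIT-as-`max_results` rule only to plain queries.

```cpp
#include <cstddef>
#include <optional>

enum class VectorQueryKind { Plain, Hybrid };

struct VectorQueryPlan {
    VectorQueryKind kind;
    std::optional<std::size_t> maxResults;  // nullopt: no result cap passed to the vector search
};

// ASSUMPTION: default cap taken from the beta 1/2 behavior described in the review.
constexpr std::size_t kAssumedDefaultMaxResults = 3;

// hasOtherWhereTerms: does the WHERE clause test anything besides vector_match()?
// explicitMaxResults: the max_results argument of vector_match(), if given.
// queryLimit:         the query's own LIMIT, if any.
VectorQueryPlan planVectorQuery(bool hasOtherWhereTerms,
                                std::optional<std::size_t> explicitMaxResults,
                                std::optional<std::size_t> queryLimit) {
    if (!hasOtherWhereTerms || explicitMaxResults) {
        // Plain query: an explicit max_results wins, else the query's LIMIT,
        // else the (assumed) default.
        std::size_t n = explicitMaxResults ? *explicitMaxResults
                      : queryLimit         ? *queryLimit
                                           : kAssumedDefaultMaxResults;
        return {VectorQueryKind::Plain, n};
    }
    // Hybrid query: distances are computed for every row that passes the other
    // WHERE tests, and no default cap is applied. The query's own LIMIT still
    // trims the final result set at the SQL level.
    return {VectorQueryKind::Hybrid, std::nullopt};
}
```

With this logic, adding a LIMIT to a query that combines `vector_match()` with other conditions keeps it hybrid. Only an explicit `max_results` switches it to a plain query, matching the last bullet above.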