Cube Operator for VectorSearch #8576

Open
mauriciocirelli opened this issue Aug 13, 2024 · 2 comments
Labels
cube store (Issues relating to Cube Store), enhancement (New feature proposal)

Comments

@mauriciocirelli

Dear community,

We need to perform similarity searches on our database, driven mainly by AI use cases.
Users may ask questions with typos, and a similarity operator would handle them much better than the traditional equality/contains operators.

Since most popular databases, such as Postgres and Mongo, have operators for vector similarity search, I think Cube users would benefit greatly from one.

Please, kindly consider this feature request.

@igorlukanin
Member

igorlukanin commented Aug 13, 2024

Hi @mauriciocirelli 👋

Thanks for filing this!

Users may ask questions with typos...

So, do I understand correctly that you'd like to be able to respond to full-fledged "questions" rather than, say, filter on a single field in a query? For questions, does the AI API look like a good fit?

igorlukanin added the "question" label (The issue is a question. Please use Stack Overflow for questions.) on Aug 13, 2024
@mauriciocirelli
Author

mauriciocirelli commented Aug 13, 2024

Hi @igorlukanin

We have been working on this for almost a year, and the AI API wasn't available when we started. Now it is in beta and we are evaluating it. It works pretty well, but it does not fix typos in queries like these.
So a similarity comparator is still needed.

For instance, I may ask a question about "Iggor". The AI API will generate a query with a filter where Name equals "Iggor", which would not match any records.
The AI could be improved to use a similarity operator instead and create a query with a filter where Name is similar to "Iggor", which would match "Igor" by similarity score.
Or, we could use queryRewrite to change the equality comparison to a similarity comparison on the fly when the query comes from the AI. This is what we are doing right now, using a Similarity HTTP API we have designed. We could avoid this custom HTTP API if Cube had a built-in similarity comparator.

EDIT:

We just need to figure out how these filters can still use the pre-aggregation caches.
The approach we have been using so far replaces the values in the filter but keeps the equality operator, so all queries still match the pre-aggregation caches.
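
To make the value-replacement idea concrete, here is a minimal sketch in Python. It is not our actual implementation: the KNOWN_VALUES lookup, the member name users.name and the function name are made up for illustration, and the standard library's difflib stands in for the custom Similarity HTTP API. It only assumes that the query is a dict whose filters have member, operator and values keys, so a function like this could presumably be called from a queryRewrite hook.

```python
from copy import deepcopy
from difflib import get_close_matches

# Hypothetical lookup of known values per member. In a real setup these would
# come from the database or from a similarity service, not a hard-coded dict.
KNOWN_VALUES = {"users.name": ["Igor", "Mauricio", "Maria"]}


def rewrite_equals_filters(query, known_values=KNOWN_VALUES, cutoff=0.7):
    """Return a copy of `query` where the values of `equals` filters are
    replaced by their closest known matches. The operator is left untouched."""
    rewritten = deepcopy(query)
    for flt in rewritten.get("filters", []):
        candidates = known_values.get(flt.get("member"))
        if flt.get("operator") != "equals" or not candidates:
            continue
        matched = []
        for value in flt.get("values", []):
            # difflib matching is case-sensitive and purely string-based.
            close = get_close_matches(value, candidates, n=3, cutoff=cutoff)
            # Keep the original value when nothing is similar enough.
            matched.extend(close or [value])
        # De-duplicate while preserving order.
        flt["values"] = list(dict.fromkeys(matched))
    return rewritten


query = {
    "measures": ["users.count"],
    "filters": [
        {"member": "users.name", "operator": "equals", "values": ["Iggor"]}
    ],
}
print(rewrite_equals_filters(query)["filters"])
```

Because only the values change and the operator stays "equals", the rewritten query has the same shape as any other equality-filtered query, which is why it can still match a pre-aggregation.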

It is important that this similarity operator still works with pre-aggregations. It may query the DB for the similar values, but the final query should still be able to match a pre-aggregation.

Ultimately, Cube could fetch the possible values periodically from the database (using a refresh key or a special kind of pre-aggregation) and run the similarity search on this cache. This seems like a nice approach, as it would eliminate the need to write code to run similarity queries on all supported databases while also avoiding the overhead of hitting the database.
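
A rough sketch of that idea, again in Python and under stated assumptions: CachedValueIndex, fetch_values and ttl_seconds are hypothetical names, the time-based expiry is only a stand-in for whatever refresh-key mechanism Cube would actually use, and difflib string similarity stands in for a real vector search.

```python
import time
from difflib import get_close_matches


class CachedValueIndex:
    """Keeps a periodically refreshed copy of a column's distinct values and
    answers similarity lookups from that copy instead of the database."""

    def __init__(self, fetch_values, ttl_seconds=3600, clock=time.monotonic):
        self._fetch_values = fetch_values  # hypothetical callable that queries the db
        self._ttl = ttl_seconds
        self._clock = clock
        self._values = []
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._values = list(self._fetch_values())
            self._loaded_at = now

    def similar(self, value, n=3, cutoff=0.7):
        """Return up to `n` cached values that are similar to `value`."""
        self._refresh_if_stale()
        return get_close_matches(value, self._values, n=n, cutoff=cutoff)


# Usage: the lambda stands in for a "SELECT DISTINCT name ..." style query.
index = CachedValueIndex(lambda: ["Igor", "Mauricio"], ttl_seconds=600)
print(index.similar("Iggor"))
```

The database is only hit when the cached values are older than the TTL, so every lookup in between runs purely against the in-memory list.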

igorlukanin added the "enhancement" (New feature proposal) and "cube store" (Issues relating to Cube Store) labels and removed the "question" label on Aug 14, 2024