Experimental feature: Hybrid Search and Vector Store #677
Replies: 40 comments 127 replies
-
Hi, an excellent feature. Can we use filters with vector search?
-
Hi, if I use OpenAI to create the vectors stored in the DB but search with a Hugging Face vector, will it work and return the right results?
-
When I tested the meilisearch:v1.3.0-rc.3 vector search, I got results, but didn't see _semanticSimilarity. Strange.
-
I've been experimenting with Vector Search since its release in 1.3, but I am having two issues that I am not sure how to address.
I know it's experimental, but this seems to be a very drastic issue, so maybe I am not doing it right? Update:
-
Hybrid search is not yet among the experimental features, right? I assume we are looking into it before the 1.4 release. I could set up and use the current vector search right away. I also gave a session at "Drupal Camp Pune" with a short introduction to vectors and hybrid search, and put it behind a small demo for users to interact with (we are trying to use it behind a QnA bot). Let me know how I can help with hybrid search. Even though I am not a Rust expert, I can give it a try :)
-
Is it possible to have varying lengths of vectors across documents in an index?
documents = [
    { 'id': 1, 'Text': 'I like to eat broccoli and bananas.', '_vectors': embedding_service(["I like to eat broccoli", "I like to eat bananas", "I like to eat broccoli and bananas."]) },
    { 'id': 2, 'Text': 'The Packers won the 2011 NFL Super Bowl with Aaron Rodgers', '_vectors': embedding_service(["The Packers won in 2011", "The Packers won the 2011 NFL Super Bowl with Aaron Rodgers"]) },
]
would fail with this message:
What I'm looking for is having multiple vectors belong to a document and, upon vector search, computing the dot product with all vectors for all documents.
-
I have some questions regarding this feature:
[1] I am new to the whole vector search so please correct me if I mention something wrong.
-
Hi everyone, it's September 30 today. How is the feature going? By what date can we expect it to be reasonably stable? Thanks for considering this feature 👍
-
Hi, I just set up a collection with 16 million 3-dimensional vectors (basically all possible RGB values, normalized). The first 300k records were added fine, but after that it became painfully slow; the tasks are still running after 12 hours. Can anyone suggest a limit on the number of vectors we can add? I am looking for a solution for 40M records with 1024 dimensions.
-
Vector Search: is it possible to search for images by image?
-
Hey folks 👋 v1.6 has been released! 🦊 Hybrid search and auto-embedding features are now available for your use ✨ Check out our documentation to learn how to use them. We're looking forward to your feedback!
-
Hi @macraig, is it possible to use hybrid search with user-provided embeddings? I read through the documentation, but it seems like they are mutually exclusive?
-
I've wanted to add some more recent and well-performing embedders, like the E5 models or even the mpnet model, which is the 3rd most popular Sentence Similarity model on Hugging Face. But when I encountered errors and asked support, I was told that only BERT models are supported for auto-embedding with Hugging Face at the moment. That makes it impractical for me to use the built-in Meilisearch capabilities, so I have to use a separate service where I can use better models. Here's a good link discussing the differences and why many mainstream BERT embedders are not as good as the newer alternatives: https://blog.metarank.ai/from-zero-to-semantic-search-embedding-model-592e16d94b61 So it would be great if support for more models could be added.
-
Does anyone know how to monitor the state of the auto-embedding? I configured the embedder using:
embedders: {
  default: {
    source: 'huggingFace',
    model: 'BAAI/bge-base-en-v1.5',
    documentTemplate:
      "Title: {{doc.title}}. {% if doc.subheading %}Subheading: {{doc.subheading}}. {% endif %}Content: {{doc.content}}."
  }
}
I have 100k documents in the index, and it has been running for a while on my local machine, so I would like to find out how long the full auto-embedding will take. Is there an endpoint that gives information about the embedding in progress?
-
Since 25/01 OpenAI added two new embedding models:
They also added an optional field that can limit the number of dimensions for both new models. As of right now only
see: OpenAI blog post
-
Are there any updates on the stabilisation of this feature? I really want to start using it in production, but I'm waiting until it's no longer an experimental feature.
-
Regarding the storage part, an excellent solution would be binary quantization of vectors. It maintains 95% to 99% of the retrieval performance with significantly less storage because the vectors are binary instead of floating-point numbers. Regarding the speed, I think the necessary algorithms and data structures are already available. There are many established and emerging players in the vector search field that claim impressive speeds. For example, a new library claims "0.1 milliseconds query latency on million-scale vector datasets." Therefore, achieving this should be feasible. All that being said, considering the current vector search landscape, natively implementing ColBERT ranking is indispensable for a top-choice, go-to database solution.
-
Adding a feature request from rustyx for non-BERT models described in meilisearch/meilisearch#4718
-
Hey folks 👋 🦎 v1.9 has been released and it includes multiple updates and some Looking forward to your feedback!
-
Would love to see a configurable API base URL supported for the OpenAI embedder. Currently it's hardcoded, but in an enterprise environment we may need to access the service through a different URL or a proxy, which is done by changing the base URL. The official openai Python library supports this.
-
Hi everyone, We are preparing to stabilize the feature and would appreciate your feedback to improve it. We noticed that most users opt for the Your input is invaluable in helping us refine and enhance the feature. Thank you!
-
Hi,
-
Is there any way to check the indexing status? Is there any way to check for which documents the vectors have been generated?
And nothing happens. So the real question at hand is: how do I debug what is going on? Please advise.
-
Hello again everyone 👋 🦩 v1.10 has been released and it includes multiple updates and some Looking forward to your feedback!
-
Hi, I am experimenting with local embedders. In the FAQ for Multilingual-E5-large they state that it is required to prefix input texts with Thanks in advance
-
Do you guys have any experience or recommendations for multilingual models? We're trying to get Meilisearch running on English/German use cases and the search results have been quite odd. One such example (English) without semantic search:
First hit: "title": "TEAC - Turntable - Natural Wood"
With semantic search:
"title": "Timex - Ladies' Health Tracker Watch - Blue"
-
Hello, how can I update the REST embedding server URL without recomputing all the embeddings, please?
-
Hello once again 👋 🐿️ v1.11 has been released and it includes multiple updates and some Looking forward to your feedback!
-
Meilisearch v1.3 (released July 31) introduces a Vector Store feature:
Documents can carry vectors in the _vectors field.
Searches can pass a query vector in the vector query field. The _semanticScore field is added to the resulting documents. It represents a dot product of the distance between the nearest vector and the vector from the search query.
Meilisearch v1.6 (released January 15th) improves on the Vector Store feature:
Meilisearch v1.7 (released March 11th) improves on the Vector Store feature:
Keywords: Semantic Search, Vector Search, Embeddings Search, Hybrid Search.
Experimental feature abstract
Creating one or multiple embedders for an index triggers a new step in the indexing process where embeddings are generated for each indexed document.
Passing "hybrid": {} to a query from the /indexes/{:indexUid}/search or /multi-search performs both a keyword search and a vector search. If no "vector" was provided, it is generated from the "q" field.
How to use the feature?
Please refer to the public API page
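As a rough illustration of the abstract above, here is a minimal Python sketch that only builds the two HTTP requests involved (it does not send them). Several things here are assumptions rather than confirmed API details: the index name "movies", the local base URL, the use of PATCH on the /settings/embedders sub-route, and the embedder configuration, which is borrowed from a reply in this thread. The "hybrid": {} and optional "vector" fields come from the description above. An instance protected by an API key would also need an Authorization header.

```python
import json
import urllib.request

# Assumption: a local instance, matching the URLs used elsewhere in this post.
BASE_URL = "http://localhost:7700"


def json_request(method, path, payload):
    """Build (but do not send) a JSON request for the HTTP API."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def embedder_settings_request(index_uid):
    """Request that creates one embedder named "default" on an index.

    The configuration mirrors the one posted in a reply in this thread;
    treat the exact fields as illustrative, not authoritative.
    """
    embedders = {
        "default": {
            "source": "huggingFace",
            "model": "BAAI/bge-base-en-v1.5",
            "documentTemplate": "Title: {{doc.title}}. Content: {{doc.content}}.",
        }
    }
    return json_request("PATCH", f"/indexes/{index_uid}/settings/embedders", embedders)


def hybrid_search_request(index_uid, query, vector=None):
    """Search request that asks for both keyword and vector search.

    If no vector is given, the post says one is generated from "q".
    """
    body = {"q": query, "hybrid": {}}
    if vector is not None:
        body["vector"] = vector
    return json_request("POST", f"/indexes/{index_uid}/search", body)


if __name__ == "__main__":
    for req in (
        embedder_settings_request("movies"),
        hybrid_search_request("movies", "space adventure"),
    ):
        print(req.get_method(), req.full_url)
        print(req.data.decode("utf-8"))
    # To actually send a request: urllib.request.urlopen(req)
```

Building the requests separately from sending them makes it easy to inspect the payloads before pointing the script at a real instance.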
What is an experimental feature
By enabling this feature via the /experimental-features route, you opt into the following:
The embedders setting can change in a breaking way between two minor versions of Meilisearch.
You can use this feature in production but be prepared to update your code from one version to the next.
Why is this feature not stable yet?
Storing the vectors is currently very expensive, and retrieving them is too. We hope to make progress on that. We are unsure of the API surface we want to expose, even if the current one seems correct.
🗣️ You are welcome to give feedback about the score details or ask any question on its usage; we are eager to collect feedback on the feature
When will the feature potentially be stable?
[Updated] Due to the large increase in API surface and us missing previous estimates, we cannot provide an estimate for the time being
To fully disable the feature, you need to delete the embedders setting from any index using the feature by calling DELETE http://localhost:7700/indexes/<index_using_the_feature>/settings/embedders before calling PATCH 'http://localhost:7700/experimental-features/' with vectorStore set to false.
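The ordering described for disabling the feature can be sketched as follows. This is only an illustration: the index names are hypothetical, the requests are built but not sent, and an instance protected by an API key would also need an Authorization header. Since settings changes are processed as tasks, it is probably wise to wait for each DELETE to complete before sending the final PATCH, although that is an assumption on my part.

```python
import json
import urllib.request

# Assumption: a local instance, as in the URLs quoted above.
BASE_URL = "http://localhost:7700"


def disable_vector_store_requests(index_uids):
    """Return the requests in the order they should be sent.

    First one DELETE per index that uses the feature, then a single
    PATCH that turns the experimental feature off.
    """
    requests = [
        urllib.request.Request(
            url=f"{BASE_URL}/indexes/{uid}/settings/embedders",
            method="DELETE",
        )
        for uid in index_uids
    ]
    requests.append(
        urllib.request.Request(
            url=f"{BASE_URL}/experimental-features/",
            data=json.dumps({"vectorStore": False}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PATCH",
        )
    )
    return requests


if __name__ == "__main__":
    # "movies" and "books" are hypothetical index names.
    for req in disable_vector_store_requests(["movies", "books"]):
        print(req.get_method(), req.full_url)
    # To actually send a request: urllib.request.urlopen(req)
```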