- v2.4.0 and later support indexing and searching vectors.
- We've achieved this by embedding FAISS indexes within our bleve (scorch) indexes.
- A new zap file format, v16, is introduced and will be the default going forward. It co-locates text and vector indexes as neighbors within segments, keeping to scorch's segmented architecture.
- FAISS enters our ecosystem via blevesearch/faiss, a fork of the original facebookresearch/faiss.
- FAISS is a C++ library that must be compiled, and its shared libraries must be placed at a path accessible to your application.
- A `vectors` Go build tag must be set for bleve to compile in all the supporting code. Set this tag only once the FAISS shared library is available; skipping either step will prevent you from using this feature.
- Follow the setup instructions below for help with this.
- Releases of `blevesearch/bleve` work with select checkpoints of `blevesearch/faiss`, owing to API changes and improvements (tracked on its `bleve` branch):
  - v2.4.0 requires blevesearch/faiss@7b119f4b (modified v1.7.4)
  - v2.4.1 requires blevesearch/faiss@d9db66a3 (modified v1.7.4)
  - v2.4.2 requires blevesearch/faiss@d9db66a3 (modified v1.7.4)
  - v2.4.3 requires blevesearch/faiss@b747c55a (modified v1.8.0)
  - v2.4.4 requires blevesearch/faiss@b747c55a (modified v1.8.0)
- The `vector` field type is an array that holds float32 values only.
- The `vector_base64` field type supports base64-encoded strings using little-endian byte ordering (v2.4.1+).
- Supported similarity metrics are `"cosine"` (v2.4.3+), `"dot_product"` and `"l2_norm"`. `cosine` paths additionally normalize vectors before indexing and search.
- Supported dimensionality is between 1 and 2048 (v2.4.0), and up to 4096 (v2.4.1+).
- Supported vector index optimizations: `latency`, `memory_efficient` (v2.4.1+) and `recall`.
- Vectors from documents that do not conform to the dimensionality in the index mapping are simply discarded at index time.
- The dimensionality of the query vector must match the dimensionality of the indexed vectors to obtain any results.
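To use the `vector_base64` field type, a client has to produce the encoded string itself. The sketch below assumes the payload is the raw float32 values, laid out as 4 little-endian bytes each and encoded with Go's standard (padded) base64 alphabet. The helper names are hypothetical, and the alphabet choice is an assumption worth checking against bleve's decoder.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVectorBase64 (hypothetical helper) lays out each float32 as 4
// little-endian bytes and base64-encodes the result with the standard,
// padded alphabet (an assumption about what bleve expects).
func encodeVectorBase64(vec []float32) string {
	buf := make([]byte, 4*len(vec))
	for i, v := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(v))
	}
	return base64.StdEncoding.EncodeToString(buf)
}

// decodeVectorBase64 reverses encodeVectorBase64 and rejects payloads whose
// length is not a whole number of float32 values.
func decodeVectorBase64(s string) ([]float32, error) {
	buf, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	if len(buf)%4 != 0 {
		return nil, fmt.Errorf("decoded length %d is not a multiple of 4", len(buf))
	}
	vec := make([]float32, len(buf)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(buf[4*i:]))
	}
	return vec, nil
}

func main() {
	vec := []float32{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	encoded := encodeVectorBase64(vec)
	decoded, err := decodeVectorBase64(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	fmt.Println(decoded)
}
```

The encoded vector must still have the dimensionality declared in the mapping. Otherwise it is discarded at index time like any other non-conforming vector.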
- Pure kNN searches can be performed, but in that case the `query` attribute of the search request must be set to `{"match_none": {}}`. Since v2.4.1, `query` is optional whenever `knn` is present.
- Hybrid searches are supported: results from `query` are unioned (for now) with results from `knn`. The tf-idf scores from exact searches are simply summed with the similarity distances, each weighted by its boost, to determine the aggregate score:
```
aggregate_score = (query_boost * query_hit_score) + (knn_boost * knn_hit_distance)
```
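The hybrid scoring rule can be sketched as a union keyed by document ID. A document found by only one side gets only that side's weighted contribution. This illustrates the formula, not bleve's implementation. The `hit` type and `hybridUnion` function are hypothetical, and the kNN value is treated as an already-computed per-hit number.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical (document ID, score) pair used only for this sketch.
type hit struct {
	ID    string
	Score float64
}

// hybridUnion unions exact-search hits and kNN hits, computing for each doc
//   aggregate = queryBoost*queryScore + knnBoost*knnScore
// where a side that did not return the doc contributes 0.
func hybridUnion(queryHits, knnHits []hit, queryBoost, knnBoost float64) []hit {
	agg := map[string]float64{}
	for _, h := range queryHits {
		agg[h.ID] += queryBoost * h.Score
	}
	for _, h := range knnHits {
		agg[h.ID] += knnBoost * h.Score
	}
	out := make([]hit, 0, len(agg))
	for id, s := range agg {
		out = append(out, hit{ID: id, Score: s})
	}
	// Highest aggregate first; ties broken by ID so the order is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	return out
}

func main() {
	queryHits := []hit{{"a", 1.2}, {"b", 0.4}}
	knnHits := []hit{{"b", 0.9}, {"c", 0.7}}
	for _, h := range hybridUnion(queryHits, knnHits, 1.0, 1.0) {
		fmt.Printf("%s %.2f\n", h.ID, h.Score)
	}
}
```

Here `b`, found by both sides, outranks `a` even though `a` has the best exact-search score. Note also that a boost of 0 removes that side's contribution entirely.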
- Multi-kNN searches are supported: the `knn` object in the search request accepts an array of requests. These sub-requests are unioned by default, but this behavior can be overridden by setting `knn_operator` to `"and"`.
- Previously supported pagination settings work as before, with size/limit applied over the top-K hits combined with any exact search hits.
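Indexing a document that has both a text field and a 10-dimensional vector field:

```go
// Build or run with -tags vectors (requires the FAISS shared library).
package main

import (
	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/mapping"
)

func main() {
	doc := struct {
		Id   string    `json:"id"`
		Text string    `json:"text"`
		Vec  []float32 `json:"vec"`
	}{
		Id:   "example",
		Text: "hello from united states",
		Vec:  []float32{0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
	}

	textFieldMapping := mapping.NewTextFieldMapping()
	vectorFieldMapping := mapping.NewVectorFieldMapping()
	vectorFieldMapping.Dims = 10              // must equal len(doc.Vec)
	vectorFieldMapping.Similarity = "l2_norm" // euclidean distance

	bleveMapping := bleve.NewIndexMapping()
	bleveMapping.DefaultMapping.Dynamic = false // index only the mapped fields
	bleveMapping.DefaultMapping.AddFieldMappingsAt("text", textFieldMapping)
	bleveMapping.DefaultMapping.AddFieldMappingsAt("vec", vectorFieldMapping)

	index, err := bleve.New("example.bleve", bleveMapping)
	if err != nil {
		panic(err)
	}
	defer index.Close()

	if err := index.Index(doc.Id, doc); err != nil {
		panic(err)
	}
}
```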
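Running a pure kNN search against that index:

```go
// Continues inside main() after indexing; also import "fmt" and
// "github.com/blevesearch/bleve/v2/search/query".
searchRequest := bleve.NewSearchRequest(query.NewMatchNoneQuery())
searchRequest.AddKNN(
	"vec",                                             // vector field name
	[]float32{10, 11, 12, 13, 14, 15, 16, 17, 18, 19}, // query vector (same dims)
	5,                                                 // k
	1.0,                                               // boost applied to kNN scores
)
searchResult, err := index.Search(searchRequest)
if err != nil {
	panic(err)
}
fmt.Println(searchResult.Hits)
```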
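Depending on the FAISS index in use, the neighbours returned for a query like the one above may be approximate. On small data, a brute-force ranking is a handy reference when checking results. The sketch below (hypothetical names, not bleve code) ranks vectors under each supported metric and skips vectors whose dimensionality differs from the query. It computes cosine as the dot product divided by both norms, which equals the dot product of the normalized vectors. The values it returns are raw distances or similarities, not necessarily the scores bleve reports.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// neighbour is a hypothetical (document ID, raw metric value) pair.
type neighbour struct {
	ID    string
	Score float64
}

func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

func squaredL2(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return s
}

// cosine equals the dot product of the two vectors after L2 normalization.
func cosine(a, b []float32) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

// exactKNN ranks every vector against q and returns the k best: smallest
// squared distance for l2_norm, largest similarity for dot_product/cosine.
func exactKNN(docs map[string][]float32, q []float32, k int, metric string) []neighbour {
	var out []neighbour
	for id, v := range docs {
		if len(v) != len(q) { // non-conforming dimensionality: skipped
			continue
		}
		var s float64
		switch metric {
		case "l2_norm":
			s = squaredL2(v, q)
		case "dot_product":
			s = dot(v, q)
		case "cosine":
			s = cosine(v, q)
		default:
			panic("unsupported metric: " + metric)
		}
		out = append(out, neighbour{id, s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			if metric == "l2_norm" {
				return out[i].Score < out[j].Score
			}
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	docs := map[string][]float32{
		"example": {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
		"other":   {10, 10, 10, 10, 10, 10, 10, 10, 10, 10},
	}
	q := []float32{10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
	fmt.Println(exactKNN(docs, q, 5, "l2_norm"))
}
```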
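Restricting the kNN search to documents that match a filter query:

```go
// Use instead of the previous query; it redeclares the same variables.
searchRequest := bleve.NewSearchRequest(query.NewMatchNoneQuery())
filterQuery := bleve.NewTermQuery("hello")
filterQuery.SetField("text") // only documents whose text contains "hello"
searchRequest.AddKNNWithFilter(
	"vec",                                             // vector field name
	[]float32{10, 11, 12, 13, 14, 15, 16, 17, 18, 19}, // query vector (same dims)
	5,                                                 // k
	1.0,                                               // boost applied to kNN scores
	filterQuery,                                       // filter query
)
searchResult, err := index.Search(searchRequest)
if err != nil {
	panic(err)
}
fmt.Println(searchResult.Hits)
```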
- Using `cmake` is the approach recommended by the FAISS authors.
- More details: faiss/INSTALL.
- Also documented in go-faiss/README.
```shell
git clone https://github.com/blevesearch/faiss.git
cd faiss
cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON .
make -C build
sudo make -C build install
```
Building produces the dynamic library `faiss_c`. Install it somewhere your system's dynamic loader will find it (e.g. `/usr/local/lib`):

```shell
sudo cp build/c_api/libfaiss_c.so /usr/local/lib
```
On macOS x86_64 the steps above should work unchanged, but on aarch64 (Apple Silicon) some of them need adjusting (see facebookresearch/faiss#2111):
```shell
LDFLAGS="-L/opt/homebrew/opt/llvm/lib" \
CPPFLAGS="-I/opt/homebrew/opt/llvm/include" \
CXX=/opt/homebrew/opt/llvm/bin/clang++ \
CC=/opt/homebrew/opt/llvm/bin/clang \
cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF .
make -C build
sudo make -C build install
sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib
```
Once the supporting library is built and available, a sanity run is recommended to make sure all unit tests pass, especially those that exercise the vector code. On macOS, for example:

```shell
export DYLD_LIBRARY_PATH=/usr/local/lib
go test -v -tags=vectors ./...
```