Case studies
On this page, we reference example use cases for Faiss, with some explanations. The examples will most often be in the form of Python notebooks, but as usual, translating them to C++ should be straightforward.
This script demonstrates how to add elements to and remove elements from an IVF index in a rolling fashion. The key is to use Hashtable as the DirectMap type and to remove with an IDSelectorArray. The removal cost is then proportional to the number of elements to remove rather than to the number of elements in the index.
This script demonstrates how to speed up a recommendation system. Conceptually, the query vectors are users and the database vectors are items to recommend. The metric used to compare them is maximum inner product, i.e., which item is the most relevant for each user. This use case has a real-time constraint (results should be returned in under 5 ms), and the accuracy should be as high as possible.
This script demonstrates how to run a k-means variant in which each cluster is additionally constrained to contain no more than a maximum number of points.
This script demonstrates an asymmetric search use case: the query vectors are in full precision and the database vectors are compressed as binary vectors. This implementation is slow; it is mainly intended to show how much accuracy can be regained with asymmetric search.
This script demonstrates how to manually train an IVFPQ index enclosed in an OPQ pre-processor. This can be useful, for example, when pre-trained centroids are available for the data distribution.
This is also implemented in the function train_ivf_index_with_2level. It should be easy to extend to other types of composite indexes.
There is a sparse clustering implementation in faiss.contrib.clustering.
This script demonstrates how to cluster vectors that are composed of a dense part of dimension d1 and a sparse part of dimension d2, where d2 >> d1. The centroids are represented as full dense vectors. The implementation relies on the clustering.DatasetAssign object, that abstracts away the representation of the vectors to cluster. The clustering module contains a pure Python implementation of kmeans that can consume this DatasetAssign.
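The DatasetAssign interface itself is not reproduced here. This plain NumPy sketch only illustrates the underlying idea. The assignment step computes ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 directly from a CSR-style sparse part, so the data is never densified, while the centroids stay dense. DenseSparseData and kmeans_dense_sparse are hypothetical names and are not part of faiss.contrib.clustering.

```python
import numpy as np

class DenseSparseData:
    """Hypothetical container: dense part (n, d1) plus CSR sparse part (n, d2)."""
    def __init__(self, dense, indptr, indices, values, d2):
        self.dense, self.indptr = dense, indptr
        self.indices, self.values, self.d2 = indices, values, d2
        # row id of every stored non-zero
        self.rows = np.repeat(np.arange(len(dense)), np.diff(indptr))
        # squared norm of each full vector (dense part + sparse part)
        self.norms = (dense ** 2).sum(1)
        np.add.at(self.norms, self.rows, values ** 2)

    def assign(self, c1, c2):
        """Nearest dense centroid (c1 | c2) for each vector, sparse part kept sparse."""
        dots = self.dense @ c1.T  # (n, k) contribution of the dense part
        np.add.at(dots, self.rows, self.values[:, None] * c2[:, self.indices].T)
        dis = self.norms[:, None] - 2 * dots + (c1 ** 2).sum(1) + (c2 ** 2).sum(1)
        return dis.argmin(1)

def kmeans_dense_sparse(data, k, niter=10, seed=0):
    n = len(data.dense)
    rng = np.random.default_rng(seed)
    init = rng.choice(n, k, replace=False)
    c1 = data.dense[init].copy()
    c2 = np.zeros((k, data.d2))  # sparse part of the centroids, stored densely
    for i, r in enumerate(init):  # densify only the k seed rows
        lo, hi = data.indptr[r], data.indptr[r + 1]
        c2[i, data.indices[lo:hi]] = data.values[lo:hi]
    for _ in range(niter):
        assign = data.assign(c1, c2)
        counts = np.bincount(assign, minlength=k).astype(float)
        s1 = np.zeros_like(c1)
        np.add.at(s1, assign, data.dense)
        s2 = np.zeros_like(c2)
        np.add.at(s2, (assign[data.rows], data.indices), data.values)
        keep = counts > 0  # empty clusters keep their previous centroid
        c1[keep] = s1[keep] / counts[keep, None]
        c2[keep] = s2[keep] / counts[keep, None]
    return c1, c2, assign

# usage on random data: 10 non-zeros per row in the sparse part
rng = np.random.default_rng(4)
n, d1, d2, nnz_per_row = 500, 8, 1000, 10
dense = rng.standard_normal((n, d1))
indptr = np.arange(0, n * nnz_per_row + 1, nnz_per_row)
indices = np.concatenate([rng.choice(d2, nnz_per_row, replace=False) for _ in range(n)])
values = rng.standard_normal(n * nnz_per_row)
data = DenseSparseData(dense, indptr, indices, values, d2)
c1, c2, assign = kmeans_dense_sparse(data, k=5)
```

Keeping the assignment logic inside the data object mirrors the motivation for DatasetAssign. The k-means loop only asks for nearest-centroid assignments and per-cluster sums, so it never needs to know how the vectors are stored.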