MB-60943 - Reduce number of centroids for IVF indexes. #234

Merged 1 commit on Apr 30, 2024
9 changes: 3 additions & 6 deletions section_faiss_vector_index.go
@@ -348,8 +348,8 @@ func (v *vectorIndexOpaque) mergeAndWriteVectorIndexes(sbs []*SegmentBase,
 
 	nvecs := len(finalVecIDs)
 
-	// index type to be created after merge based on the number of vectors in
-	// indexData added into the index.
+	// index type to be created after merge based on the number of vectors
+	// in indexData added into the index.
 	nlist := determineCentroids(nvecs)
 	indexDescription, indexClass := determineIndexToUse(nvecs, nlist)

@@ -437,10 +437,7 @@ func determineCentroids(nvecs int) int {
 	var nlist int
 
 	switch {
-	// At 1M vectors, nlist = 4k gave a reasonably high recall with the right nprobe,
-	// whereas 1M/100 = 10000 centroids would increase training time without
-	// corresponding increase in recall
-	case nvecs >= 1000000:
+	case nvecs >= 200000:
 		nlist = int(4 * math.Sqrt(float64(nvecs)))
 	case nvecs >= 1000:
 		// 100 points per cluster is a reasonable default, considering the default
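
To make the effect of lowering the threshold concrete, here is a minimal Go sketch that compares centroid counts before and after the change. It is not the code from this PR: `centroidsAt` and the two threshold constants are hypothetical names, and the `nvecs / 100` fallback is an assumption read off the "100 points per cluster" and "1M/100 = 10000" comments in the diff. The sketch is only meaningful for `nvecs >= 1000`, since the branches below that are not visible in the hunk.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical names for the two cutoffs seen in the diff: the old and new
// lower bounds at which the 4*sqrt(nvecs) rule takes over.
const (
	oldSqrtThreshold = 1000000
	newSqrtThreshold = 200000
)

// centroidsAt returns a centroid count (nlist) for nvecs vectors, given the
// cutoff at or above which the 4*sqrt(nvecs) rule applies.
// ASSUMPTION: below the cutoff it uses nvecs/100 (100 points per cluster).
// Only meaningful for nvecs >= 1000; smaller inputs are not modelled here.
func centroidsAt(nvecs, sqrtThreshold int) int {
	if nvecs >= sqrtThreshold {
		return int(4 * math.Sqrt(float64(nvecs)))
	}
	return nvecs / 100
}

func main() {
	for _, n := range []int{100000, 200000, 500000, 1000000} {
		before := centroidsAt(n, oldSqrtThreshold)
		after := centroidsAt(n, newSqrtThreshold)
		fmt.Printf("nvecs=%d before=%d after=%d\n", n, before, after)
	}
}
```

Under these assumptions, the count at 200,000 vectors drops from 2000 to 1788 and at 500,000 vectors from 5000 to 2828, while counts at 1,000,000 vectors (4000) and below 200,000 vectors are unchanged.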