perf(lance): add prefilter, nprobes, refine_factor to get better result (#22066)
hongbo-miao authored Dec 28, 2024
1 parent 7a3648d commit cb2987a
Showing 1 changed file with 13 additions and 2 deletions.
15 changes: 13 additions & 2 deletions data-storage/lance/src/main.py
@@ -62,14 +62,25 @@ def main() -> None:
     )

     # Find 5 nearest neighbors
-    # Note: For better accuracy, you can use nprobes (5-10% of dataset) and refine_factor
     k = 5
+
+    # nprobes:
+    # The number of probes determines the distribution of vector space.
+    # While a higher number enhances search accuracy, it also results in slower performance.
+    # Typically, setting nprobes to cover 5–10% of the dataset proves effective in achieving high recall with minimal latency.
+    #
+    # refine_factor:
+    # Refine the results by reading extra elements and re-ranking them in memory.
+    # A higher number makes the search more accurate but also slower.
     results = dataset.to_table(
+        prefilter=True,
         nearest={
             "column": "vector",
             "k": k,
             "q": query_vector,
-        }
+            "nprobes": 500,
+            "refine_factor": 10,
+        },
     ).to_pandas()

     logging.info("Nearest neighbors (distances show similarity, lower = more similar):")
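
The "5–10% of the dataset" guidance in the comments can be turned into a concrete `nprobes` value once the number of index partitions is known. The following is a minimal sketch, not part of the commit: it assumes an IVF-style index whose partitions are roughly balanced, so that probing a given share of the partitions scans about the same share of the vectors. The helper names `nprobes_for_coverage`, `refine_candidates`, and `build_nearest`, and the partition count of 5000, are hypothetical and chosen only for illustration.

```python
def nprobes_for_coverage(num_partitions: int, coverage_percent: int) -> int:
    """Return how many partitions to probe to scan about coverage_percent of them.

    Assumes roughly balanced partitions (an assumption, not stated in the commit).
    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if not 1 <= coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 1 and 100")
    probes = (num_partitions * coverage_percent + 99) // 100
    # Never probe fewer than one partition or more than exist.
    return max(1, min(num_partitions, probes))


def refine_candidates(k: int, refine_factor: int) -> int:
    """Number of candidates read before in-memory re-ranking (k * refine_factor)."""
    if k <= 0 or refine_factor <= 0:
        raise ValueError("k and refine_factor must be positive")
    return k * refine_factor


def build_nearest(query_vector, k: int, num_partitions: int,
                  coverage_percent: int = 10, refine_factor: int = 10) -> dict:
    """Build a dict shaped like the `nearest` argument used in the diff above."""
    return {
        "column": "vector",
        "k": k,
        "q": query_vector,
        "nprobes": nprobes_for_coverage(num_partitions, coverage_percent),
        "refine_factor": refine_factor,
    }


if __name__ == "__main__":
    # Hypothetical index with 5000 partitions; 10% coverage gives nprobes = 500.
    params = build_nearest([0.1, 0.2, 0.3], k=5, num_partitions=5000)
    print(params["nprobes"], refine_candidates(5, params["refine_factor"]))
```

Under these assumptions, `nprobes=500` corresponds to 10% coverage of a 5000-partition index, and `k=5` with `refine_factor=10` means 50 candidates are re-ranked before the top 5 are returned. The actual partition count of the dataset in this commit is not shown, so the right value depends on how the index was built.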