Stop exploring HNSW graph if scores are not getting better. #12770

Merged

benwtrent
Member

While testing lower dimensionality and quantization, I noticed that we were exploring the HNSW graph far too much. I was stuck figuring out why until I noticed that the searcher also accepts equal distances (not just better ones) when exploring neighbors-of-neighbors. This seems like a bad heuristic, but to double-check I looked at what nmslib does. This pointed me back to this commit: nmslib/nmslib#106

Seems like this performance hitch was discovered a while ago :).

This commit adjusts HNSW to only explore the graph layer if the distance is actually better.
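
To illustrate the difference between the two comparisons, here is a minimal sketch. It is not the actual Lucene searcher code: the class `NeighborCheck`, its methods, and the sample scores are hypothetical and exist only for illustration. It assumes higher scores are better and that quantization makes many neighbor scores tie exactly with the current minimum accepted score.

```java
// Hypothetical illustration only; not Lucene's HNSW searcher.
final class NeighborCheck {

  // Old behaviour as described above: a neighbor that merely ties is still explored.
  static boolean acceptInclusive(float score, float minAccepted) {
    return score >= minAccepted;
  }

  // New behaviour as described above: a neighbor is explored only if it is strictly better.
  static boolean acceptStrict(float score, float minAccepted) {
    return score > minAccepted;
  }

  // Counts how many neighbor scores would be queued for further exploration.
  static int countAccepted(float[] scores, float minAccepted, boolean strict) {
    int accepted = 0;
    for (float score : scores) {
      boolean ok = strict ? acceptStrict(score, minAccepted) : acceptInclusive(score, minAccepted);
      if (ok) {
        accepted++;
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    // Made-up scores: three of the five tie with the threshold.
    float[] neighborScores = {0.5f, 0.5f, 0.5f, 0.7f, 0.3f};
    float minAccepted = 0.5f;
    System.out.println("inclusive: " + countAccepted(neighborScores, minAccepted, false));
    System.out.println("strict:    " + countAccepted(neighborScores, minAccepted, true));
  }
}
```

With tied scores, the inclusive check keeps queuing neighbors that cannot improve the result set, while the strict check drops them.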

@benwtrent benwtrent changed the title Only explore neighbor-of-neighbor if similarity is better Stop exploring HNSW graph if scores are not getting better. Nov 6, 2023
@msokolov
Contributor

msokolov commented Nov 6, 2023

oh, good catch! I guess equality is unusual, but this should help in some cases. I wonder if it helps with degenerate case where all scores are equal? EG all zero vectors (see #11626)

@benwtrent
Member Author

> I wonder if it helps with degenerate case where all scores are equal? EG all zero vectors (see #11626)

It would help by ending exploration for sure.

In this degenerate case, it is even worse than slowness. With millions of vectors we could hit an OutOfMemoryError, because nothing prevents the candidate list from growing: we only pop one candidate per step, but could then add, for example, 15 more for each neighbor explored.
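
The growth described above can be sketched with a toy simulation. This is an assumption-laden illustration, not Lucene code: the class `EqualScoreExploration`, the tree-shaped neighbor layout (node `i` links to nodes `15*i+1` through `15*i+15`), and the simplified top-k bookkeeping are all hypothetical. Every node gets the same score, as with all-zero vectors.

```java
import java.util.ArrayDeque;

// Hypothetical toy model of the candidate queue; not Lucene's HNSW searcher.
final class EqualScoreExploration {

  /** Returns {scoredNodes, peakCandidateQueueSize} when every node has the same score. */
  static int[] explore(int numNodes, int degree, int topK, boolean strict) {
    final float score = 1.0f; // degenerate case: all scores are equal
    ArrayDeque<Integer> candidates = new ArrayDeque<>();
    float minAccepted = Float.NEGATIVE_INFINITY; // no threshold until topK results are held

    candidates.add(0); // entry point
    int resultCount = 1;
    if (resultCount >= topK) {
      minAccepted = score;
    }
    int scored = 1;
    int peak = 1;

    while (!candidates.isEmpty()) {
      int node = candidates.poll(); // pop exactly one candidate
      for (int j = 1; j <= degree; j++) {
        long neighbor = (long) node * degree + j; // tree layout, so no node is reached twice
        if (neighbor >= numNodes) {
          break;
        }
        scored++;
        boolean accept = strict ? score > minAccepted : score >= minAccepted;
        if (accept) {
          candidates.add((int) neighbor); // up to `degree` additions per pop
          if (resultCount < topK) {
            resultCount++;
            if (resultCount == topK) {
              minAccepted = score; // result set is full; ties now sit exactly on the threshold
            }
          }
        }
      }
      peak = Math.max(peak, candidates.size());
    }
    return new int[] {scored, peak};
  }

  public static void main(String[] args) {
    int[] inclusive = explore(10_000, 15, 10, false);
    int[] strict = explore(10_000, 15, 10, true);
    System.out.println("inclusive: scored=" + inclusive[0] + " peakQueue=" + inclusive[1]);
    System.out.println("strict:    scored=" + strict[0] + " peakQueue=" + strict[1]);
  }
}
```

In this toy model the inclusive comparison scores every node and lets the queue grow with the size of the graph, while the strict comparison stops adding candidates once the result set is full, so the queue drains quickly.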

Contributor

@mayya-sharipova mayya-sharipova left a comment

@benwtrent Thanks Ben, nice optimization

@benwtrent benwtrent merged commit e1ce1d6 into apache:main Nov 7, 2023
4 checks passed
@benwtrent benwtrent deleted the bugfix/stop-exploring-hnsw-too-much branch November 7, 2023 11:55
benwtrent added a commit that referenced this pull request Nov 7, 2023

3 participants