Support multiple most_similar() queries in one call #2987
Comments
Our similarity classes in But more generally, where speed matters, our integration with approximate NN search (Annoy, NMSLIB, possibly others) is the best ROI.
ANN introduces enough complications, in preparation/deployment/caveats, that I believe its support should be qualified as an "advanced, if needed" option, rather than a thing anyone can/should drop in "for speed". And so, anything that puts off the need for those extra steps and imprecise results for a larger group of users is potentially valuable. (#2883 has a recent example of someone who was overcomplicating things with Annoy prematurely.) I also suspect for many users with in-RAM datasets, batching/amortizing full, precise calculations may offer a speedup that's competitive with approximate indexing, without the extra indexing costs or result imprecision, though tests could prove that wrong.
So: unified conventions in method-names/parameters make sense, but it may make sense for implementations to still diverge, and it may make sense for
Re. There's no functional difference,
SpaCy's most_similar (https://spacy.io/api/vectors#most_similar) accepts multiple queries at a time, and further may then break them into batches. In so doing, the expensive dot call at the heart of the calculation can work on larger chunks of data at a time, and visit each row of a large source array just once for multiple results - potentially a noticeable speedup.

Gensim could consider upgrading most_similar() to offer the same batch efficiency.

(Thought inspired by #2986's hopes-for-certain optimizations.)
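
To make the batching idea concrete, here is a rough NumPy sketch of what a multi-query lookup could look like. It is not Gensim's or spaCy's implementation: the function name batched_most_similar, its parameters, and the batch_size default are all hypothetical, and it assumes every vector and query is non-zero so that unit-normalization is well defined.

```python
import numpy as np


def batched_most_similar(vectors, queries, topn=10, batch_size=1024):
    """Hypothetical helper (not a Gensim or spaCy API).

    Returns (indices, similarities), each of shape (n_queries, topn),
    ranked by cosine similarity in descending order.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    queries = np.atleast_2d(np.asarray(queries, dtype=np.float64))

    # Unit-normalize both sides so a plain dot product is cosine similarity.
    # Assumption: no all-zero rows.
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q_normed = queries / np.linalg.norm(queries, axis=1, keepdims=True)

    topn = min(topn, normed.shape[0])
    n_queries = q_normed.shape[0]
    all_idx = np.empty((n_queries, topn), dtype=np.int64)
    all_sims = np.empty((n_queries, topn), dtype=np.float64)

    for start in range(0, n_queries, batch_size):
        batch = q_normed[start:start + batch_size]
        stop = start + batch.shape[0]

        # One matrix product per batch: each row of `normed` is visited
        # once for the whole batch rather than once per query.
        sims = batch @ normed.T  # shape: (batch, n_vectors)

        # Select the top-n candidates per query without a full sort,
        # then sort only those candidates.
        part = np.argpartition(-sims, topn - 1, axis=1)[:, :topn]
        part_sims = np.take_along_axis(sims, part, axis=1)
        order = np.argsort(-part_sims, axis=1)

        all_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        all_sims[start:stop] = np.take_along_axis(part_sims, order, axis=1)

    return all_idx, all_sims
```

The batch_size cap bounds the size of the intermediate (batch, n_vectors) similarity matrix, which is the main memory cost of answering many queries at once. Whether this actually beats one-query-at-a-time calls for a given dataset would need to be measured.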