speeding up similarity queries #5
Comments
I compared three functions. All take an array of input values
len(s)=1,000 | 1) 3.46 ms | 2) 3.41 ms | 3) 99.1 µs
About 5-6 times slower. But note that I have very short documents (avg 30 tokens or less, haven't counted), so the crazy difference I see is probably not for everyone. I guess if your documents are hundreds of tokens, the similarity calculations will outweigh the sorting. Either way, this sorting optimisation works very well for me. (http://groups.google.com/group/gensim/msg/ae8e4d58d0ead1fb)
Currently, if one is only interested in the top-n most similar documents, the `MatrixSimilarity` and `SparseMatrixSimilarity` classes compute all similarities, then sort them, then clip to top-n. Actually, the sorting is slower than the matrix multiplication:
http://groups.google.com/group/gensim/browse_thread/thread/f6b839ceaa16c834
So for a start, speed up the post-processing (sorting) part.
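One possible way to cut the sorting cost, sketched here with NumPy only: select the top-n candidates with a partial partition, then sort just those n entries instead of the whole similarity array. The function names `top_n_full_sort` and `top_n_partition` are hypothetical and not part of gensim; this is an illustration of the idea under those assumptions, not the fix that was actually adopted.

```python
import numpy as np


def top_n_full_sort(sims, n):
    """Baseline: sort every similarity, then clip to the n best."""
    order = np.argsort(sims)[::-1][:n]
    return [(int(i), float(sims[i])) for i in order]


def top_n_partition(sims, n):
    """Pick the n best in linear time, then sort only those n entries."""
    n = min(n, len(sims))
    if n <= 0:
        return []
    # After argpartition, the last n slots hold the indices of the
    # n largest values, in no particular order.
    candidates = np.argpartition(sims, len(sims) - n)[-n:]
    # Sort just the n candidates by descending similarity.
    order = candidates[np.argsort(sims[candidates])[::-1]]
    return [(int(i), float(sims[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sims = rng.random(100_000)  # stand-in for one query's similarities
    print(top_n_partition(sims, 5))
```

With n much smaller than the number of documents, the partition step is O(N) and the final sort is O(n log n), compared with O(N log N) for sorting everything. Whether this helps in practice depends on document length, as noted in the comment above.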