[WIP] Phrases optimizations #837
Note to self: use the faster detection loop in Python. @lev good student project: implement the inner loops in C (via Cython) as an optional extension, à la the word2vec extension.
Created #918 asking for volunteers to create tests.
Merged in #954.
@tmylk It's necessary to use a Also, a So
@tmylk a fast C/Cython implementation of The entire "training" is essentially incrementing a counter, no reason it shouldn't be as fast as your input iterator provides. Plus, phrases (collocations) are not going anywhere, so it's a stable module to invest more time into.
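To illustrate the "incrementing a counter" point, here is a minimal pure-Python sketch of such a counting pass. It is not the module's actual code: the function name `count_vocab`, the `delimiter` parameter, and the sample sentences are hypothetical, chosen only to show that the per-token work is a couple of counter increments.

```python
from collections import Counter


def count_vocab(sentences, delimiter="_"):
    """Count unigrams and adjacent-word bigrams in a single pass (hypothetical helper)."""
    vocab = Counter()
    for sentence in sentences:
        previous = None
        for word in sentence:
            vocab[word] += 1  # unigram count
            if previous is not None:
                vocab[previous + delimiter + word] += 1  # bigram count
            previous = word
    return vocab


# Toy input, for illustration only.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
]
vocab = count_vocab(sentences)
```

Since the loop body only touches a dictionary, a compiled inner loop of this shape would plausibly be limited by how fast the input iterator yields sentences.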
add_vocab() call
Phraser helper class & supporting options on existing methods
Phraser takes a Phrases and does a single (time-consuming) pass to discover all the phrases that it would want to create, saving those into a much, much smaller (and somewhat faster) helper object. This can be saved & used separately.
Needs more testing.
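The idea of doing one pass and keeping only the qualifying phrases can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PR's implementation: `freeze_phrases`, `apply_phrases`, the `min_count` and `threshold` defaults, the hard-coded counts, and the scoring formula (a word2vec-style collocation score) are all hypothetical choices for the example.

```python
def freeze_phrases(sentences, vocab, min_count=1, threshold=1.0, delimiter="_"):
    """One pass over the corpus; keep only bigrams whose score exceeds threshold."""
    vocab_size = len(vocab)
    kept = {}
    for sentence in sentences:
        for a, b in zip(sentence, sentence[1:]):
            if (a, b) in kept:
                continue
            pair_count = vocab.get(a + delimiter + b, 0)
            if pair_count < min_count:
                continue
            # Assumed scoring formula, for illustration only.
            score = (pair_count - min_count) / vocab[a] / vocab[b] * vocab_size
            if score > threshold:
                kept[(a, b)] = score
    return kept


def apply_phrases(sentence, kept, delimiter="_"):
    """Join adjacent words that form a kept phrase; needs only the small dict."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in kept:
            out.append(sentence[i] + delimiter + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


# Toy corpus and its precomputed unigram/bigram counts (illustrative data).
sentences = [["new", "york", "is", "big"], ["i", "love", "new", "york"]]
vocab = {
    "new": 2, "york": 2, "is": 1, "big": 1, "i": 1, "love": 1,
    "new_york": 2, "york_is": 1, "is_big": 1, "i_love": 1, "love_new": 1,
}
kept = freeze_phrases(sentences, vocab)
joined = apply_phrases(["i", "love", "new", "york"], kept)
```

The frozen `kept` dictionary holds only the phrases that passed the threshold, so it is far smaller than the full count table and can be pickled and reused without the original counts.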