multicore LDA #232
Conversation
… + several bug fixes
fix bugs in state reset and state init
…og to see when batch version queue merging is performed. This version was tested both in terms of quality and time performance.
This reverts commit 0aa9b79.
Ziky90 develop
py3k compatibility fix in LdaMulticore
…on and fixed eval_every=0 case
Results of time performance experiments on the English Wikipedia (3.5m documents, 100k vocabulary), using http://www.hetzner.de/en/hosting/produkte_rootserver/ex40ssd (i7 with 4 real cores, 8 "fake" hyperthread cores). The configurations compared were: just iterating over the input data (no LDA training), 1, 2, 3, 4 and 5 workers, and the old LdaModel for comparison.
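(For reference, a sketch of how such a timing comparison can be run; the file names and topic count below are illustrative assumptions, not the exact script behind the numbers above:)

```python
import time
from gensim.corpora import Dictionary, MmCorpus
from gensim.models.ldamodel import LdaModel

# Illustrative file names for a preprocessed Wikipedia dump.
dictionary = Dictionary.load_from_text('wiki_en_wordids.txt')
corpus = MmCorpus('wiki_en_tfidf.mm')

# Baseline: just iterate over the input data, no LDA training.
start = time.time()
for _doc in corpus:
    pass
print('iteration only: %.1f s' % (time.time() - start))

# Old single-process LdaModel, for comparison.
start = time.time()
LdaModel(corpus, id2word=dictionary, num_topics=100)
print('LdaModel training: %.1f s' % (time.time() - start))
```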
Getting the following exception with LdaMulticore:
Yet the processing goes on. Not sure if the results are gonna be okay, it's still running as you can imagine. But any exception is a problem, right? :)
For 50 minutes the main process has been the only one working (the children use 0% CPU), stuck here: 2014-10-08 21:06:42,249 : INFO : PROGRESS: pass 0, dispatched chunk #11 = documents up to #300000/682440, outstanding queue size 12
Seems like a limitation of Python's … Failing that, you'll probably have to use either a smaller dictionary, or fewer topics (or both)... or monkey around patching … I know that's unfortunate, and it's a silly limitation, but not much I can help with :( Thanks for reporting though, I'll give it more thought, maybe there's some way.
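(A minimal sketch of that workaround, assuming a gensim `Dictionary` saved to disk; the file names and thresholds below are illustrative, not part of this PR:)

```python
from gensim.corpora import Dictionary

# Hypothetical file name -- use whatever dictionary you built for your corpus.
dictionary = Dictionary.load('my_corpus.dict')

# Keep tokens appearing in at least 20 documents and in at most 50% of them,
# and cap the vocabulary at 50k entries. A smaller vocabulary (and/or fewer
# topics) shrinks the model state that has to be pickled and pushed through
# the multiprocessing queues.
dictionary.filter_extremes(no_below=20, no_above=0.5, keep_n=50000)
dictionary.save('my_corpus_trimmed.dict')
```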
Sorry for the late response. Indeed, this exception vanishes with smaller parameters (smaller dictionary, smaller chunksize). The bottleneck is memory (even 4 workers is too much for my setup, I have 16GB).
I guess that must come from this specific dataset. I had trained the regular LDA model on it 3 months ago and it worked fine though (even if it was slow, of course)... I'll try to run it again to make sure the issue does not come from the multicore implementation. Thank you Radim for your answer.
Well, I do think there is a problem here. I've launched the multicore LDA on a much, much smaller corpus, and it's been stuck for more than 7 hours on the same perplexity estimate as previously (i.e. the first one). When I ^C the job, it's again stuck in a queue.Full loop. That doesn't seem right to me.
This PR parallelizes LDA training, using multiprocessing. By default it will use all available cores, to train the LDA model faster.
This functionality is implemented as a new class `gensim.models.ldamodel.LdaModelMulticore`, which inherits from the existing `gensim.models.ldamodel.LdaModel`. The original class is not affected. `LdaModelMulticore` supports batch training, online training and most other parameters the old implementation did. It doesn't support distributed computing and it doesn't support hyperparameter auto-optimization with `alpha='auto'`.
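A minimal usage sketch, following the module path and class name as described in this PR (the corpus, dictionary, topic count and `workers` argument are illustrative assumptions; the final merged name and location may differ):

```python
from gensim.corpora import Dictionary, MmCorpus
from gensim.models.ldamodel import LdaModel, LdaModelMulticore

# Illustrative inputs -- substitute your own serialized corpus and dictionary.
dictionary = Dictionary.load('my_corpus.dict')
corpus = MmCorpus('my_corpus.mm')

# Existing single-process trainer, unchanged by this PR.
lda_single = LdaModel(corpus, id2word=dictionary, num_topics=100)

# New multiprocessing trainer; uses all available cores by default.
# Assumed `workers` parameter to pin the number of worker processes.
lda_multi = LdaModelMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)
```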