LdaMulticore livelock when documents converge? #244
Comments
Hmm, I've received a very similar report on the mailing list here. I think it may have something to do with the fact that [...] Can you share your corpus + dictionary, Dan, for debugging? And thanks for reporting.
Unfortunately I can't share the corpus/dictionary since it's proprietary data, but are there aggregate statistics that might be helpful? I'd like to help as much as I can, but my hands are a little tied wrt sharing data.
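One way to help without sharing the data itself is to report summary statistics of the bag-of-words corpus. A minimal sketch, assuming the corpus is an iterable of documents in gensim's bag-of-words form (lists of `(token_id, count)` pairs); the function name `corpus_stats` and the particular statistics are illustrative choices, not something requested in this thread:

```python
from statistics import mean, median


def corpus_stats(corpus):
    """Summarize a bag-of-words corpus without exposing its contents.

    `corpus` is assumed to be an iterable of documents, each a list of
    (token_id, count) pairs. The name and the chosen statistics are
    hypothetical, picked for illustration only.
    """
    doc_lengths = []     # total token count per document
    unique_per_doc = []  # number of distinct token ids per document
    vocab = set()        # token ids that actually occur in the corpus
    for doc in corpus:
        doc_lengths.append(sum(count for _, count in doc))
        unique_per_doc.append(len(doc))
        vocab.update(token_id for token_id, _ in doc)

    n_docs = len(doc_lengths)
    return {
        "num_docs": n_docs,
        "num_empty_docs": sum(1 for n in doc_lengths if n == 0),
        "vocab_size_seen": len(vocab),
        "mean_doc_length": mean(doc_lengths) if n_docs else 0,
        "median_doc_length": median(doc_lengths) if n_docs else 0,
        "max_doc_length": max(doc_lengths, default=0),
        "mean_unique_tokens": mean(unique_per_doc) if n_docs else 0,
    }
```

Empty or extremely long documents are the kind of thing such a summary might surface; whether either is related to the hang is not established here.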
FYI, I'm on it, just busy days :)
Hey - @danwiesenthal and I were working with some data which we can share and have run into this issue again. Would it be helpful to get this data to you to work with?
Sure, thanks! Let me assign @ziky90 to this, who will assist you.
Hi @alif
Email is probably best. What email address should I send the data to?
You can send the data to my email: ziky90@gmail.com, thanks.
@ziky90 Is this resolved?
I was not able to replicate the bug, so I guess we can close this and see if someone else reopens it?
Hi, but it takes too long and doesn't output anything.
Hi,
I'm seeing unreliable behavior in LdaMulticore when I tweak parameters like the number of iterations or passes. Sometimes the LDA run goes fine and all cores seem to be reasonably well utilized; other times, notably when iterations/passes are higher, it hangs without output for a very long time (2+ days, when a usual run takes 1.5 hours), with one core constantly at or near 100%. One trend I've noticed that may be indicative is that the system always gets stuck with a worker waiting for a new job. Another is that when the system gets stuck, the debug logs usually say something about having converged within X iterations. Is it possible a livelock is occurring? See the output below, which shows a 'getting stuck' instance including both trends mentioned above.
Cheers,
Dan
call:
lda = gensim.models.ldamulticore.LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=reduced_dimensionality, passes=100, batch=True, iterations=100, workers=8)
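For anyone trying to reproduce this, the debug lines in the output come from Python's standard `logging` module running at DEBUG level. A sketch of one configuration that should produce lines of a similar shape; the format and date strings are guesses reconstructed from the output below, not the reporter's actual settings, and `enable_debug_logging` is a hypothetical helper name:

```python
import logging

# Assumed format strings, inferred from the shape of the log lines in this report.
LOG_FORMAT = "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s"
DATE_FORMAT = "%Y%m%d-%H:%M%p"


def enable_debug_logging():
    """Send DEBUG-level messages from gensim's loggers to the console."""
    logging.basicConfig(format=LOG_FORMAT, datefmt=DATE_FORMAT, level=logging.DEBUG)
    # gensim names its loggers after its modules (e.g. "gensim.models.ldamulticore"),
    # so setting the level on the "gensim" parent logger covers all of them.
    logging.getLogger("gensim").setLevel(logging.DEBUG)
```

Call `enable_debug_logging()` once, before constructing the model, so the worker messages are captured from the start.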
output:
[20141013-14:33PM] [gensim.models.ldamodel] [INFO] using symmetric alpha at 0.02
[20141013-14:33PM] [gensim.models.ldamodel] [INFO] using serial LDA version on this node
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] running batch LDA training, 50 topics, 100 passes over the supplied corpus of 15234 documents, updating every 15234 documents, evaluating every ~15234 documents, iterating 100x with a convergence threshold of 0.001000
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] training LDA model using 8 processes
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #0 = documents up to #2000/15234, outstanding queue size 1
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #0 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #1 = documents up to #4000/15234, outstanding queue size 2
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #1 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #2 = documents up to #6000/15234, outstanding queue size 3
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #2 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #3 = documents up to #8000/15234, outstanding queue size 4
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #3 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #4 = documents up to #10000/15234, outstanding queue size 5
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #5 = documents up to #12000/15234, outstanding queue size 6
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #6 = documents up to #14000/15234, outstanding queue size 7
[20141013-14:33PM] [gensim.models.ldamulticore] [INFO] PROGRESS: pass 0, dispatched chunk #7 = documents up to #15234/15234, outstanding queue size 8
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #4 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #5 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #6 of 2000 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 2000 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processing chunk #7 of 1234 documents
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] performing inference on a chunk of 1234 documents
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] worker process entering E-step loop
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
[20141013-14:33PM] [gensim.models.ldamodel] [DEBUG] 299/1234 documents converged within 100 iterations
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] processed chunk, queuing the result
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] result put
[20141013-14:33PM] [gensim.models.ldamulticore] [DEBUG] getting a new job
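On the "converged within 100 iterations" message: as far as I understand it, the count reports how many documents in the chunk saw their per-document update fall below the convergence threshold (0.001 in the log above) before hitting the iteration cap. The toy sketch below illustrates that kind of counting with a scalar fixed-point update. It is not gensim's inference code, and `count_converged` and its parameters are made up for illustration:

```python
def count_converged(start_values, update, max_iterations=100, threshold=0.001):
    """Count items whose iterative update settles before the iteration cap.

    Toy stand-in for a per-document convergence check: each item is updated
    repeatedly, and it counts as converged if the change between successive
    values drops below `threshold` within `max_iterations` steps.
    """
    converged = 0
    for value in start_values:
        for _ in range(max_iterations):
            new_value = update(value)
            change = abs(new_value - value)
            value = new_value
            if change < threshold:
                converged += 1
                break  # stop early; this item is done
    return converged
```

Under this reading, a line such as "299/1234 documents converged" would mean that most documents in the chunk ran the full 100 iterations. That makes the chunk slower, but it does not by itself indicate a hang.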