multicore LDA #232

Merged: 31 commits into piskvorky:develop on Sep 16, 2014

Conversation

piskvorky (Owner)

This PR parallelizes LDA training using multiprocessing. By default it uses all available cores, to train the LDA model faster.

This functionality is implemented as a new class, gensim.models.ldamulticore.LdaMulticore, which inherits from the existing gensim.models.ldamodel.LdaModel. The original class is not affected.

LdaMulticore supports batch training, online training and most other parameters of the old implementation. It does not support distributed computing, and it does not support hyperparameter auto-optimization with alpha='auto'.
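
For illustration, a minimal, self-contained usage sketch based on the constructor calls that appear later in this thread; the toy documents and all parameter values here are made up:

```python
from gensim.corpora import Dictionary
from gensim.models.ldamulticore import LdaMulticore

# Toy corpus; documents and parameter values are hypothetical, for illustration only.
texts = [["human", "computer", "interaction"],
         ["graph", "minors", "trees"],
         ["graph", "trees", "computer"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Omit `workers` to use all available cores, as described above.
lda = LdaMulticore(corpus=corpus, id2word=dictionary,
                   num_topics=2, chunksize=2000, passes=1, workers=2)
print(lda.show_topics())
```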

ziky90 and others added 30 commits September 10, 2014 14:15
fix bugs in state reset and state init
…og to see when is performed batch version queue merging. This version was tested both in terms of quality and time performance.
py3k compatibility fix in LdaMulticore

@ziky90 (Contributor) commented Sep 16, 2014

Results of time performance experiments on the English Wikipedia, 3.5m documents, 100k vocabulary. Using http://www.hetzner.de/en/hosting/produkte_rootserver/ex40ssd (i7 with 4 real cores, 8 "fake" hyperthread cores).

| configuration | real | user | sys |
| --- | --- | --- | --- |
| iterating over input data only, no LDA training | 20m21.720s | 20m17.126s | 0m1.515s |
| 1 worker | 150m5.235s | 267m30.608s | 33m56.005s |
| 2 workers | 84m35.688s | 224m1.428s | 25m29.380s |
| 3 workers | 66m8.102s | 220m4.559s | 22m53.731s |
| 4 workers | 63m42.413s | 231m39.043s | 22m30.636s |
| 5 workers | 62m21.117s | 247m50.718s | 22m16.507s |
| old LdaModel (for comparison) | 222m52.331s | 205m22.386s | 16m54.866s |
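
For context, the wall-clock ("real") speedup over the old single-process LdaModel, computed from the numbers above:

```python
# Wall-clock ("real") times from the table above, converted to seconds.
old_lda = 222 * 60 + 52.331
runs = {"1 worker": 150 * 60 + 5.235,
        "2 workers": 84 * 60 + 35.688,
        "3 workers": 66 * 60 + 8.102,
        "4 workers": 63 * 60 + 42.413}
for label, seconds in sorted(runs.items()):
    print("{}: {:.1f}x faster than the old LdaModel".format(label, old_lda / seconds))
# 1 worker: 1.5x, 2 workers: 2.6x, 3 workers: 3.4x, 4 workers: 3.5x
```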

piskvorky added a commit that referenced this pull request Sep 16, 2014
@piskvorky piskvorky merged commit 0c2535d into piskvorky:develop Sep 16, 2014

@lerela commented Oct 8, 2014

Getting the following exception with LdaMulticore:

2014-10-08 21:04:24,903 : INFO : accepted corpus with 682440 documents, 200000 features, 77570757 non-zero entries
2014-10-08 21:04:24,958 : INFO : using symmetric alpha at 0.00125
2014-10-08 21:04:24,958 : INFO : using serial LDA version on this node
2014-10-08 21:04:49,624 : INFO : running online LDA training, 800 topics, 20 passes over the supplied corpus of 682440 documents, updating every 150000 documents, evaluating every ~450000 documents, iterating 100x with a convergence threshold of 0.001000
2014-10-08 21:04:49,634 : INFO : training LDA model using 6 processes
2014-10-08 21:05:05,330 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #25000/682440, outstanding queue size 1
Traceback (most recent call last):
  File "/usr/lib/python3.3/multiprocessing/queues.py", line 249, in _feed
    send(obj)
  File "/usr/lib/python3.3/multiprocessing/connection.py", line 207, in send
    self._send_bytes(buf.getbuffer())
  File "/usr/lib/python3.3/multiprocessing/connection.py", line 400, in _send_bytes
    self._send(struct.pack("!i", n))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Yet the processing goes on. Not sure if the results are going to be okay; it's still running, as you can imagine. But any exception is a problem, right? :)

@lerela commented Oct 8, 2014

The main process has been the only one doing any work for 50 minutes now (the children use 0% CPU), stuck here: 2014-10-08 21:06:42,249 : INFO : PROGRESS: pass 0, dispatched chunk #11 = documents up to #300000/682440, outstanding queue size 12

@piskvorky (Owner, Author)

This looks like a limitation of Python's multiprocessing library, which cannot send objects larger than ~2 GB between processes: http://stackoverflow.com/questions/16576386/byte-limit-when-transferring-python-objects-between-processes-using-a-pipe
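
The ~2 GB ceiling comes from the 32-bit length header that multiprocessing's connection layer writes before each pickled payload; a minimal sketch of the failing call shown in the traceback above:

```python
import struct

# multiprocessing frames each pickled payload with a signed 32-bit length ("!i"),
# so any object whose pickle exceeds 2**31 - 1 bytes (~2 GiB) cannot be sent.
struct.pack("!i", 2**31 - 1)       # the largest payload size that still fits
try:
    struct.pack("!i", 2**31)       # one byte too many
except struct.error as err:
    print(err)   # 'i' format requires -2147483648 <= number <= 2147483647
```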

What chunksize are you using? Try lowering it, to reduce the memory footprint.

Failing that, you'll probably have to use either a smaller dictionary or fewer topics (or both)... or patch multiprocessing manually.

I know that's unfortunate, and it's a silly limitation, but there's not much I can do about it :(

Thanks for reporting, though. I'll give it more thought; maybe there's some way around it.
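
To see why lowering chunksize, the dictionary size or the topic count helps: the traceback above shows each job is pickled as (chunk_no, chunk, self), so the model state travels with every chunk. A rough back-of-the-envelope sketch, assuming the topic-word statistics are held as a dense num_topics x num_terms array of 8-byte floats:

```python
num_topics = 800      # from the log above
num_terms = 200000    # 200,000 features in the accepted corpus
bytes_per_float = 8   # assuming float64

one_array = num_topics * num_terms * bytes_per_float
print(one_array / 2.0 ** 30)   # ~1.19 GiB for a single dense array of that shape
# A model holding two arrays of this shape (plus the document chunk and pickling
# overhead) already exceeds the ~2 GiB pipe limit from the traceback above.
```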

@lerela commented Oct 11, 2014

Sorry for the late response. Indeed, this exception vanishes with smaller parameters (smaller dictionary, smaller chunksize). The bottleneck is memory: even 4 workers is too much for my setup (I have 16 GB).
But with 3 workers, no exception and enough RAM, the computation was still stuck for 6 hours until I decided to stop it (the 3 worker processes and the main process were each using 100% CPU, but there was no output for 6 hours):

2014-10-11 17:02:45,551 : INFO : accepted corpus with 682440 documents, 100000 features, 76197320 non-zero entries
2014-10-11 17:02:45,576 : INFO : using symmetric alpha at 0.00125
2014-10-11 17:02:45,576 : INFO : using serial LDA version on this node
2014-10-11 17:02:58,302 : INFO : running online LDA training, 800 topics, 20 passes over the supplied corpus of 682440 documents, updating every 4000 documents, evaluating every ~12000 documents, iterating 100x with a convergence threshold of 0.001000
2014-10-11 17:02:58,310 : INFO : training LDA model using 2 processes
2014-10-11 17:03:00,224 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #2000/682440, outstanding queue size 1
2014-10-11 17:03:03,499 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #4000/682440, outstanding queue size 2
2014-10-11 17:03:06,472 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #6000/682440, outstanding queue size 3
2014-10-11 17:03:09,114 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #8000/682440, outstanding queue size 4
2014-10-11 17:03:10,269 : INFO : PROGRESS: pass 0, dispatched chunk #4 = documents up to #10000/682440, outstanding queue size 5
2014-10-11 17:03:11,465 : INFO : PROGRESS: pass 0, dispatched chunk #5 = documents up to #12000/682440, outstanding queue size 6
^CTraceback (most recent call last):
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 243, in update
    job_queue.put((chunk_no, chunk, self), block=False, timeout=0.1)
  File "/usr/lib/python3.3/multiprocessing/queues.py", line 79, in put
    raise Full
queue.Full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model.py", line 54, in <module>
    prepareGensimLda(args.corpus, args.ntopic, args.l)
  File "model.py", line 24, in prepareGensimLda
    lda = gensim.models.ldamulticore.LdaMulticore(corpus=tfidf_corpus, id2word=id2word, num_topics=ntopic, chunksize=2000, passes=20, workers=2, iterations=100, eval_every=3)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 136, in __init__
    gamma_threshold=gamma_threshold)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamodel.py", line 313, in __init__
    self.update(corpus)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 243, in update
    job_queue.put((chunk_no, chunk, self), block=False, timeout=0.1)
KeyboardInterrupt
^C

I guess that must come from this specific dataset. I had trained the regular LDA model on it 3 months ago and it worked fine though (even if it was slow, of course)... I'll try to run it again to make sure the issue does not come from the multicore implementation. Thank you Radim for your answer.

@lerela commented Oct 16, 2014

Well, I do think there is a problem here. I've launched the multicore LDA on a much, much smaller corpus, and it's been stuck for more than 7 hours on the same perplexity estimate as before (i.e. the first one). When I ^C the job, it's again stuck in a queue.Full loop. That doesn't seem right to me.

2014-10-16 01:43:27,845 : INFO : accepted corpus with 35360 documents, 70313 features, 8148306 non-zero entries
2014-10-16 01:43:27,863 : INFO : using symmetric alpha at 0.00125
2014-10-16 01:43:27,863 : INFO : using serial LDA version on this node
2014-10-16 01:43:36,513 : INFO : running online LDA training, 800 topics, 20 passes over the supplied corpus of 35360 documents, updating every 8000 documents, evaluating every ~24000 documents, iterating 100x with a convergence threshold of 0.001000
2014-10-16 01:43:36,516 : INFO : training LDA model using 4 processes
2014-10-16 01:43:38,234 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #2000/35360, outstanding queue size 1
2014-10-16 01:43:41,200 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #4000/35360, outstanding queue size 2
2014-10-16 01:43:44,558 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #6000/35360, outstanding queue size 3
2014-10-16 01:43:48,692 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #8000/35360, outstanding queue size 4
2014-10-16 01:43:52,333 : INFO : PROGRESS: pass 0, dispatched chunk #4 = documents up to #10000/35360, outstanding queue size 5
2014-10-16 01:43:55,636 : INFO : PROGRESS: pass 0, dispatched chunk #5 = documents up to #12000/35360, outstanding queue size 6
2014-10-16 01:43:57,597 : INFO : PROGRESS: pass 0, dispatched chunk #6 = documents up to #14000/35360, outstanding queue size 7
2014-10-16 01:43:59,589 : INFO : PROGRESS: pass 0, dispatched chunk #7 = documents up to #16000/35360, outstanding queue size 8
2014-10-16 01:44:00,993 : INFO : PROGRESS: pass 0, dispatched chunk #8 = documents up to #18000/35360, outstanding queue size 9
2014-10-16 01:44:02,122 : INFO : PROGRESS: pass 0, dispatched chunk #9 = documents up to #20000/35360, outstanding queue size 10
2014-10-16 01:44:03,280 : INFO : PROGRESS: pass 0, dispatched chunk #10 = documents up to #22000/35360, outstanding queue size 11
2014-10-16 01:44:04,401 : INFO : PROGRESS: pass 0, dispatched chunk #11 = documents up to #24000/35360, outstanding queue size 12

^CTraceback (most recent call last):
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 243, in update
    job_queue.put((chunk_no, chunk, self), block=False, timeout=0.1)
  File "/usr/lib/python3.3/multiprocessing/queues.py", line 79, in put
    raise Full
queue.Full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model.py", line 54, in <module>
    prepareGensimLda(args.corpus, args.ntopic, args.l)
  File "model.py", line 24, in prepareGensimLda
    lda = gensim.models.ldamulticore.LdaMulticore(corpus=tfidf_corpus, id2word=id2word, num_topics=ntopic, chunksize=2000, passes=20, workers=4, iterations=100, eval_every=3)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 136, in __init__
    gamma_threshold=gamma_threshold)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamodel.py", line 313, in __init__
    self.update(corpus)
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 252, in update
    process_result_queue()
  File "/usr/local/lib/python3.3/dist-packages/gensim/models/ldamulticore.py", line 225, in process_result_queue
    while not result_queue.empty():
  File "/usr/lib/python3.3/multiprocessing/queues.py", line 123, in empty
    return not self._poll()
  File "/usr/lib/python3.3/multiprocessing/connection.py", line 254, in poll
    def poll(self, timeout=0.0):
KeyboardInterrupt
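
From the tracebacks, the dispatch loop in ldamulticore.update puts each job with a non-blocking put and, on queue.Full, drains the result queue before retrying. A simplified sketch of that pattern (a hypothetical stand-in, not the actual gensim code), which also shows why the main process can spin silently on queue.Full while the workers grind through heavy jobs:

```python
import queue  # only for the Full exception; the real queues are multiprocessing queues

def dispatch_all(chunks, job_queue, result_queue, merge_result):
    """Feed chunks to workers, merging finished results whenever the job queue is full."""
    for chunk_no, chunk in enumerate(chunks):
        while True:
            try:
                # Non-blocking put, as in the traceback (the real code also ships
                # the model itself with each job: (chunk_no, chunk, self)).
                job_queue.put((chunk_no, chunk), block=False, timeout=0.1)
                break
            except queue.Full:
                # Job queue full: merge any finished results, then retry the put.
                # If every worker is busy (or wedged) on a large job, this loop
                # spins here without logging anything, which looks like a hang.
                while not result_queue.empty():
                    merge_result(result_queue.get())
```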
