
Under utilization of CPU cores when running Word2Vec #1617

Closed
manneshiva opened this issue Oct 10, 2017 · 2 comments
Labels: bug (Issue described a bug), difficulty hard (Hard issue: required deep gensim understanding & high python/cython skills)

manneshiva (Contributor) commented Oct 10, 2017

Description

I am training a word2vec model on a preprocessed wiki corpus (~8 GB) on a dedicated SoftLayer cloud instance with the following system configuration:
56 cores x 2.0 GHz, 128 GB RAM, 100 GB (SAN), Ubuntu Linux 16.04 LTS Minimal Install (64-bit).
I run the code in a Docker container with 56 workers. While I can see 56 processes during the training phase, the aggregate CPU utilization is only around 1100%. Screenshots of the per-process CPU utilization can be seen below.
Why is total CPU utilization not around 5600%? Is this behavior expected? Am I missing something trivial?

Steps/Code/Corpus to Reproduce

Link to gensim code
Link to Dockerfile
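The linked code isn't reproduced here, so the following is only a minimal sketch of the kind of training call described, assuming a LineSentence-compatible corpus file; the path is hypothetical, and workers=56 matches the machine above.

```python
import logging
from gensim.models.word2vec import Word2Vec, LineSentence

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

# Hypothetical path: preprocessed wiki corpus, one tokenized sentence per line.
sentences = LineSentence('wiki.en.text')

# workers=56 matches the core count of the machine described in the report.
model = Word2Vec(sentences, workers=56)
model.save('wiki.word2vec.model')
```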

Expected Results

Total CPU utilization well above 1100%; with 56 workers it should be around 5600%.

Actual Results

Link to INFO logs.

[Screenshot: top -H -p <PID> output showing per-thread CPU usage]

[Screenshot: htop output (selection_006) showing aggregate CPU utilization]

Versions

Linux-4.10.0-21-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('gensim', '2.1.0')
('FAST_VERSION', 1)

menshikh-iv added the "bug" and "difficulty hard" labels on Oct 10, 2017
gojomo (Collaborator) commented Oct 10, 2017

This is a known limitation of the current implementation; see the related discussion in issues #1486, #1291, #532, and #336. Those issues include tips for improving parallelization, such as optimizing the corpus iteration in the master thread or choosing different training parameters. Even then, the optimal throughput (found via experimentation) will likely come with a workers count in the 3-16 range, rather than the full number of cores available.
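A rough way to find that sweet spot empirically (a sketch only; the corpus path and worker counts are illustrative, and iter is the gensim 2.x keyword for training epochs):

```python
import time
from gensim.models.word2vec import Word2Vec, LineSentence

# Reading the corpus into RAM first takes disk I/O out of the master
# thread, one of the bottlenecks mentioned above (needs enough memory).
sentences = list(LineSentence('wiki.en.text'))

# Time a single pass at several worker counts rather than assuming
# more workers is always faster.
for workers in (3, 6, 12, 24, 56):
    start = time.time()
    Word2Vec(sentences, workers=workers, iter=1)
    print('workers=%2d: %.0f s' % (workers, time.time() - start))
```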

gojomo closed this as completed Oct 10, 2017
gojomo (Collaborator) commented Oct 10, 2017

#336 will be the preferred issue for this limitation from here forward.
