
Under utilization of CPU cores when running Word2Vec #1617

Closed
manneshiva opened this issue Oct 10, 2017 · 2 comments
Labels: bug (Issue described a bug), difficulty hard (Hard issue: required deep gensim understanding & high python/cython skills)

manneshiva (Contributor) commented Oct 10, 2017

Description

I am training a word2vec model on a preprocessed wiki corpus (~8 GB) on a dedicated SoftLayer cloud instance with the following system configuration:
56 cores x 2.0 GHz, 128 GB RAM, 100 GB (SAN), Ubuntu Linux 16.04 LTS Minimal Install (64-bit).
I run the code in a Docker container with 56 workers. While I can see 56 processes during the training phase, the aggregate CPU utilization is only around 1100%. Screenshots of the per-process CPU utilization can be seen below.
Why is total CPU utilization not around 5600%? Is this behavior expected? Am I missing something trivial?

Steps/Code/Corpus to Reproduce

Link to gensim code
Link to Dockerfile
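The linked code isn't reproduced here, so the following is only a minimal sketch of the kind of training call described, assuming a LineSentence-compatible corpus file; the path is hypothetical, and workers=56 matches the machine above.

```python
import logging
from gensim.models.word2vec import Word2Vec, LineSentence

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

# Hypothetical path: preprocessed wiki corpus, one tokenized sentence per line.
sentences = LineSentence('wiki.en.text')

# workers=56 matches the core count of the machine described in the report.
model = Word2Vec(sentences, workers=56)
model.save('wiki.word2vec.model')
```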

Expected Results

Total CPU utilization well above 1100%; with 56 workers it should be around 5600%.

Actual Results

Link to INFO logs.

[Screenshot: top -H -p <PID> output showing per-thread CPU usage]

[Screenshot: htop output (selection_006) showing aggregate CPU utilization]

Versions

Linux-4.10.0-21-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('gensim', '2.1.0')
('FAST_VERSION', 1)

menshikh-iv added the "bug" and "difficulty hard" labels on Oct 10, 2017
gojomo (Collaborator) commented Oct 10, 2017

This is a known limitation of the current implementation; see the related discussion in issues #1486, #1291, #532, and #336. Those issues include tips for improving parallelization, such as optimizing the corpus iteration in the master thread or choosing different training parameters. Even then, the optimal throughput (found via experimentation) will likely come with a workers count in the 3-16 range, rather than the full number of cores available.
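A rough way to find that sweet spot empirically (a sketch only; the corpus path and worker counts are illustrative, and iter is the gensim 2.x keyword for training epochs):

```python
import time
from gensim.models.word2vec import Word2Vec, LineSentence

# Reading the corpus into RAM first takes disk I/O out of the master
# thread, one of the bottlenecks mentioned above (needs enough memory).
sentences = list(LineSentence('wiki.en.text'))

# Time a single pass at several worker counts rather than assuming
# more workers is always faster.
for workers in (3, 6, 12, 24, 56):
    start = time.time()
    Word2Vec(sentences, workers=workers, iter=1)
    print('workers=%2d: %.0f s' % (workers, time.time() - start))
```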

gojomo closed this as completed Oct 10, 2017
gojomo (Collaborator) commented Oct 10, 2017

#336 will be the preferred issue for this limitation from here forward.
