Under utilization of CPU cores when running Word2Vec #1617
Labels
bug: Issue described a bug
difficulty hard: Hard issue, required deep gensim understanding & high python/cython skills
Description
I am training a word2vec model on a preprocessed wiki corpus (~8GB) on a dedicated Softlayer cloud instance with the following system configuration:
56 cores x 2.0GHz, 128GB RAM, 100GB (SAN), Ubuntu Linux 16.04 LTS Minimal Install (64 bit).
I run the code in a Docker container with 56 workers. While I can see 56 processes (in the training phase), the aggregate CPU utilization is only around 1100%. Screenshots of the CPU utilization of each process can be seen below.
Why is total CPU utilization not around 5600%? Is this behavior expected? Am I missing something trivial?
Steps/Code/Corpus to Reproduce
Link to gensim code
Link to Dockerfile
Expected Results
Total CPU utilization well above 1100%. With 56 workers on 56 cores it should be around 5600%.
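For reference, the arithmetic behind the 5600% figure can be sketched as follows. This is only a back-of-the-envelope check, assuming `top`-style accounting in which one fully busy core counts as 100%; the helper names are hypothetical and not part of gensim.

```python
def expected_cpu_percent(cores):
    """Upper bound on aggregate CPU usage if every core is fully busy."""
    return cores * 100.0


def busy_core_equivalent(observed_percent):
    """Number of fully busy cores that the observed aggregate usage corresponds to."""
    return observed_percent / 100.0


def utilization_fraction(observed_percent, cores):
    """Share of the theoretical maximum that is actually being used."""
    return observed_percent / expected_cpu_percent(cores)


cores = 56                # from the system configuration above
observed_percent = 1100.0  # aggregate usage reported in this issue

print("expected maximum: %.0f%%" % expected_cpu_percent(cores))
print("busy-core equivalent: %.1f" % busy_core_equivalent(observed_percent))
print("fraction of maximum: %.1f%%" % (100 * utilization_fraction(observed_percent, cores)))
```

In other words, the observed load is roughly what 11 fully busy cores would produce, about a fifth of what 56 workers could use in principle.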
Actual Results
Link to INFO logs.
top -H -p <PID>
htop
Versions
Linux-4.10.0-21-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('gensim', '2.1.0')
('FAST_VERSION', 1)
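A version report like the one above can be collected with a short script. The following is a minimal sketch rather than the exact snippet used here: `collect_versions` is a hypothetical helper, and it assumes that `FAST_VERSION` is exposed by `gensim.models.word2vec`, falling back gracefully if a package cannot be imported.

```python
import importlib
import platform
import sys


def collect_versions(packages=("numpy", "scipy", "gensim")):
    """Return a list of (name, version) pairs for the environment."""
    report = [("platform", platform.platform()), ("Python", sys.version)]
    for name in packages:
        try:
            module = importlib.import_module(name)
            report.append((name, getattr(module, "__version__", "unknown")))
        except ImportError:
            report.append((name, "not installed"))
    try:
        # Assumption: gensim exposes FAST_VERSION in gensim.models.word2vec.
        from gensim.models import word2vec
        report.append(("FAST_VERSION", getattr(word2vec, "FAST_VERSION", "unknown")))
    except Exception:
        # Covers a missing gensim as well as a broken compiled extension.
        report.append(("FAST_VERSION", "unavailable"))
    return report


for name, value in collect_versions():
    print(name, value)
```

A `FAST_VERSION` of 0 or 1 indicates that the compiled (Cython) training routines are in use; -1 would mean the slow pure-Python fallback.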