Newest version not working; OpenBLAS Warining : The number of CPU/Cores(144) is beyond the limit(64). Terminated. #356

Closed
jguhlin opened this issue Mar 9, 2020 · 7 comments

Comments

@jguhlin

jguhlin commented Mar 9, 2020

josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [SIGINT]> ./orthofinder -h                                      (base) 

OpenBLAS Warining : The number of CPU/Cores(144) is beyond the limit(64). Terminated.
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [1]> ./orthofinder -t 64 -h                                     (base) 

OpenBLAS Warining : The number of CPU/Cores(144) is beyond the limit(64). Terminated.
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [1]> env OPENBLAS_NUM_THREADS=^C
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [1]> set -x OPENBLAS_NUM_THREADS 64                             (base) 
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [1]> ./orthofinder -t 64 -h                                     (base) 

OpenBLAS Warining : The number of CPU/Cores(144) is beyond the limit(64). Terminated.
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder [1]> echo $OPENBLAS_NUM_THREADS                                 (base) 
64
josephguhlin@biochemcompute /V/a/d/g/s/OrthoFinder> env OPENBLAS_NUM_THREADS=64 ./orthofinder -h                   (base) 

OpenBLAS Warining : The number of CPU/Cores(144) is beyond the limit(64). Terminated.
@jguhlin
Author

jguhlin commented Mar 9, 2020

I've been able to run it using the git checkout (I've been using it for a while too, so I have most of the tools installed manually), so this isn't urgent over here, but it would be good to document how to solve it somewhere.

@davidemms
Owner

Hi

I think this isn't an OrthoFinder issue as such. As I understand it, your OpenBLAS library is parallelising and this is interacting with a job manager you have running? If this combines with OrthoFinder's parallelisation then you exceed a limit on the number of cores and your job manager throws an error? There was some discussion of it here: #323 (which I think you may have seen already). Setting OPENBLAS_NUM_THREADS to 1 appears to be the solution to this issue. I don't have a system configured in this way to test this myself, but I will put this info in a FAQ, and any extra information you have would also be useful. Do you have a sysadmin for this computer? They would be a good person to approach about this and I'd be interested to know what they have to say. And could you also confirm if
export OPENBLAS_NUM_THREADS=1
works for you?
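
For anyone who wants to try this without changing their shell configuration, the following is a minimal Python sketch that sets the variable for a single child process only. The helper name `run_with_single_blas_thread` is made up for illustration, and the child command used here is a stand-in that simply echoes the variable it received.

```python
import os
import subprocess
import sys


def run_with_single_blas_thread(command):
    """Run `command` with OPENBLAS_NUM_THREADS=1 set only for that child process."""
    env = dict(os.environ)             # copy, so the parent environment is untouched
    env["OPENBLAS_NUM_THREADS"] = "1"  # the value suggested in this thread
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process (hypothetical): prints the variable it was given.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"]
    result = run_with_single_blas_thread(child)
    print(result.stdout.strip())
```

To try it on the binary from the report, pass `["./orthofinder", "-h"]` instead of the stand-in command.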

All the best
David

@jguhlin
Author

jguhlin commented Mar 10, 2020

Ah yes, setting OPENBLAS_NUM_THREADS=1 worked. But so did running the GitHub tagged release version directly via Python. So I'm wondering if it's something to do with how you are compiling the Python version into a standalone executable?

No job manager, just running from the command line. It's definitely a weird/unique situation.

@davidemms
Owner

That's interesting about the compiled version. Only if you have time: could you confirm that the difference in behaviour between the compiled and Python versions is observed consistently over a number of repetitions? My first guess is that it is actually a race condition from the parallelisation (two parallel tasks happen to multiply matrices at the same point in time).
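
A repetition check of this kind could be scripted roughly as follows. This is only a sketch: the function name `count_failures` is hypothetical, and it assumes a failed run exits with a non-zero status, which matches the `[1]` shown in the shell prompt after each failure in the original report.

```python
import subprocess
import sys


def count_failures(command, repetitions=10):
    """Run `command` repeatedly; return how many runs exited with a non-zero status."""
    failures = 0
    for _ in range(repetitions):
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command (hypothetical) that always fails, to show the counting.
    always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(count_failures(always_fails, repetitions=3))
```

Running it once with the compiled binary and once with the Python entry point (for example `["./orthofinder", "-h"]` versus `[sys.executable, "orthofinder.py", "-h"]`) would show whether the two differ consistently or only some of the time.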

I see now what you mean about no job manager; it looks like it's actually OpenBLAS that is terminating it. I will try to have a look at their issues page and documentation to see if anything needs to be reported to them or changed in how OrthoFinder behaves. Currently it only uses numpy operations and I assume numpy is using OpenBLAS under the hood. In fact, it could be that the compiled version is using an old version of numpy that is missing a more recent fix. An old version is currently used to support users on old machines, but that might need to change.

Thanks for your help
David

@jguhlin
Author

jguhlin commented Mar 25, 2020

I don't think it's a race condition since I can get the error with the help command (./orthofinder -h). But I do seem to get the error each time I try to run it (10 times so far).

How are you compiling the python and the lib? Could be worth checking out.

Edit: Never mind on the JAX front, it'd be too much effort and would probably have the same issues.

@davidemms
Owner

Hi

That's really useful info, thanks! That'll save me from a long excursion in the wrong direction. I use the pyinstaller program. You can recreate it yourself by downloading the orthofinder source code and running it on the orthofinder.py file:

pyinstaller -F orthofinder.py
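
To compare the bundled executable with the plain Python run (for instance to test the earlier guess that the bundle ships an older numpy), a small diagnostic like the one below could be added temporarily to the script before bundling. It is a sketch, not part of OrthoFinder: the function name `describe_runtime` is hypothetical. It relies on PyInstaller setting `sys.frozen` inside bundled applications. On an affected machine, importing numpy may itself trigger the OpenBLAS termination, so OPENBLAS_NUM_THREADS=1 should probably be set first.

```python
import sys


def describe_runtime():
    """Report whether this is a PyInstaller bundle and which numpy version is loaded."""
    info = {
        # PyInstaller sets sys.frozen in bundled apps; absent in a normal interpreter.
        "frozen": bool(getattr(sys, "frozen", False)),
        "python": sys.version.split()[0],
    }
    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = None  # numpy is not installed in this environment
    return info


if __name__ == "__main__":
    print(describe_runtime())
```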

All the best
David

@davidemms
Owner

There is some good information on this issue here:
https://github.com/obspy/obspy/wiki/Notes-on-Parallel-Processing-with-Python-and-ObsPy

I've hard-coded the "OPENBLAS_NUM_THREADS=1" into OrthoFinder now, I think this is the best thing to do. If someone who previously had the issue could try running the latest version of the code from the master branch that would be really useful. If it works then I'll create a new release package with this fix in.
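
For readers wanting to do the same in their own scripts, the general pattern looks roughly like this. It is a sketch of the technique, not necessarily the exact change made in OrthoFinder. The important detail is the ordering: as far as is known, OpenBLAS reads the variable when the library is loaded, so it has to be set before numpy is imported for the first time.

```python
import os

# Set the limit before numpy (and therefore OpenBLAS) is first imported.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

try:
    import numpy as np  # imported only after the variable is set
except ImportError:
    np = None  # numpy not installed; the ordering is the point of this sketch

if __name__ == "__main__":
    print(os.environ["OPENBLAS_NUM_THREADS"])
```

Using a direct assignment overrides whatever the user has exported in their shell; `os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")` would be the alternative if a user-supplied value should take precedence.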

Many thanks
David

@davidemms davidemms reopened this Apr 2, 2020
davidemms added a commit that referenced this issue May 7, 2020