Too many threads. #323

Closed
FredericBGA opened this issue Dec 17, 2019 · 11 comments

@FredericBGA
Contributor

Hi,

We are using PBS Pro as our job scheduler, and we enforce memory and CPU usage limits.

When we launch OrthoFinder like this:

python3 /softs/bioinfo/orthofinder-2.3.8/orthofinder.py -t 12 -f proteomes/20191205 -X -S blast -og

PBS kills the job:

PBS: job killed: ncpus 544.6 exceeded limit 24 (burst)

That means we asked for 24 CPUs and OrthoFinder used at least(!) 544.
So there is something we don't understand. Is our command line correct?

Thank you for your ideas.

@davidemms
Owner

Hi

Do you know what stage OrthoFinder got to and what its most recent output was? I'm not sure how these systems interact, but OrthoFinder only controls how many processes it launches and what they do; it doesn't have direct control over CPUs as such.

In terms of what OrthoFinder will do with your command: it will run 12 BLAST processes at once. It might be worth checking that the individual BLAST processes aren't each running in parallel themselves. You can see the blastp command OrthoFinder uses by passing the '-op' option instead of '-og'. When I wrote this command there was no need to specify the number of threads, as blastp would run in serial; you could check whether that is still the case for the BLAST version you are using, as I don't know if it has changed.

Edit: the blast command looks like this:
blastp -outfmt 6 -evalue 0.001 -query Sp1.fa -db Sp2DB -out results1_2.txt
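A quick way to check the default (assuming NCBI BLAST+; the grep pattern is only illustrative) is to look at the -num_threads entry in blastp's help output:

blastp -help | grep -A 2 num_threads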

In terms of how your job scheduler counts CPUs: there will be a thread for the main OrthoFinder process, and there may be threads associated with running each of the 12 BLAST processes, but these will all be inactive. You can confirm this by looking at top; there will only be 12 processes actively using CPU cycles at any one time. I don't know how the job scheduler works, but maybe it is counting threads rather than actual CPU usage?
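For example (a minimal sketch; <PID> is a placeholder for the process ID you find), you could count the threads belonging to the main OrthoFinder process like this:

pgrep -f orthofinder.py        # find the PID of the main OrthoFinder process
ps -o nlwp= -p <PID>           # total number of threads for that PID
top -H -p <PID>                # per-thread view; idle threads should sit at ~0% CPU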

Those are the best ideas I can come up with for why PBS is reporting that. The command is fine as far as I'm concerned, and (unless BLAST has started running in parallel) it should only be using 12 CPUs at once, which you can confirm using top. Let me know if you find anything.

All the best
David

@FredericBGA
Contributor Author

Thank you David,

I launched the job myself in order to test more deeply and give you more details.
The BLAST searches are fine, you were right: 12 of them are launched, and the top command confirms it.

My job is still killed by the PBS enforcement system. It happens during the "Running OrthoFinder algorithm" step.

python3 /softs/bioinfo/orthofinder-2.3.8/orthofinder.py -t 12 -f sample/ -X -S blast -og

OrthoFinder version 2.3.8 Copyright (C) 2014 David Emms

2019-12-17 15:30:47 : Starting OrthoFinder
12 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "makeblastdb -help" - ok
Test can run "blastp -help" - ok
Test can run "mcl -h" - ok

Dividing up work for BLAST for parallel processing

2019-12-17 15:30:49 : Creating Blast database 1 of 6
2019-12-17 15:30:50 : Creating Blast database 2 of 6
2019-12-17 15:30:51 : Creating Blast database 3 of 6
2019-12-17 15:30:53 : Creating Blast database 4 of 6
2019-12-17 15:30:54 : Creating Blast database 5 of 6
2019-12-17 15:30:55 : Creating Blast database 6 of 6

Running BLAST all-versus-all

Using 12 thread(s)
2019-12-17 15:30:56 : This may take some time....
2019-12-17 15:30:56 : Done 0 of 36
2019-12-17 20:11:12 : Done 10 of 36
2019-12-18 00:22:34 : Done 20 of 36
2019-12-18 04:02:10 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2019-12-18 04:02:11 : Initial processing of each species
=>> PBS: job killed: ncpus 34.3 exceeded limit 24 (burst)

What can I do to understand what is happening?
For this run I launched with fewer species than my colleague did, so the CPU burst is lower (around 34). I asked PBS for 24.
Thank you.

Fred

@davidemms
Owner

Hi Fred

I've checked the code to confirm that OrthoFinder will only be using 1 thread here; that's the "1 thread(s) for OrthoFinder algorithm" line in the output you posted above. You should be able to confirm this by monitoring 'top' at that point. You can restart at almost exactly this stage (i.e. from the completed all-versus-all sequence search) using the '-b' option, specifying the results directory from your attempted run above, so you can observe it directly. Unfortunately I don't know why PBS is behaving this way, but top should confirm that OrthoFinder isn't using 34 CPUs.
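For example (the directory is a placeholder for wherever your previous run wrote its BLAST results), the restart would look something like:

python3 /softs/bioinfo/orthofinder-2.3.8/orthofinder.py -b <directory_with_previous_blast_results> -og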

All the best
David

@conchoecia

conchoecia commented Feb 12, 2020

Hi David, not to hijack this thread, but I'm finding that OrthoFinder does not appear to be applying the -t option correctly, even with the most basic command suggested in the tutorial. It automatically uses the full number of threads on our machine.

OrthoFinder/orthofinder -t 20 -f OrthoFinder/ExampleData/

OpenBLAS Warning : The number of CPU/Cores(96) is beyond the limit(64). Terminated.

@davidemms
Owner

Hi

Could you check its usage using top or some other program? OrthoFinder shouldn't be running more than 20 high-CPU-usage threads at once, although there may be some idle threads at the same time; they won't be using CPU cycles. The '-t' option controls how many parallel tasks are run at once (diamond searches, alignments, tree inference) and it is definitely being propagated through the program. Do you have any information about when in the run this occurred?

My only guess is that your BLAS library could be doing some parallelisation itself under the hood. Have you googled the error message it returned? I don't know much about it, but could any of the results be relevant to your situation?

All the best
David

@conchoecia

Hi David, thanks for your speculation. My colleague pointed out that the version of OrthoFinder I downloaded from the tutorial was old (v2.3.1). I do not get this error when using v2.3.7.

@IvanV87

IvanV87 commented Feb 24, 2020

Hi David, I'm currently using the latest version, v2.3.11, on CentOS 7 and I'm getting the same error as posted before:

"OpenBLAS Warining : The number of CPU/Cores(96) is beyond the limit(64). Terminated."

The command line that I'm using to submit my job is Orthofinder/orthofinder -f ~/data/2020/test1.
Moreover, I've also tried including the -t flag with different numbers, but the error still appears.
Finally, I decided to run OrthoFinder on Ubuntu, and in that case it runs properly. Since it is more useful for me to run OrthoFinder from a computational cluster, I feel a bit lost and hope that you can help me out with this issue.

Kind regards.

Ivan

@davidemms
Owner

Hi Ivan

Thanks for confirming you're seeing similar behaviour on v.2.3.11. Could you check the suggestions in my previous reply and let me know how they relate to your case?

Many thanks
David

@IvanV87

IvanV87 commented Feb 25, 2020

Hi David.

I managed to run OrthoFinder 2.3.11 via a bash script using export OPENBLAS_NUM_THREADS=1, as explained at https://fossies.org/linux/OpenBLAS/USAGE.md. At the time of this post I'm still waiting for the whole run to finish, but I will report back when it's done.
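For reference, a minimal sketch of such a submission script (the PBS resource line, OrthoFinder path and input directory are placeholders for your own setup) could look like:

#!/bin/bash
#PBS -l select=1:ncpus=12:mem=32gb     # placeholder resource request
export OPENBLAS_NUM_THREADS=1          # stop OpenBLAS spawning one thread per core
Orthofinder/orthofinder -f ~/data/2020/test1 -t 12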

Kind regards

Ivan

@davidemms
Owner

That's great, thanks!

@IvanV87

IvanV87 commented Feb 26, 2020

OK, the OrthoFinder run completed without any issues. So, for other users: if you encounter the OpenBLAS warning, the way to solve it is to use export OPENBLAS_NUM_THREADS=1.

Regards

Ivan
