fsl.check_first(): Second attempt at refinement #2609

Lestropie · 2023-03-22T00:18:50Z

Response to suggestion by @glasserm in #2597.

fsl_sub seemingly has a built-in functionality for halting execution until all asynchronous jobs have been completed. Hopefully grabbing the job ID of the final executed job in run_first_all and waiting on that will be sufficient.

What I don't know is the point in time at which this functionality was added to fsl_sub, and therefore whether this change will only work on some minimum version of FSL. But failures of this test should be caught and check_first() will revert to prior behaviour.

Requires testing on a functioning SGE cluster.

If possible, use fsl_sub command to halt execution until all jobs have completed. Result of discussion in #2597. Addresses #2595. Replicates some contents of bd3f19e.

glasserm · 2023-03-22T01:50:32Z

This didn't work and I don't know enought python to fix it:

Command:  run_first_all -m none -s L_Accu,R_Accu,L_Caud,R_Caud,L_Pall,R_Pall,L_Puta,R_Puta,L_Thal,R_Thal,L_Amyg,R_Amyg,L_Hipp,R_Hipp -i T1.nii -o first -b -d

5ttgen: [ERROR] Unhandled Python exception:
5ttgen: [ERROR]   TypeError: bufsize must be an integer
5ttgen: [ERROR] Traceback:
5ttgen: [ERROR]   /home/brainmappers/miniconda3/bin/5ttgen:58 (in execute())
5ttgen: [ERROR]     alg.execute()
5ttgen: [ERROR]   /home/brainmappers/miniconda3/lib/mrtrix3/_5ttgen/fsl.py:192 (in execute())
5ttgen: [ERROR]     fsl.check_first('first', structures=sgm_structures, first_stdout=first_stdout)
5ttgen: [ERROR]   /home/brainmappers/miniconda3/lib/mrtrix3/fsl.py:53 (in check_first())
5ttgen: [ERROR]     return_code = subprocess.call('fsl_sub', '-j', str(job_id))
5ttgen: [ERROR]   /home/brainmappers/miniconda3/lib/python3.8/subprocess.py:340 (in call())
5ttgen: [ERROR]     with Popen(*popenargs, **kwargs) as p:
5ttgen: [ERROR]   /home/brainmappers/miniconda3/lib/python3.8/subprocess.py:757 (in __init__())
5ttgen: [ERROR]     raise TypeError("bufsize must be an integer")
5ttgen: Scratch directory retained; location: /media/myelin/brainmappers/Connectome_Project/YA_HCP_Final/100307/T1w/Diffusion/5TTfsl

Lestropie · 2023-03-22T02:59:14Z

Whoops sorry, that was me ruining correspondence between code tested and code pushed. 18e5910 should at least resolve that.

On my own system fsl_sub -j runs, but says "Either supply a command to run or a parallel task file". I'm wondering if it's not possible for the -j option to be used in isolation, it only allows for a submitted job to wait for some other job to complete before itself commencing. In which case perhaps the qstat solution is necessary?

glasserm · 2023-03-22T10:52:09Z

I get the same. I wasn't aware of any fsl_sub halting functionality.

Command:  run_first_all -m none -s L_Accu,R_Accu,L_Caud,R_Caud,L_Pall,R_Pall,L_Puta,R_Puta,L_Thal,R_Thal,L_Amyg,R_Amyg,L_Hipp,R_Hipp -i T1.nii -o first -b -d
usage: fsl_sub [-h] [-a ARCH] [-c COPROCESSOR] [--coprocessor_multi COPROCESSOR_MULTI] [--coprocessor_class COPROCESSOR_CLASS] [--coprocessor_class_strict] [--coprocessor_toolkit COPROCESSOR_TOOLKIT] [-F] [-j JOBHOLD]
               [--not_requeueable] [--array_hold ARRAY_HOLD] [-l LOGDIR] [-m MAILOPTIONS] [-M MAILTO] [-n] [-N NAME] [-p -1023-0] [-q QUEUE] [-r RESOURCE] [--delete_job DELETE_JOB] [--extra EXTRA] [-R GB]
               [-s [threads,]SLOTS] [-t ARRAY_TASK] [--array_native ARRAY_NATIVE] [-x NUMBER] [--keep_jobscript] [--has_coprocessor COPROCESSOR_NAME] [--has_queues] [--export EXPORT] [--project PROJECT] [-S] [-T MINUTES]
               [--show_config] [-v] [-V] [-z file]
               ...
fsl_sub: error: No command or array task file provided

5ttgen: [ERROR] FSL FIRST has failed; 0 of 14 structures were segmented successfully (check /media/myelin/brainmappers/Connectome_Project/YA_HCP_Final/100307/T1w/Diffusion/5TTfsl/first.logs)

You could submit a job with a hold on the last FIRST JobID that, for example, creates a file indicating that everything has completed and then check for the presence of that file if you don't want to interact with qstat directly.

Lestropie · 2023-03-24T00:33:39Z

You could submit a job with a hold on the last FIRST JobID

The thought had crossed my mind. Just wary of the extent to which this is becoming increasingly clumsy. Decided to avoid qstat as that would require an alternative usage of squeue in case fsl_sub uses SLURM. Let's see if f8477aa does the job; it has no issue running in the absence of SGE, and I've tested what I could.

glasserm · 2023-03-24T01:49:56Z

That worked. Thanks for your efforts here.

Conflicts: lib/mrtrix3/fsl.py

fsl.check_first(): Second attempt at refinement

0703abe

If possible, use fsl_sub command to halt execution until all jobs have completed. Result of discussion in #2597. Addresses #2595. Replicates some contents of bd3f19e.

Lestropie added bug scripts test wanted labels Mar 22, 2023

Lestropie self-assigned this Mar 22, 2023

fsl.check_first() fixes

18e5910

fsl.check_first(): Use dummy fsl_sub job to detect completion

f8477aa

Lestropie marked this pull request as ready for review March 24, 2023 04:17

Lestropie requested a review from a team March 24, 2023 04:18

fsl.check_first(): pylint fixes

7a661ad

Lestropie mentioned this pull request Apr 9, 2023

fsl.check_first(): Utilise PID #2597

Closed

Lestropie mentioned this pull request Aug 17, 2024

mrtrix3.path.wait_for(): Fix variable ghosting #2968

Merged

Lestropie added this to the 3.0.5 updates milestone Sep 16, 2024

Merge remote-tracking branch 'origin/master' into check_first_taketwo

0f1504e

Conflicts: lib/mrtrix3/fsl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fsl.check_first(): Second attempt at refinement #2609

fsl.check_first(): Second attempt at refinement #2609

Lestropie commented Mar 22, 2023

glasserm commented Mar 22, 2023 •

edited by Lestropie

Loading

Lestropie commented Mar 22, 2023

glasserm commented Mar 22, 2023 •

edited by Lestropie

Loading

Lestropie commented Mar 24, 2023

glasserm commented Mar 24, 2023

fsl.check_first(): Second attempt at refinement #2609

Are you sure you want to change the base?

fsl.check_first(): Second attempt at refinement #2609

Conversation

Lestropie commented Mar 22, 2023

glasserm commented Mar 22, 2023 • edited by Lestropie Loading

Lestropie commented Mar 22, 2023

glasserm commented Mar 22, 2023 • edited by Lestropie Loading

Lestropie commented Mar 24, 2023

glasserm commented Mar 24, 2023

glasserm commented Mar 22, 2023 •

edited by Lestropie

Loading

glasserm commented Mar 22, 2023 •

edited by Lestropie

Loading