A C768 run as part of `global-workflow` produces a job specification with `nodes=1`, `ppn=4`, and `tpp=1`.
Running with `ush/run_verif_global_in_global_workflow.sh` produces a job with `nproc=${npe_node_metp_gfs}=1`.
When run on HERA, `scripts/exgrid2grid_step1.sh` launches the METplus job with `srun --multi-prog /path/to/task-file`, where `task-file` has `nproc` lines detailing commands to execute. `srun` then fails because the task file does not list as many tasks as it wants to launch; I think it is defaulting to four tasks (matching `ppn=4`) rather than `nproc=1`.
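For reference, here is a minimal sketch of how a multi-prog task file lines up with the task count. It assumes the standard Slurm `--multi-prog` configuration format, where each line begins with a task rank followed by the command. The helper name `write_task_file` and the example commands are hypothetical and are not taken from `scripts/exgrid2grid_step1.sh`.

```shell
# Hypothetical helper (not part of EMC_verif-global): write a Slurm
# multi-prog configuration file with one line per command.
# Each line has the form "<task rank> <command>", with ranks starting at 0,
# so the number of lines is the number of tasks srun must be given.
write_task_file() {
    local task_file="$1"
    shift
    local rank=0
    local cmd
    : > "${task_file}"    # create or truncate the file
    for cmd in "$@"; do
        printf '%d %s\n' "${rank}" "${cmd}" >> "${task_file}"
        rank=$((rank + 1))
    done
}
```

With a single command the file contains only rank 0, so an `srun` that tries to start four tasks has no entry for ranks 1 through 3.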
Changing `scripts/exgrid2grid_step1.sh` to pass `--ntasks ${nproc}` as part of the `srun` command allows the process to finish. A better solution probably involves changing how `ush/run_verif_global_in_global_workflow.sh` determines `nproc`: `man sbatch` suggests `SLURM_NTASKS`, but `global-workflow` probably has a variable to specify the number of threads that would be less closely tied to the job manager.
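A rough sketch of the workaround, assuming the task count is passed explicitly and falls back to `SLURM_NTASKS` when it is not. The function name `build_srun_cmd` is hypothetical; it only prints the command line rather than running `srun`, so it can be checked without a Slurm allocation.

```shell
# Hypothetical helper (not part of EMC_verif-global): print the srun command
# line for a multi-prog task file with an explicit task count.
# The count comes from the second argument, then SLURM_NTASKS, then 1.
build_srun_cmd() {
    local task_file="$1"
    local ntasks="${2:-${SLURM_NTASKS:-1}}"
    printf 'srun --ntasks %s --multi-prog %s\n' "${ntasks}" "${task_file}"
}
```

For example, `build_srun_cmd /path/to/task-file "${nproc}"` with `nproc=1` prints `srun --ntasks 1 --multi-prog /path/to/task-file`, which keeps the number of launched tasks equal to the number of lines in the task file.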
@DWesl This was recently fixed in the global-workflow as part of an overhaul of the resource configuration system. The job now runs with a single task by default. See NOAA-EMC/global-workflow#2804 and let me know if updating your global-workflow resolves the issue.