-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ecflow package for wcoss2 GFS transition #555
Update ecflow package for wcoss2 GFS transition #555
Conversation
WCOSS2 Migration and Porting NOAA-EMC#398
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed upto the file:
ecf/scripts/gfs/atmos/post_processing/bufr_sounding/jgfs_atmos_postsnd.ecf
But not head.h
as it is very complicated.
Also PR #553 has been merged, so please rebase this PR based on current feature/ops-wcoss2
Please see comments.
ecf/include/envir-p1.h
Outdated
#module load prod_envir/2.0.4 prod_util/2.0.9 envvar/1.0 | ||
echo debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented and unnecessary echo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -43,6 +37,7 @@ module list | |||
############################################################# | |||
export cyc=%CYC% | |||
export cycle=t%CYC%z | |||
export CDATE=$PDY$cyc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is set in envir-p1.h
. It does not need to be set here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
module load gempak/${gempak_ver} | ||
module load libjpeg/${libjpeg_ver} | ||
module load grib_util/${grib_util_ver} | ||
module load imagemagick/${imagemagick_ver} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is ${imagemagic_ver}
coming from? I don't see it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This job need module imagemagic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I understand that. But what is the value of imagemagik_ver
and where is it set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It need to be set in run.ver
#PBS -q %QUEUE% | ||
#PBS -A %PROJ%-%PROJENVIR% | ||
#PBS -l walltime=00:20:00 | ||
#PBS -l place=vscatter,select=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WenMeng-NOAA On WCOSS_Dell, this job (and the other jgfs_atmos_post_fXXX
jobs), they take 4 Nodes totaling to 112 tasks. See lines 17,18 on the left.
Am I reading this right that this job (and other jgfs_atmos_post_fXXX
jobs) are requesting 10 nodes totaling to 1280 tasks for a single forecast hour.
#PBS -q %QUEUE% | ||
#PBS -A %PROJ%-%PROJENVIR% | ||
#PBS -l walltime=04:00:00 | ||
#PBS -l place=vscatter,select=1:ncpus=128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this job only needs 1 task as it only sleeps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed from jgfs_atmos_post_anl.ecf to the end.
export dictionaries_ver="v3.4.0" | ||
export omb_ver="v4.4" | ||
|
||
## main "model" software |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these commented out lines should be removed.
export radarl2_ver=v1.2 | ||
|
||
#### NCO requested testing location is v16.2 | ||
#### export gfs_ver="v16.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed unused line
This PR has been partially reviewed. |
ecf/include/envir-p1.h
Outdated
#### Developer overwrite | ||
export SIPHONROOT=${UTILROOT}/fakedbn | ||
#### | ||
else | ||
export SIPHONROOT=${UTILROOT}/fakedbn | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is developer only. Not used by NCO and thus will not be addressed by NCO.
Please resolve the if-else logic as it is executed regardless of the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
ecf/include/envir-p1.h
Outdated
|
||
export DBNROOT=$SIPHONROOT | ||
|
||
####if [[ ! " prod para test " =~ " ${envir} " && " ops.prod ops.para " =~ " $(whoami) " ]]; then err_exit "ENVIR must be prod, para, or test [envir-p1.h]"; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out? If it is unused, what purpose does it serve?
Either put a comment on this explaining the purpose or remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored
ecf/include/envir-p1.h
Outdated
export DBNROOT=$SIPHONROOT | ||
|
||
####if [[ ! " prod para test " =~ " ${envir} " && " ops.prod ops.para " =~ " $(whoami) " ]]; then err_exit "ENVIR must be prod, para, or test [envir-p1.h]"; fi | ||
export ECF_PORT=34326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line. This variable is set when ecflow
module is loaded. ecflow
module is loaded in head.h
. This variable is unique to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job tested without this line sometime failed silently. There is a good reason to include this in here.
As discussed, this file will be used only for reference. I prefer to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes failure is a symptom of the system instability. I would rather encounter this failure sometimes, than hard-wire some users personal ECF_PORT information. I suggest removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
|
||
module list | ||
|
||
############################################################# | ||
# WCOSS environment settings | ||
############################################################# | ||
export KMP_AFFINITY=scatter | ||
export OMP_NUM_THREADS=$threads | ||
export FORT_BUFFERED=true | ||
export OMP_NUM_THREADS_CY=28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is OMP_NUM_THREADS_CY=28
appropriate for WCOSS2? config.resources.nco.static
is setting this to npe_node_max
. 28
is the value for WCOSS_Dell_P3
and was hard-wired here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same is the case in jgfs_atmos_analysis.ecf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set to 128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set to 128
Are you sure, the value from this variable set here OMP_NUM_THREADS_CY
is being used?
I see https://github.com/NOAA-EMC/GSI/blob/a9561f4219e5a69ac1af250081ae027084c40c3b/scripts/exglobal_atmos_analysis.sh#L1017 setting the value based on NTHREADS_CYCLE
right before entering CYCLESH
|
||
module list | ||
|
||
############################################################# | ||
# WCOSS environment settings | ||
############################################################# | ||
export KMP_AFFINITY=scatter | ||
export OMP_NUM_THREADS=$threads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the value for $threads
coming from? It was previously set on line 20 above and has been removed in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
…re/ecflow-ops-ecf2
- Move enkf out of gdas and rename it to enkfgdas. Include all ecflow definition files job name Include all ecflow scripts name and job/log name - Move "model=gfs" to the top on each job except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose. - Remove the source of model_ver from each ecflow script except all jobs under obsproc. obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose.
Update analysis ecflow script to use 128 for wcoss2 Remove extra CDATE
if [[ "$envir" == prod && "$SENDDBN" == YES ]]; then | ||
export eval=%EVAL:NO% | ||
if [ $eval == YES ]; then | ||
export SIPHONROOT=${UTILROOT}/para_dbn | ||
else | ||
export SIPHONROOT=/lfs/h1/ops/prod/dbnet_siphon | ||
fi | ||
export SIPHONROOT=${UTILROOT}/fakedbn | ||
else | ||
export SIPHONROOT=${UTILROOT}/fakedbn | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the logic here, regardless of the conditions, the result is:
export SIPHONROOT=${UTILROOT}/fakedbn
Also, a variable eval
is being exported for use somewhere down when envir
== prod
and SENDDBN
is YES
.
Is that the desired outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The developer will always use fakedbn to test. eval is ok here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does eval
do? Is it used anywhere other than the if-test?
If the developer will always use fakeddbn, then the first if test "$envir" == prod
is not necessary (and this whole block just boiled down to this):
export SIPHONROOT=${UTILROOT}/fakedbn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see more comments.
@@ -1,6 +1,6 @@ | |||
extern /prod18/gdas/jgdas_forecast | |||
extern /prod18/gdas/atmos/post | |||
extern /prod18/gdas/enkf/post | |||
extern /prod18/enkfgdas/post |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are new changes added in these .def
files that were agreed to be added in a follow-up PR in the meeting this AM. This PR is already big to add any additional changes.
module load ecflow/$ecflow_ver | ||
echo "ecflow module location: $(module display ecflow |& head -2 | tail -1 | sed 's/:$//')" | ||
export ECF_ROOT=/apps/ops/prod/nco/core/ecflow.v5.6.0.7 | ||
export ECF_PORT=34326 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove user specific PORT information. This value is set in the ecflow
module for the user and is unique to the user.
echo "Running module load ecflow/$ecflow_ver" | ||
module load ecflow/$ecflow_ver | ||
echo "ecflow module location: $(module display ecflow |& head -2 | tail -1 | sed 's/:$//')" | ||
export ECF_ROOT=/apps/ops/prod/nco/core/ecflow.v5.6.0.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are dual uses of the variable ecflow_ver
and 5.6.0.7
for ecflow version. Which one is it? Please be consistent in the use of the version.
echo "Running module load ecflow/$ecflow_ver" | ||
module load ecflow/$ecflow_ver | ||
echo "ecflow module location: $(module display ecflow |& head -2 | tail -1 | sed 's/:$//')" | ||
export ECF_ROOT=/apps/ops/prod/nco/core/ecflow.v5.6.0.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is set by the ecflow
module. It should not be overwritten. Load the appropriate module instead.
Please remove.
export ECF_ROOT=/apps/ops/prod/nco/core/ecflow.v5.6.0.7 | ||
. ${ECF_ROOT}/versions/run.ver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am failing to understand what this is doing. Why export
? and why loading prod_util
in head.h
export ECF_ROOT=/apps/ops/prod/nco/core/ecflow.v5.6.0.7 | ||
export ECF_PORT=34326 | ||
export ECF_HOST=ddecflow02 | ||
export ECF_INCLUDE=/lfs/h2/emc/global/noscrub/$USER/git/feature-ops-wcoss2/ecf/include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hardwired to a specific branch which will not exist as soon as it is merged and deleted.
I am failing to understand the need for this entire section from 40-47. It seems to be tailored for a specific branch and specific location.
@@ -0,0 +1,51 @@ | |||
#PBS -S /bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These EnKF jobs are new and we agreed to add them in a followup PR to avoid further bloating this current PR.
Forcibly submitting this PR due to need. |
This PR:
QUEUE_TRANSFER, QUEUE_SHARED and COM
$HOMEgfs/modulefiles/module_base.wcoss2.lua
Include all ecflow definition files job name
Include all ecflow scripts name and job/log name
obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose.
obsproc will no longer be part of GFS. Therefore leave it without change for testing purpose.
This work is part of WCOSS2 Migration and Porting #398