Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tech doc updates (for master) #209

Merged
merged 5 commits into from
Aug 22, 2019
Merged

Conversation

ligiabernardet
Copy link
Collaborator

Updates to the CCPP Technical Documentation to reflect changes in the code management and regression tests of CCPP when used with NEMSfv3gfs. No source code was altered, so no regression tests are needed.

…to reflect:

a) The gmtb-scm code repository is now public.
b) The EMC NEMS repository is no longer in VLab as it has been moved to GitHub in August 2019. GMTB is forking this repository and no longer mirrorring it.
c) Given that the CCPP-enabling capabilities in the FV3, NEMS, and NEMSfv3gfs repositories were committed to EMC's master in July 2019, the primary branch for CCPP development in the GitHub.com/NCAR fork or mirrors is no longer a special GMTB area (gmtb/ccpp) but is now the primary code (master for NEMSfv3gfs and FV3 and develop for NEMS). The difference in terminogy (master versus develop) is due to the transition of EMC's codes to GitHub and use of GitFlow (NEMS is now on GitHub using GitFlow, so develop is the proper origin/upstream).
d) A clarification on Section 7.4.1 (Creating a PR) asking developers to, when relevant, note in the PR message the existence of associated PRs in separate repositories.
… reflect changes in the NEMSfv3gfs regression tests implemented when the CCPP-enabling changes to FV3, NEMS, and NEMSfv3gfs were committed to EMC master in July 2019.
@codecov-io
Copy link

codecov-io commented Aug 18, 2019

Codecov Report

Merging #209 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #209   +/-   ##
=======================================
  Coverage   47.28%   47.28%           
=======================================
  Files          14       14           
  Lines        1343     1343           
=======================================
  Hits          635      635           
  Misses        708      708

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 063732f...efe917e. Read the comment docs.

Copy link
Collaborator

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least one typo plus some suggestions and questions.

@@ -342,7 +342,7 @@ Regression testing is the process of testing changes to the programs to make sur
Overview of the RTs
^^^^^^^^^^^^^^^^^^^

The RT configuration files are located in ``./tests`` relative to the top-level directory of NEMSfv3gfs and have names ``rt*.conf``. The default RT configuration file, supplied with the NEMSfv3gfs master, compares the results from the non-CCPP code to the *official baseline* and is called ``rt.conf``. Before running the RT script ``rt.sh`` in the same directory, the user has to set one or more environment variables and potentially modify the script to change the location of the automatically created run directories. The environment variables are ``ACCNR`` (mandatory unless the user is a member of the default project *nems*; sets the account to be charged for running the RTs), ``NEMS_COMPILER`` (optional for the ``intel`` compiler option, set to ``gnu`` to switch), and potentially ``RUNDIR_ROOT``. ``RUNDIR_ROOT`` allows the user to specify an alternative location for the RT run directories underneath which directories called ``rt_$PID`` are created (``$PID`` is the process identifier of the ``rt.sh`` invocation). This may be required on systems where the user does not have write permissions in the default run directory tree.
The RT configuration files are located in ``./tests`` relative to the top-level directory of NEMSfv3gfs and have names ``rt*.conf``. The default RT configuration file, supplied with the NEMSfv3gfs master is called ``rt.conf`` and runs four types of configurations: IPD PROD, IPD REPRO, CCPP PROD, and CCPP REPRO. For the IPD configurations, CCPP is not used, that is, the code is compiled with ``CCPP=N``. The PROD configurations use the compiler flags used in NCEP operations for for superior performance, while the REPRO configurations remove certain compiler flags to create b4b identical results between CCPP and IPD configurations. Before running the RT script ``rt.sh`` in directory ``./tests``, the user has to set one or more environment variables and potentially modify the script to change the location of the automatically created run directories. The environment variables are ``ACCNR`` (mandatory unless the user is a member of the default project *nems*; sets the account to be charged for running the RTs), ``NEMS_COMPILER`` (optional for the ``intel`` compiler option, set to ``gnu`` to switch), and potentially ``RUNDIR_ROOT``. ``RUNDIR_ROOT`` allows the user to specify an alternative location for the RT run directories underneath which directories called ``rt_$PID`` are created (``$PID`` is the process identifier of the ``rt.sh`` invocation). This may be required on systems where the user does not have write permissions in the default run directory tree.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"operations for for superior" ==> "operations for superior"
"location for the RT run directories underneath which directories" really needs a comma
==> "location for the RT run directories, underneath which directories"
I do not see where I would modify the script -- it looks like all environment variables. Did I miss something?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mods look fine to me. Good catch with the typo and comma.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address the typo and comma. Variables RUNDIR_ROOT and NEMS_COMPILER can be set in your shell or entered in the script. Variable ACCNR needs to be set in the script. So, I think this part of the text is okay.

Copy link
Collaborator

@climbfuji climbfuji Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, export ACCNR=gmtb and then running the script works fine.


This command and all others below produce log output in ``./tests/log_machine.compiler``. These log files contain information on the location of the run directories that can be used as templates for the user. Each ``rt*.conf`` contains one or more compile commands preceding a number of tests.


Baselines
^^^^^^^^^^^^^^^^^^^

Regression testing is only possible on machines for which baselines exist. EMC maintains *official baselines* of non-CCPP runs on *Jet* and *Wcoss* created with the Intel compiler. GMTB maintains additional baselines on *Theia, Jet, Cheyenne*, and *Gaea*. While GMTB is trying to keep up with changes to the official repositories, baselines maintained by GMTB are not guaranteed to be up-to-date.
Regression testing is only possible on machines for which baselines exist. EMC maintains *official baselines* on *Theia* and *Wcoss* created with the Intel compiler. GMTB maintains additional baselines on *Jet*, *Cheyenne*, and *Gaea*. While GMTB is trying to keep up with changes to the official repositories, baselines maintained by GMTB are not guaranteed to be up-to-date.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are not guaranteed to be up-to-date

For each GMTB-stored baseline, it is clear which code version was tested? If so, I would state this (along with how to determine the tested code revision).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression test baselines sit in directories .../trunk-YYYYMMDD/... - the procedure is to copy the latest baseline directory over from theia (because it also contains the input data and some o the config files) and then update them by running the tests in "create" mode with the version of the code that corresponds to this date tag. Thus, either the directories are there (and the baseline is up to date), or they aren't there at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection between the baseline and the version of the code used to create the baseline is done solely by the date in the directory name of the baseline. In order to obtain the code used to create a baseline dated YYYYMMDD, a person would obtain top of the master as of this date.
To make this more clear, I will change text from "Note that yyyymmdd is the year, month and day the RT was created." to "Note that yyyymmdd is the year, month and day the baseline was created using top of master code."

+---------------------------------------------+----------------------+
| https://github.com/NCAR/ccpp-physics | master |
+---------------------------------------------+----------------------+
| https://github.com/NCAR/ccpp-framework | master |
+---------------------------------------------+----------------------+
| https://github.com/NCAR/NEMS | gmtb/ccpp |
| https://github.com/NCAR/NEMS | develop |
+---------------------------------------------+----------------------+
| https://github.com/NCAR/FMS | GFS-FMS |
+---------------------------------------------+----------------------+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it unusual that users are pointed to branches rather than tags. What sort of users are targeted here?
Also, aren't most of the repositories checked out as submodules? Those never create branches (and usually do not even clone the full branch or repository).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe unusual, but equivalent. We are always keeping the branches listed above in sync, i.e. they head of each of them is guaranteed to work with all the others.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should list branches and tags. The branch information is needed if someone is going to create a pull request for a certain branch of NEMSfv3gfs and its submodules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But tags will change much more often than the technical documentation!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is for top of master. Target audience is model developers, which will likely be modifying one or more submodules of NEMSfv3gfs. Because they will be creating PR of innovations, they cannot work from fixed tags.
All repositories under NEMSfv3gfs are submodules. Those are ccpp-physics, ccpp-framework, FV3, NEMS etc. Developers will be modifying one or more of these submodules.
Note that we also offer a public release of CCPP with the Single-Column Model. In that case, we point to tags because the public release is stable and does not change.

@@ -67,9 +66,9 @@ https://github.com/NCAR/ccpp-framework

https://github.com/NCAR/ccpp-physics

Users have read-only access to these repositories and as such cannot accidentally destroy any important (shared) branches of these authoritative repositories. Both CCPP repositories are public (no GitHub account required) and may be used directly to read or create forks. Write permission is generally restricted, however. The SCM repository is private, to request access please send a message and your GitHub username to gmtb-help@ucar.edu.
Users have read-only access to these repositories and as such cannot accidentally destroy any important (shared) branches of these authoritative repositories. Both CCPP repositories and the SCM repositories are public (no GitHub account required) and may be used directly to read or create forks. Write permission is generally restricted, however.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up on line 63, I think "As for NEMSfv3gfs" would be easier to understand as "As with NEMSfv3gfs"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make this change, tks.

@@ -162,7 +161,7 @@ Start with checking out the main repository from the NCAR GitHub

.. code-block:: console

git clone -b gmtb/ccpp https://github.com/NCAR/NEMSfv3gfs
git clone -b https://github.com/NCAR/NEMSfv3gfs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? I thought the -b switch required a branch name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe you need to specify the branch name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the "-b", so it will read
git clone https://github.com/NCAR/NEMSfv3gfs

@@ -162,7 +161,7 @@ Start with checking out the main repository from the NCAR GitHub

.. code-block:: console

git clone -b gmtb/ccpp https://github.com/NCAR/NEMSfv3gfs
git clone -b https://github.com/NCAR/NEMSfv3gfs
cd NEMSfv3gfs
git submodule init
git submodule update
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this recipe is used instead of using --recurse-submodules? Something like:

git clone -b master --recurse-submodules https://github.com/NCAR/NEMSfv3gfs
cd NEMSfv3gfs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can answer this one. NEMS contains a nested submodule, which at this point still lives on Vlab. The code of this submodule is not required for the "normal" user (only when running the NEMSCompset regression test version, which is different from the regression tests run through rt.sh). Because it is in Vlab, a --recursive leads to a checkout error. Doing it the way Ligia described it avoids this problem.

Copy link

@JulieSchramm JulieSchramm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update of the tech document looks good to me, with the corrections suggested by Steve and Dom.

- Fix typo and comma use
- Make a few sentences more clear
- Fix bug in git clone command
@@ -342,7 +342,7 @@ Regression testing is the process of testing changes to the programs to make sur
Overview of the RTs
^^^^^^^^^^^^^^^^^^^

The RT configuration files are located in ``./tests`` relative to the top-level directory of NEMSfv3gfs and have names ``rt*.conf``. The default RT configuration file, supplied with the NEMSfv3gfs master, compares the results from the non-CCPP code to the *official baseline* and is called ``rt.conf``. Before running the RT script ``rt.sh`` in the same directory, the user has to set one or more environment variables and potentially modify the script to change the location of the automatically created run directories. The environment variables are ``ACCNR`` (mandatory unless the user is a member of the default project *nems*; sets the account to be charged for running the RTs), ``NEMS_COMPILER`` (optional for the ``intel`` compiler option, set to ``gnu`` to switch), and potentially ``RUNDIR_ROOT``. ``RUNDIR_ROOT`` allows the user to specify an alternative location for the RT run directories underneath which directories called ``rt_$PID`` are created (``$PID`` is the process identifier of the ``rt.sh`` invocation). This may be required on systems where the user does not have write permissions in the default run directory tree.
The RT configuration files are located in ``./tests`` relative to the top-level directory of NEMSfv3gfs and have names ``rt*.conf``. The default RT configuration file, supplied with the NEMSfv3gfs master is called ``rt.conf`` and runs four types of configurations: IPD PROD, IPD REPRO, CCPP PROD, and CCPP REPRO. For the IPD configurations, CCPP is not used, that is, the code is compiled with ``CCPP=N``. The PROD configurations use the compiler flags used in NCEP operations for superior performance, while the REPRO configurations remove certain compiler flags to create b4b identical results between CCPP and IPD configurations. Before running the RT script ``rt.sh`` in directory ``./tests``, the user has to set one or more environment variables and potentially modify the script to change the location of the automatically created run directories. The environment variables are ``ACCNR`` (mandatory unless the user is a member of the default project *nems*; sets the account to be charged for running the RTs), ``NEMS_COMPILER`` (optional for the ``intel`` compiler option, set to ``gnu`` to switch), and potentially ``RUNDIR_ROOT``. ``RUNDIR_ROOT`` allows the user to specify an alternative location for the RT run directories, underneath which directories called ``rt_$PID`` are created (``$PID`` is the process identifier of the ``rt.sh`` invocation). This may be required on systems where the user does not have write permissions in the default run directory tree.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why an environment variable needs to be set in the script.
Is there a reason there cannot be a standard bash environment override such as:

ACCNR=${ACCNR:-""}

with a test for a blank ACCNR located before it is used (say around line 555)?

if [ -z "${ACCNR}" ]; then
    echo "ERROR: set ACCNR to a valid account key"
fi

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment somewhere else in this thread. export ACCR=gmtb works just as fine, I am doing this all the time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here: https://github.com/NCAR/NEMSfv3gfs/blob/8a1e235d57a13d40e8ba71d3ce174d62dfa4f265/tests/rt.sh#L569 for theia and some other platforms, no ACCNR is set a priori. The reason is that there used to be a script that was supposed to determine the account number automatically (based on the user's groups and certain rules), but this doesn't work anymore since the transition to slurm (and didn't work so well either beforehand), hence no default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the script could be modified (around line 555) to override a shell environment variable. However, at this time I am just documenting the script as it is. I will change the text from "the user has to set one or more environment variables and potentially modify the script to change the location of the automatically created run directories. The environment variables are..." to "the user has to set one or more environment variables on the working shell or in the script. The environment variables are..."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify the situation.

On the WCOSS platforms (all phases/partitions), the variable ACCNR is set explicitly in the script and the user has to modify the corresponding lines. On all other platforms (cheyenne, theia, jet, gaea), ACCNR is explicitly NOT set in the host environment, and for one of them there is even a comment as to why:

#  DO NOT SET AN ACCOUNT EVERYONE IS NOT A MEMBER OF
#  USE AN ENVIRONMENT VARIABLE TO SET ACCOUNT
#  ACCNR=cmp

This allows the user to do exactly what the comment says.

This is on purpose I think, because only the code managers and others involved in the actual hand-off to NCO are working on these systems. We should probably restrict our documentation to the user development platforms, in which case all variables can and should be set in the form of environment variables.

Copy link
Collaborator

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving but I believe the user interface could be improved and simplified with a few lines of code in rt.sh.

@ligiabernardet
Copy link
Collaborator Author

Let me know if the following language is acceptable. If not, please suggest a language.
Instead of:
" the user has to set one or more environment variables on the working shell or in the script. The environment variables are ACCNR (mandatory unless the user is a member of the default project nems; sets the account to be charged for running the RTs), NEMS_COMPILER (optional for the intel compiler option, set to gnu to switch), and potentially RUNDIR_ROOT. RUNDIR_ROOT allows the user to specify an alternative location for the RT run directories, underneath"
Use:
"the user has to set some environment variables on the working shell: ACCNR (account to be charged for running the RTs), NEMS_COMPILER (optional for the intel compiler option, set to gnu to switch), and potentially RUNDIR_ROOT (location for the RT run directories), underneath

Copy link
Collaborator

@climbfuji climbfuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience with my picky comments on rt.sh. The script may not be the most user-friendly, but it was written (not by me!) for a specific purpose and it does serve this purpose quite well.

Approved, waiting to see if Grant wants to add something.

@climbfuji climbfuji changed the title Tech doc updates Tech doc updates (for master) Aug 21, 2019
@@ -435,25 +435,6 @@ In case a user does not have write permissions to ``$STMP (/scratch4/NCEPDEV/stm
RUNDIR_ROOT=${RUNDIR_ROOT:-${PTMP}/${USER}/FV3_RT}/rt_$$


Non-CCPP vs CCPP Tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked this explanation of creating your own baseline. Perhaps it is not the job of CCPP documentation to explain to users how to do so, but is this information found anywhere else? I realize that the specifics of non-CCPP vs CCPP in this section is longer valid, but the underlying instructions regarding personal baselines was nice and I'm sad to see it go.

Copy link
Collaborator

@grantfirl grantfirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@climbfuji climbfuji merged commit b8d5ea4 into NCAR:master Aug 22, 2019
@ligiabernardet ligiabernardet deleted the TechDoc_updates branch August 23, 2019 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants