Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update testing, in particular for diagnostics and decompositions #602

Merged
merged 3 commits into from
May 26, 2021

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented May 24, 2021

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

  • Short (1 sentence) summary of your PR:
    Update testing, in particular for diagnostics and decompositions
  • Developer(s):
    apcraig
  • Suggest PR reviewers from list in the column to the right.
  • Please copy the PR test results link or provide a summary of testing completed below.
    Full test suite run on cheyenne with 3 compilers. Tests are bit-for-bit except boxslotcyl which had a change to the namelist settings (tr_lvl=tr_pond_lvl=.false.) and the calchk unit test which was set to 1000 years of testing by default, but should have been 100,000 years. Several new tests were added.
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit (except as noted above)
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

List of changes are below. Of particular note

  • a couple bugs were fixed for sections of the code that we were generally not using/testing but are now
    • error in size of allocation in 1d rake
    • out of bounds issue for fcontopn_d and fsurfn_d when ncat > 6
  • the dbug namelist was renamed to forcing_diag which could impact users
  • two new namelist were added, debug_model (logical) and debug_model_step (integer) and these trigger the debug_ice diagnostics after the timestep associated with debug_model_step.
  • there is a new file for testing the wghtfile decomposition option, CICE_data/grid/gx1/cice62_gx1_wghtmask.nc. This file is on cheyenne and I will move it to a few other testing platforms, but maybe it should be added to the gx1 input files. This file represents the number of months that ice existed at any grid point during the 2005-2009 gx1 production test run and is used to allocate blocks to procs at gx1 with the wghtfile test decomposition. The maximum value is 120 and these values are read and normalized during the decomposition step in CICE.

These changes address many issues in #437 as well as lines 7-17 and maybe 81-82 in https://docs.google.com/spreadsheets/d/12tIfm5OzvzH_LF_ie9WbOn_w5aY1hv4SrqTQQ8bqM-s. Code coverage should go up after this update and that will be evaluated separately.

Otherwise, a summary of all changes follows

  • Add alt06 test, ncat=7, kcatbound=3, nslyr=3
  • Add bigdiag test with lots of diagnostic output, debug_model=true, debug_blocks=true, etc
  • Add spiralcenter decomp test
  • Add gx1 weightfile decomp test
  • Add gbox180 with spacecurve decomp test to trigger Peano and Cinco decompositions
  • Update sectrobin decomp test to check land block elimination
  • Rename boxdyn to boxnodyn
  • Update calchk to check 100,000 years instead of 1000 years (which was set for quick debugging, but was not supposed to be checked in).
  • Modify boxslotcyl, turn off tr_lvl and tr_pond_lvl, changes answers
  • Clarify boxadv, boxnodyn, boxrestore, explicitly set shortwave=ccsm3, albedo_type=constant
  • Cleanup spacecurve implementation including debug flagging, recode print to write, module private, add printcurve for gridsize of 30, remove qsort and partition subroutines.
  • Deprecate IsLoadBalanced in ice_spacecurve
  • Remove CICE_RunMod.F90_debug and migrate high level debug checks to CICE_RunMod.F90 with new debug_model namelist. Also add debug_model_step to specify at what timestep to start writing debug_model output. debug_model writes data for the first point specified by lonpnt, latpnt.
  • Move debug_ice routine into ice_diagnostics.F90 and out of CICE.F90
  • Remove writeout_finished_file from CICE_Finalize in several drivers
  • Deprecate print_points_state via an ifdef
  • Rename l_stop in ice_transport_driver.F90 to ckflag. This looked like l_stop logic because of the naming but it isn't.
  • Fix issue of size of fcontopn_d and fsurfn_d when ncat greater than 6 in init_coupler_flux. Would go out of bounds, now initializes fcontopn and fsurfn to 6th value for any category greater than 6.
  • Rename dbug variable name in ice_forcing.F90 to forcing_diag and change namelist variable name.
  • Update ice_distributionGet output and implementation to support more general arguments
  • Add ice_distributionGet test in init_domain_distribution when debug_blocks is true.
  • Fix bug in create_distrb_rake for 1d rake of allocation size. Add a test that triggers the 1d rake.
  • Update debug flag in ice_distribution.F90 leverage debug_blocks.
  • Update complog part of baseline script to report missing data as MISS instead of FAIL
  • Update documentation.

- add alt06 test, ncat=7, kcatbound=3, nslyr=3
- add bigdiag test with lots of diagnostic output, debug_model=true, debug_blocks=true, etc
- add spiralcenter decomp test
- add gx1 weightfile decomp test
- add gbox180 with spacecurve decomp test to trigger Peano and Cinco decompositions
- rename boxdyn to boxnodyn
- modify boxslotcyl, turn off tr_lvl and tr_pond_lvl, changes answers
- clarify boxadv, boxnodyn, boxrestore, explicitly set shortwave=ccsm3, albedo_type=constant
- cleanup spacecurve implementation including debug flagging, recode print to write, module private,
  add printcurve for gridsize of 30, remove qsort and partition subroutines.
- remove CICE_RunMod.F90_debug and migrate high level debug checks to CICE_RunMod.F90 with new
  debug_model namelist.  Also add debug_model_step to specify at what timestep to start writing
  debug_model output.  debug_model writes data for the first point specified by lonpnt, latpnt.
- move debug_ice routine into ice_diagnostics.F90 and out of CICE.F90
- remove writeout_finished_file from CICE_Finalize in several drivers
- deprecate print_points_state via an ifdef
- rename l_stop in ice_transport_driver.F90 to ckflag.  This looked like l_stop logic because
  of the naming but it isn't.
- fix issue of size of fcontopn_d and fsurfn_d when ncat greater than 6 in init_coupler_flux.  Would
  go out of bounds, now initializes fcontopn and fsurfn to 6th value for any category greater than 6.
- rename dbug variable name in ice_forcing.F90 to forcing_diag and change namelist variable name.
- update ice_distributionGet output and implementation to support more general arguments
- add ice_distributionGet test in init_domain_distribution when debug_blocks is true.
- fix bug in create_distrb_rake for 1d rake.  add a test that triggers the 1d rake.
- update debug flag in ice_distribution.F90 leverage debug_blocks.
- update documentation.
- Update calchk to check 100,000 years
- Update sectrobin decomp test to check land block elimination
Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @apcraig. I have some questions but nothing that keeps me from approving this as it is.

Does anyone (or any modeling group) use the space-filling curves distribution?

In the diff of ug_case_setting.rst, it looks like the first many (200?) lines of the file are italized but later ones are not. Is that a problem in the file itself, or just with this diff?

@@ -29,14 +29,16 @@
diagfreq = 24
diag_type = 'stdout'
diag_file = 'ice_diag.d'
debug_model = .false.
debug_model_step = 999999999
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to set debug_model_step to 0 so that it always starts printing immediately, as the default? Assuming it's ignored unless debug_model=T.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was 99999999 by default before, so I kept it that way. I'm happy to make the default 0, let me know if you prefer that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 0 would be better, since you have a flag to turn it off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll make this change in the next test coverage PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it in this PR too. I sort of trying to avoid doing some extra testing. Given this change is minor, nobody is going to use this feature at the moment, and this will be updated on the trunk in the next week (hopefully), I thought I'd just wait. I have three options

  • make the change on this PR and rerun all or a subset of tests
  • make the change on the PR and not test
  • make the change in the next PR and carry out a full test suite (which I'd do anyway)

I was going for the 3rd option. I can do the 1st. I try to avoid doing the 2nd when changes occur in the source code or scripts as it's bad practice and risky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, whatever works best for you.

@@ -23,3 +23,4 @@ Ktens = 0.
e_ratio = 2.
seabed_stress = .true.
use_bathymetry = .true.
l_mpond_fresh = .true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If particular changes are addressing lines in the spreadsheet, could you add a note on the line that says which test script was changed, and how?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This recommendation came from #437. #437 (comment). But it doesn't change answers which is a little concerning and something that we need to look into more probably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what to look for: l_mpond_fresh might only be used for coupling, and the mixed layer model only calculates heat fluxes, not water/salt, so l_mpond_fresh might not be testable in our standalone configurations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far as I can tell, l_mpond_fresh shows up in CICE_RunMod.F90 in the standalone model as

            if (l_mpond_fresh) then
               fpond(i,j,iblk) = fpond(i,j,iblk) * rhofresh/dt
               fresh(i,j,iblk) = fresh(i,j,iblk) - fpond(i,j,iblk)
            endif

I'm not sure what happens to fpond or fresh after that, but it doesn't seem to impact the solution. I guess I think that's OK. When we turn on options, it would be nice to see an impact, but if that's not possible, I still think it worthwhile to have a test that turns the flag on and at least tests the code in a technical sense.

configuration/scripts/tests/baseline.script Show resolved Hide resolved
doc/source/user_guide/ug_implementation.rst Show resolved Hide resolved
doc/source/user_guide/ug_troubleshooting.rst Show resolved Hide resolved
@@ -138,11 +145,11 @@ conflicts in module dependencies.

`print\_points` (**ice\_in**)
If true, print numerous diagnostic quantities for two grid cells,
one near the north pole and one in the Weddell Sea. This utility
defined by `lonpnt` and `latpnt` in the namelist file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this will be generalized so that the user alternatively can insert the indices directly, perhaps in a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my plan. That feature will be added soon.

Copy link
Contributor

@dabail10 dabail10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are pretty extensive changes, but look to be completely diagnostic. I think these are nice features and am fine with the naming and structure.

@apcraig apcraig merged commit 97370d7 into CICE-Consortium:master May 26, 2021
@phil-blain
Copy link
Member

Hi @apcraig,

  • there is a new file for testing the wghtfile decomposition option, CICE_data/grid/gx1/cice62_gx1_wghtmask.nc. This file is on cheyenne and I will move it to a few other testing platforms, but maybe it should be added to the gx1 input files. This file represents the number of months that ice existed at any grid point during the 2005-2009 gx1 production test run and is used to allocate blocks to procs at gx1 with the wghtfile test decomposition. The maximum value is 120 and these values are read and normalized during the decomposition step in CICE.

This should indeed be added to the gx1 input files, since it is used in the decomp_suite . I've just ran that suite and the daley_intel_restart_gx1_64x1x16x16x10_dwghtfile test failed because it's missing that file.

Comment on lines +184 to +185
if (debug_blocks .and. my_task == master_task) then
write(nu_diag,'(2a,3i8)') &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @apcraig, I did not have time to review this PR before it was merged.

Here I notice that you prevented the block distribution from being outputed for all procs. Was this intended ? If we are debugging decompositions, I would expect we want output from all procs....

I was using debug_blocks just now to debug a segfault and was very confused as to why all blocks were listed on proc 1, and I dug up this PR by looking at the history of create_local_block_ids...

@apcraig apcraig deleted the cov01 branch August 17, 2022 20:58
phil-blain added a commit to phil-blain/CICE that referenced this pull request Sep 1, 2023
In our in-house CICE4 version, we had special initialization code in
ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and
'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of
bounds on the loop on ncat where these arrays are used below to
initialize fcondtopn_f and fsurfn_f.

This was fixed in CICE in 97370d7 (Update testing, in particular for
diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last
(6th) value of the array for any higher category. This leads to
'fcondtopn_d' having different values in CICE6 for the last 4
categories compared to our in-house CICE4.

To make side-by-side debugging of CICE6 and CICE4 easier, minimize the
differences between the initialization in both models by adding a 7th
value to fcondtopn_d. This makes its fcondtopn_f have identical values
in both models. Adjust 'fsurfn_d' accordingly.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Sep 1, 2023
In 97370d7 (Update testing, in particular for diagnostics and
decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was
added and calls to 'debug_ice' were added throughout the standalone
model's 'ice_step'. This is useful for debugging model runs (in the
scientific sense) since this subroutine prints the complete ice state,
so calling it several times per time step can help diagnose where things
go wrong.

Do the same in the 'nemo_concepts' driver.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Jan 26, 2024
In our in-house CICE4 version, we had special initialization code in
ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and
'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of
bounds on the loop on ncat where these arrays are used below to
initialize fcondtopn_f and fsurfn_f.

This was fixed in CICE in 97370d7 (Update testing, in particular for
diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last
(6th) value of the array for any higher category. This leads to
'fcondtopn_d' having different values in CICE6 for the last 4
categories compared to our in-house CICE4.

To make side-by-side debugging of CICE6 and CICE4 easier, minimize the
differences between the initialization in both models by adding a 7th
value to fcondtopn_d. This makes its fcondtopn_f have identical values
in both models. Adjust 'fsurfn_d' accordingly.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Jan 26, 2024
In 97370d7 (Update testing, in particular for diagnostics and
decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was
added and calls to 'debug_ice' were added throughout the standalone
model's 'ice_step'. This is useful for debugging model runs (in the
scientific sense) since this subroutine prints the complete ice state,
so calling it several times per time step can help diagnose where things
go wrong.

Do the same in the 'nemo_concepts' driver.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 2, 2024
In our in-house CICE4 version, we had special initialization code in
ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and
'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of
bounds on the loop on ncat where these arrays are used below to
initialize fcondtopn_f and fsurfn_f.

This was fixed in CICE in 97370d7 (Update testing, in particular for
diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last
(6th) value of the array for any higher category. This leads to
'fcondtopn_d' having different values in CICE6 for the last 4
categories compared to our in-house CICE4.

To make side-by-side debugging of CICE6 and CICE4 easier, minimize the
differences between the initialization in both models by adding a 7th
value to fcondtopn_d. This makes its fcondtopn_f have identical values
in both models. Adjust 'fsurfn_d' accordingly.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 2, 2024
In 97370d7 (Update testing, in particular for diagnostics and
decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was
added and calls to 'debug_ice' were added throughout the standalone
model's 'ice_step'. This is useful for debugging model runs (in the
scientific sense) since this subroutine prints the complete ice state,
so calling it several times per time step can help diagnose where things
go wrong.

Do the same in the 'nemo_concepts' driver.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 9, 2024
In our in-house CICE4 version, we had special initialization code in
ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and
'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of
bounds on the loop on ncat where these arrays are used below to
initialize fcondtopn_f and fsurfn_f.

This was fixed in CICE in 97370d7 (Update testing, in particular for
diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last
(6th) value of the array for any higher category. This leads to
'fcondtopn_d' having different values in CICE6 for the last 4
categories compared to our in-house CICE4.

To make side-by-side debugging of CICE6 and CICE4 easier, minimize the
differences between the initialization in both models by adding a 7th
value to fcondtopn_d. This makes its fcondtopn_f have identical values
in both models. Adjust 'fsurfn_d' accordingly.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 9, 2024
In 97370d7 (Update testing, in particular for diagnostics and
decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was
added and calls to 'debug_ice' were added throughout the standalone
model's 'ice_step'. This is useful for debugging model runs (in the
scientific sense) since this subroutine prints the complete ice state,
so calling it several times per time step can help diagnose where things
go wrong.

Do the same in the 'nemo_concepts' driver.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 12, 2024
In our in-house CICE4 version, we had special initialization code in
ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and
'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of
bounds on the loop on ncat where these arrays are used below to
initialize fcondtopn_f and fsurfn_f.

This was fixed in CICE in 97370d7 (Update testing, in particular for
diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last
(6th) value of the array for any higher category. This leads to
'fcondtopn_d' having different values in CICE6 for the last 4
categories compared to our in-house CICE4.

To make side-by-side debugging of CICE6 and CICE4 easier, minimize the
differences between the initialization in both models by adding a 7th
value to fcondtopn_d. This makes its fcondtopn_f have identical values
in both models. Adjust 'fsurfn_d' accordingly.
phil-blain added a commit to phil-blain/CICE that referenced this pull request Feb 12, 2024
In 97370d7 (Update testing, in particular for diagnostics and
decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was
added and calls to 'debug_ice' were added throughout the standalone
model's 'ice_step'. This is useful for debugging model runs (in the
scientific sense) since this subroutine prints the complete ice state,
so calling it several times per time step can help diagnose where things
go wrong.

Do the same in the 'nemo_concepts' driver.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants