Update testing, in particular for diagnostics and decompositions #602

apcraig · 2021-05-24T01:20:10Z

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

The list of changes is below. Of particular note:

- A couple of bugs were fixed in sections of the code that we were generally not using or testing, but now are:
  - wrong allocation size in the 1d rake
  - out-of-bounds issue for fcondtopn_d and fsurfn_d when ncat > 6
- The dbug namelist variable was renamed to forcing_diag, which could impact users.
- Two new namelist variables were added, debug_model (logical) and debug_model_step (integer). These trigger the debug_ice diagnostics after the timestep associated with debug_model_step.
- There is a new file for testing the wghtfile decomposition option, CICE_data/grid/gx1/cice62_gx1_wghtmask.nc. This file is on Cheyenne and I will move it to a few other testing platforms, but maybe it should be added to the gx1 input files. It represents the number of months that ice existed at each grid point during the 2005-2009 gx1 production test run and is used to allocate blocks to procs at gx1 with the wghtfile test decomposition. The maximum value is 120, and these values are read and normalized during the decomposition step in CICE.
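
The wghtfile description above (values up to 120, read and normalized, then used to allocate blocks to procs) can be illustrated with a small Python sketch. This is not the CICE implementation: normalizing by the field maximum, summing per block, and the greedy heaviest-block-first assignment are all assumptions made for illustration, and the function names are hypothetical.

```python
# Hypothetical sketch of a weight-file decomposition; not the CICE algorithm.
import heapq

def block_weights(field, block_size):
    """Normalize a 2D field (list of rows, e.g. months with ice, 0..120)
    by its maximum and sum the normalized values inside each block."""
    fmax = max(max(row) for row in field)
    ny, nx = len(field), len(field[0])
    weights = {}
    for jb in range(0, ny, block_size):
        for ib in range(0, nx, block_size):
            total = 0.0
            for j in range(jb, min(jb + block_size, ny)):
                for i in range(ib, min(ib + block_size, nx)):
                    total += field[j][i] / fmax if fmax > 0 else 0.0
            weights[(jb // block_size, ib // block_size)] = total
    return weights

def assign_blocks(weights, nprocs):
    """Greedy balancing: give the heaviest remaining block to the
    currently least-loaded proc."""
    heap = [(0.0, p) for p in range(nprocs)]
    heapq.heapify(heap)
    owner = {}
    for blk, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        owner[blk] = p
        heapq.heappush(heap, (load + w, p))
    return owner
```

Blocks with no ice get weight 0, so they add nothing to a proc's load; a real decomposition may also eliminate such blocks or weight them differently.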

These changes address many issues in #437 as well as lines 7-17 and maybe 81-82 in https://docs.google.com/spreadsheets/d/12tIfm5OzvzH_LF_ie9WbOn_w5aY1hv4SrqTQQ8bqM-s. Code coverage should go up after this update, and that will be evaluated separately.

Otherwise, a summary of all changes follows:

- Add alt06 test, ncat=7, kcatbound=3, nslyr=3
- Add bigdiag test with lots of diagnostic output, debug_model=true, debug_blocks=true, etc.
- Add spiralcenter decomp test
- Add gx1 weightfile decomp test
- Add gbox180 with spacecurve decomp test to trigger Peano and Cinco decompositions
- Update sectrobin decomp test to check land block elimination
- Rename boxdyn to boxnodyn
- Update calchk to check 100,000 years instead of 1000 years (1000 years was set for quick debugging and was not supposed to be checked in)
- Modify boxslotcyl: turn off tr_lvl and tr_pond_lvl (changes answers)
- Clarify boxadv, boxnodyn, and boxrestore by explicitly setting shortwave=ccsm3 and albedo_type=constant
- Clean up the spacecurve implementation, including debug flagging; recode print to write; make the module private; add printcurve for a gridsize of 30; remove the qsort and partition subroutines
- Deprecate IsLoadBalanced in ice_spacecurve
- Remove CICE_RunMod.F90_debug and migrate the high-level debug checks to CICE_RunMod.F90 with the new debug_model namelist variable. Also add debug_model_step to specify the timestep at which to start writing debug_model output. debug_model writes data for the first point specified by lonpnt, latpnt.
- Move the debug_ice routine out of CICE.F90 and into ice_diagnostics.F90
- Remove writeout_finished_file from CICE_Finalize in several drivers
- Deprecate print_points_state via an ifdef
- Rename l_stop in ice_transport_driver.F90 to ckflag. The naming made this look like l_stop logic, but it isn't.
- Fix the size issue of fcondtopn_d and fsurfn_d in init_coupler_flux when ncat is greater than 6. These arrays would go out of bounds; fcondtopn and fsurfn are now initialized to the 6th value for any category greater than 6.
- Rename the dbug variable in ice_forcing.F90 to forcing_diag and change the namelist variable name accordingly
- Update ice_distributionGet output and implementation to support more general arguments
- Add an ice_distributionGet test in init_domain_distribution when debug_blocks is true
- Fix the allocation size bug in create_distrb_rake for the 1d rake. Add a test that triggers the 1d rake.
- Update the debug flag in ice_distribution.F90 to leverage debug_blocks
- Update the complog part of the baseline script to report missing data as MISS instead of FAIL
- Update documentation
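
The ncat > 6 fix in init_coupler_flux amounts to clamping the category index to the last entry of the 6-element default arrays. A minimal Python sketch, assuming placeholder values (not the real fcondtopn_d/fsurfn_d data) and a hypothetical helper name; in Fortran the same idea could be written as indexing with min(n, 6), though the actual code may differ.

```python
# Placeholder defaults; the real 6-element arrays live in ice_flux.F90.
FCONDTOPN_D = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

def init_by_category(defaults, ncat):
    """Return one value per category n = 1..ncat, reusing the last default
    for every category beyond len(defaults) instead of going out of bounds."""
    nmax = len(defaults)
    # n is 1-based as in Fortran; clamp it, then shift to a 0-based index.
    return [defaults[min(n, nmax) - 1] for n in range(1, ncat + 1)]
```

For example, with ncat=7 (as in the new alt06 test) the 7th category reuses the 6th value instead of reading past the end of the array.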


eclare108213

Looks great, thanks @apcraig. I have some questions but nothing that keeps me from approving this as it is.

Does anyone (or any modeling group) use the space-filling curves distribution?

In the diff of ug_case_setting.rst, it looks like the first many (200?) lines of the file are italicized but later ones are not. Is that a problem in the file itself, or just with this diff?

eclare108213 · 2021-05-24T22:10:51Z

configuration/scripts/ice_in

@@ -29,14 +29,16 @@
    diagfreq       = 24
    diag_type      = 'stdout'
    diag_file      = 'ice_diag.d'
+    debug_model    = .false.
+    debug_model_step = 999999999


Would it be better to set debug_model_step to 0 so that it always starts printing immediately, as the default? Assuming it's ignored unless debug_model=T.

It was 99999999 by default before, so I kept it that way. I'm happy to make the default 0, let me know if you prefer that.

I think 0 would be better, since you have a flag to turn it off.
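
To see why a default of 0 is harmless once debug_model acts as the on/off switch, here is a minimal Python sketch of the gating. The comparison istep >= debug_model_step and the function name are assumptions; the PR text only says output starts after the timestep associated with debug_model_step.

```python
def debug_output_enabled(debug_model, istep, debug_model_step):
    """Assumed gating: write debug_ice output only when debug_model is on
    and the current step has reached debug_model_step."""
    return debug_model and istep >= debug_model_step
```

With debug_model_step = 0, turning on debug_model alone produces output from the first step; with 999999999, a user must also lower debug_model_step to see anything.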

Sounds good, I'll make this change in the next test coverage PR.

Why not in this PR?

I can do it in this PR too. I was sort of trying to avoid doing some extra testing. Given that this change is minor, nobody is going to use this feature at the moment, and this will be updated on the trunk in the next week (hopefully), I thought I'd just wait. I have three options:

1. make the change on this PR and rerun all or a subset of tests
2. make the change on the PR and not test
3. make the change in the next PR and carry out a full test suite (which I'd do anyway)

I was going for the 3rd option. I can do the 1st. I try to avoid doing the 2nd when changes occur in the source code or scripts, as it's bad practice and risky.

That's fine, whatever works best for you.

eclare108213 · 2021-05-24T22:14:01Z

configuration/scripts/options/set_nml.alt03

@@ -23,3 +23,4 @@ Ktens          = 0.
 e_ratio        = 2.
 seabed_stress    = .true.
 use_bathymetry = .true.
+l_mpond_fresh = .true.


If particular changes are addressing lines in the spreadsheet, could you add a note on the line that says which test script was changed, and how?

This recommendation came from #437 (see #437 (comment)). However, it doesn't change answers, which is a little concerning and probably something we need to look into more.

Here's what to look for: l_mpond_fresh might only be used for coupling, and the mixed layer model only calculates heat fluxes, not water/salt, so l_mpond_fresh might not be testable in our standalone configurations.

So far as I can tell, l_mpond_fresh shows up in CICE_RunMod.F90 in the standalone model as

    if (l_mpond_fresh) then
       fpond(i,j,iblk) = fpond(i,j,iblk) * rhofresh/dt
       fresh(i,j,iblk) = fresh(i,j,iblk) - fpond(i,j,iblk)
    endif

I'm not sure what happens to fpond or fresh after that, but it doesn't seem to impact the solution. I guess I think that's OK. When we turn on options, it would be nice to see an impact, but if that's not possible, I still think it's worthwhile to have a test that turns the flag on and at least tests the code in a technical sense.
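
The conversion in the l_mpond_fresh snippet above turns a pond water change into a freshwater flux. A worked Python sketch, assuming fpond holds a water depth change in m accumulated over one timestep and that rhofresh is 1000 kg/m^3; the function name is hypothetical.

```python
RHOFRESH = 1000.0  # fresh water density in kg/m^3 (assumed value)

def mpond_fresh_adjust(fpond_m, fresh, dt):
    """Convert a pond water depth change fpond_m (m) over a step dt (s)
    into a flux (kg/m^2/s) and subtract it from the freshwater flux."""
    fpond_flux = fpond_m * RHOFRESH / dt
    return fpond_flux, fresh - fpond_flux
```

For example, 3.6 mm of pond water retained over a 1-hour step gives 0.0036 * 1000 / 3600 = 0.001 kg/m^2/s removed from fresh.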

eclare108213 · 2021-05-24T22:28:45Z

doc/source/user_guide/ug_troubleshooting.rst

@@ -138,11 +145,11 @@ conflicts in module dependencies.

 `print\_points` (**ice\_in**)
    If true, print numerous diagnostic quantities for two grid cells,
-    one near the north pole and one in the Weddell Sea. This utility
+    defined by `lonpnt` and `latpnt` in the namelist file.


I understand that this will be generalized so that the user alternatively can insert the indices directly, perhaps in a later PR.

That is my plan. That feature will be added soon.

dabail10

These are pretty extensive changes, but look to be completely diagnostic. I think these are nice features and am fine with the naming and structure.

phil-blain · 2021-06-14T16:25:55Z

Hi @apcraig,

there is a new file for testing the wghtfile decomposition option, CICE_data/grid/gx1/cice62_gx1_wghtmask.nc. This file is on cheyenne and I will move it to a few other testing platforms, but maybe it should be added to the gx1 input files. This file represents the number of months that ice existed at any grid point during the 2005-2009 gx1 production test run and is used to allocate blocks to procs at gx1 with the wghtfile test decomposition. The maximum value is 120 and these values are read and normalized during the decomposition step in CICE.

This should indeed be added to the gx1 input files, since it is used in the decomp_suite. I've just run that suite, and the daley_intel_restart_gx1_64x1x16x16x10_dwghtfile test failed because that file is missing.

phil-blain · 2021-10-27T20:58:36Z

cicecore/shared/ice_distribution.F90

+            if (debug_blocks .and. my_task == master_task) then
+               write(nu_diag,'(2a,3i8)') &


Hi @apcraig, I did not have time to review this PR before it was merged.

Here I notice that you prevented the block distribution from being output for all procs. Was this intended? If we are debugging decompositions, I would expect we want output from all procs...

I was using debug_blocks just now to debug a segfault and was very confused as to why all blocks were listed on proc 1, and I dug up this PR by looking at the history of create_local_block_ids...
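
The question above is about the master_task gate on the debug_blocks write. A serial Python sketch shows how that gate changes what appears in the log; the names are hypothetical, and in a real CICE run each MPI task would write its own lines rather than looping over all tasks.

```python
def block_report_lines(local_blocks_by_task, master_task=0, master_only=True):
    """Collect the debug lines each task would write.

    local_blocks_by_task maps a task id to the global block ids it owns.
    With master_only=True (mimicking the my_task == master_task gate),
    only master_task reports; otherwise every task reports its blocks.
    """
    lines = []
    for task, blocks in sorted(local_blocks_by_task.items()):
        if master_only and task != master_task:
            continue
        for lid, gid in enumerate(blocks, start=1):
            lines.append(f"task {task} local block {lid} global block {gid}")
    return lines
```

With the gate in place, blocks owned by other tasks never appear in the log, which is the gap raised in this comment.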

In our in-house CICE4 version, we had special initialization code in ice_flux::init_coupler_flux to initialize the 'fcondtopn_d' and 'fsurfn_d' arrays for ncat != 6. This was needed to avoid going out of bounds in the loop over ncat where these arrays are used below to initialize fcondtopn_f and fsurfn_f. This was fixed in CICE in 97370d7 (Update testing, in particular for diagnostics and decompositions (CICE-Consortium#602), 2021-05-26) by using the last (6th) value of the array for any higher category. This leads to 'fcondtopn_d' having different values in CICE6 for the last 4 categories compared to our in-house CICE4. To make side-by-side debugging of CICE6 and CICE4 easier, minimize the differences between the initialization in both models by adding a 7th value to fcondtopn_d. This gives fcondtopn_f identical values in both models. Adjust 'fsurfn_d' accordingly.

In 97370d7 (Update testing, in particular for diagnostics and decompositions (CICE-Consortium#602), 2021-05-26), the 'debug_model' namelist flag was added and calls to 'debug_ice' were added throughout the standalone model's 'ice_step'. This is useful for debugging model runs (in the scientific sense) since this subroutine prints the complete ice state, so calling it several times per time step can help diagnose where things go wrong. Do the same in the 'nemo_concepts' driver.