Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tracer initialization #235

Merged
merged 18 commits into from
Nov 16, 2018

Conversation

eclare108213
Copy link
Contributor

Cleaning up tracer initialization and namelists

  • Developer(s): E. Hunke

  • Please suggest code Pull Request reviewers in the column at right.

  • Are the code changes bit for bit, different at roundoff level, or more substantial? BFB

  • Please include the link to test results or paste the summary block from the bottom of the testing output below.

161 of 165 tests PASSED
0 of 165 tests FAILED
3 of 165 tests PENDING

Note: "Documentation" includes information on the wiki and .rst files in doc/source/,
which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.

  • Other Relevant Details:

This code is NOT READY to be merged. It's a mess (but it's a bit-for bit mess). It'll be a few days before I get back to this, so if anyone wants to jump in, go for it... I hope it's now arranged in the proper order to dynamically allocate the tracer arrays using ntrcr.

Here are the steps I suggest be done next, not necessarily in this order -- these are somewhat negotiable.

  1. combine input_data2 and input_zbgc0 into a routine called “count_tracers”

  2. keep n_trbgcz, n_aero, n_zaero, n_algae, n_doc etc in namelist, and remove the following n_[trcr] and set them in the code based on namelist logical flags:

     n_trbri = 0
     n_trzs = 0
     n_trbgcz = 0
     if (tr_brine) then n_trbri = 1
     if (tr_bgcs) then n_trzs = 1
     if (tr_bgcz) then n_trbgcz = 1
     if (.not. tr_bgc_C) then n_doc = 0
     etc
    

I did this for the age, FY, lvl and pnd tracers.

  1. in the combined input_data2 and input_zbgc0:
    a. instead of this:

    nt_[trcr] = max_ntrcr
    if (tr_[trcr]_ then
        ntrcr = ntrcr+1
        nt_[trcr] = ntrcr
    endif
    

do this:

  nt_[trcr] = 0  ! collect these for all [trcr]
  if (tr_[trcr]_ then
     ntrcr = ntrcr+1
     nt_[trcr] = ntrcr
 endif

 ! repeat above code blocks for all [trcr], adding up ntrcr
 ! then add 1 to ntrcr, to hold the unused tracer indices
 ntrcr = ntrcr + 1
 max_ntrcr = ntrcr 
 if (.not. tr_[trcr]) nt_[trcr] = max_ntrcr  ! collect these for all [trcr]

b. remove the old calculation of max_ntrcr (or use it as a sanity check)

  1. use ntrcr throughout the code instead of max_ntrcr
  2. allocate bgc arrays using n_[trcr] instead of max_[trcr]
  3. change namelist flags for consistency (this will affect icepack and not be backward compatible, so maybe we shouldn't):
    change skl_bgc to tr_bgcs
    change z_tracers to tr_bgcz
  4. fix hard-wired array sizes passed into subroutines (use (:) instead)
  5. clean up routines input_* and init_zbgc (remove extraneous tracer_queries and tracer_inits; write out variables when they are first read/initialized; remove unused variables)
  6. clean up CICE_InitMod.F90 and replicate changes in other drivers

FYI: for max numbers for bgc tracers, see options/set_env.bgc in icepack

@eclare108213
Copy link
Contributor Author

Some clarifying details:
I divided the original input_data and init_zbgc subroutines into pieces and rearranged them in order to get the calculation of ntrcr all in one place (currently in input_data2 and input_zbgc0, which need to be combined). Here is the order now in drivers/cice/CICE_InitMod.F90:

  call input_data           ! namelist variables 
  call input_zbgc           ! vertical biogeochemistry namelist
  call input_data2          ! count tracers
  call input_zbgc0          ! count additional bgc tracers
    ... initialize other things, including dynamical allocations of almost everything
  call init_zbgc            ! vertical biogeochemistry initialization 

I also started removing redundant info from the namelist, for the 'easy' tracers (age, etc). For those, I kept the on/off flags (tr_iage, etc) and set the number of tracers associated with that flag (n_trage etc) in the code instead of namelist. Note: it would be better to be consistent with the names of these numbers and use n_iage, n_FY, n_lvl, n_pnd instead of n_trage etc, if possible.

@apcraig
Copy link
Contributor

apcraig commented Nov 12, 2018

I have completed step 1 and think I understand the rest, but am having some trouble understanding step 2. Can you clarify. When I look at n_trage, it is a local variable and is used only in a limited way,

      n_trage = 0                 ! age
      if (tr_iage) n_trage = 1
      max_ntrcr =   ...
                + n_trage     & ! age

Looking at n_trbri, if this is a namelist, why are we keeping it a namelist and overwriting the value based on tr_brine (another namelist)? What is happening here? And should the right hand side of the equals sign always be 1? I see it's 2 for tr_lvl. Do we actually want this,

if (.not.tr_brine) n_trbri=0

instead of

if (tr_brine) then n_trbri = 1

thus preserving the namelist value?

Also, just to be clear, the n_ are the number of tracers for that tracer group? nt_ is the start index of that tracer group in the tracer array? tr_ is a logical indicating whether that tracer group is active. Correct? It seems like n_ could just be hardwired for most tracers and never changed. tr_ would indicate whether it's active and nt_ would index it (if it's active) in the tracer array. Why do we need to zero out the n_ variables if the tracer is inactive. It that used as a flag later in the code instead of the tr_ flag?

Also, the variable names and their relationship are not easy to figure out. Just look at this,

if (tr_brine) then n_trbri = 1
if (tr_bgcs) then n_trzs = 1
if (tr_bgcz) then n_trbgcz = 1
if (.not. tr_bgc_C) then n_doc = 0

why aren't we using n_trbgcs instead of n_trzs? It would be nice to have greater consistency between these variable names, could we require a direct naming relationship between n_tr[var] and tr_[var] or something similar. I know this might not be possible in all cases, but at least for as many as we can. I find it a bit confusing.

@eclare108213
Copy link
Contributor Author

Yes, the names are confusing and this is why I suggested simplifying them. It helps to remember where they came from -- the n_trage etc were originally the cpp defs, and they were there so they could be turned off to reduce the memory footprint of the big tracer array. Many of them were just 1, i.e. on/off, and we also had a namelist flag that could then turn off the process or capability at runtime. Now with dynamic allocation, the n_trage etc are redundant in namelist, but they are still useful in the code to count up the total number of tracers for the allocation. So I started using the logical flags tr_[trcr] as the primary input, setting the tracer numbers n_[trcr] in the code (except for BGC, see below). E.g. tr_brine should be in namelist, n_trbri should be moved out of namelist and preferably renamed to n_brine. It would be assumed 0 unless tr_brine=T, in which case it's 1. Some of the tracer flags turn on/off more than one tracer (e.g. lvl and some bgc).

Also, just to be clear, the n_ are the number of tracers for that tracer group? nt_ is the start index of that tracer group in the tracer array? tr_ is a logical indicating whether that tracer group is active. Correct? It seems like n_ could just be hardwired for most tracers and never changed. tr_ would indicate whether it's active and nt_ would index it (if it's active) in the tracer array. Why do we need to zero out the n_ variables if the tracer is inactive. It that used as a flag later in the code instead of the tr_ flag?

Yes to all of that. The n_ variables are zeroed out because of the (old) bit of code that adds them up all at once, which we can delete (item 3b above). I think n_aero and other bgc numbers are used elsewhere in the code; my preference is to initialize all of the n_ variables to 0 and then set them to whatever they need to be based on the logical flags. The logical flags should wrap ALL related tracer calculations -- the logicals are the primary, defining inputs.

The wrench in all of this is that we have the option of using an arbitrary (up to some max) number of bgc tracers. I decided not to hardwire the bgc tracers to a particular number, mainly because of the aerosols in CESM and I'm not sure if @njeffery varies the number that she uses, depending on the scenario. For those, we'll still have the logical for consistency with the other tracers, and it will override the numbers being nonzero. However if one of the bgc numbers is zero, then it makes sense to set the logical to false, to prevent loop weirdness.

I'd like to rename the flags and numbers to be more consistent -- need to make a list of them......

@apcraig
Copy link
Contributor

apcraig commented Nov 12, 2018

Thanks, that's helpful.

I'm actually writing up a list myself right now, just to better understand what's there. Having the count_tracers subroutine with everything together helps a lot. I'l share the list, should have it in a few minutes. May have more questions then.

@njeffery
Copy link
Contributor

njeffery commented Nov 13, 2018 via email

@apcraig
Copy link
Contributor

apcraig commented Nov 14, 2018

I just pushed a bunch of changes. These should address 1, 2, 3, and 4 on the list. Some of 8 is also done, abort checks on the bgc input have been added, and 9 variables were removed from CICE, 5 are namelist (max_ntrcr, max_nsw, n_bgc, nltrcr, n_trbgcz, n_trzs, n_trzaero, n_trbri, n_trbgcs). I am working on 5 and 7 now, I think they are related.

@apcraig
Copy link
Contributor

apcraig commented Nov 14, 2018

I have verified bit-for-bit on conrad for all 4 compilers. The only failure was the alt01 which requires an update to the namelist setting (not yet done) to be compatible with the updates to the input checks.

This was referenced Nov 14, 2018
@apcraig
Copy link
Contributor

apcraig commented Nov 15, 2018

I just pushed a few more changes, including updates to alt01. @eclare108213, have a look when you have a chance and lets figure out what still needs to be done.

@eclare108213
Copy link
Contributor Author

eclare108213 commented Nov 15, 2018

This looks good, @apcraig, thanks! I'll pull the changes and test them here at LANL. These are some notes just from looking through the code changes, which either of us might address:

Once we start changing Icepack, let's fix this to properly use the logical flag (unless nltrcr is a multiplier), from ice_step_mod.F90:
! tcraig, nltrcr used to be the number of zbgc tracers, but it's used as a zbgc flag in icepack
if (z_tracers) then
nltrcr = 1
else
nltrcr = 0
endif

Do we have a test with all pond parameterizations off and using the ccsm3 shortwave? Or do we need to keep this in alt01 and put the formdrag test in a different option set?

For abort_flag=16 in ice_init.F90, the error message should be

ERROR: formdrag=T and tr_pond=F

I'm a little concerned that we might get out-of-bounds errors when aerosols are off. For now, we are specifying a single array index for tracers that are not in use, but the aero tracers needs many indices because they are always treated in loops like

n=nt_aero, nt_aero+4*n_aero.

That's why we initialized nt_aero as

max_aero-4*n_aero

in the original "bad kludge". If we have a testing option that has aerosols turned off and debugging on, it should catch the problem, if it exists.

Remove the format statements from the bottom of input_data?

@apcraig
Copy link
Contributor

apcraig commented Nov 15, 2018

I will run some tests, especially debug with n_aero set to 0. I'll also try to address some of the comments with code changes. I will test on conrad.

@apcraig
Copy link
Contributor

apcraig commented Nov 15, 2018

running debug tests on conrad for all alt and box configs. getting some aborts. will try to fix.

format statements are needed in input_data from what I can see.

I'll look at the alt tests and try to move formdrag to a different alt test.

@eclare108213
Copy link
Contributor Author

I just read through ice_init.F90 and ice_init_column.F90, where the biggest changes are, and fixed a few things, nothing answer-changing. My base_suite built and the runs are still sitting in the queue.

From the original list above, it looks like 6 is not started, 5 and 7 are partially complete, and everything else is done along with some additional enhancements that weren't in the list. 5, 6, 7 could become a 'cleanup' task in a new PR. 7 is captured in issue #238.

@apcraig
Copy link
Contributor

apcraig commented Nov 16, 2018

I just pushed a new update. I added several new debug tests and updated the alt01 (now with cesm shortwave and no ponds) and alt04 (now with lvl ponds and formdrag) tests. Things are working pretty well. A couple of the new debug tests with the box configuration are failing, but all the alt tests with debug on are passing. The nt_aero and n_aero settings do not seem to be causing any problems in debug mode. Remember that when tr_aero is off, nt_aero is pointing at the unused tracer and n_aero=0. So it's likely the array sections in arguments are valid, and as long as they're not being used, should be OK. That seems to be true based on testing. See current results here, hash 770bd8f.

https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks

I am not particularly worried about the box failure. We should try to fix it, but I would not hold up the release for it either.

I also agree that 6 on the list was never done, 5 and 7 were started but need quite a bit more thought and refactoring.

@eclare108213 before or after the merge, could you review the model output for settings/namelist. I think we should make sure we are writing out the right stuff and that it's happening in the correct place during the init sequence. I also think if there are any other "abort checks" that could be added, that we should try to do that.

I think it would be good to merge this before the weekend if we can, and we feel it's ready. We can continue to additional changes on a new branch as needed. I think this refactor has been very good to do!

@eclare108213
Copy link
Contributor Author

Question for @njeffery:

We are checking the numbers of tracers that are allocated - are these checks correct, or e.g. should we check that n_fep+n_fed <= icepack_max_fe and similarly for doc, dic?

  if (n_algae > icepack_max_algae) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of algal types exceeds icepack_max_algae'
     endif
     abort_flag = 112
  endif
  if (n_doc > icepack_max_doc) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of doc types exceeds icepack_max_doc'
     endif
     abort_flag = 113
  endif
  if (n_dic > icepack_max_doc) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of dic types exceeds icepack_max_dic'
     endif
     abort_flag = 114
  endif
  if (n_don > icepack_max_don) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of don types exceeds icepack_max_don'
     endif
     abort_flag = 115
  endif
  if (n_fed  > icepack_max_fe ) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of dissolved fe types exceeds icepack_max_fe '
     endif
     abort_flag = 116
  endif
  if (n_fep  > icepack_max_fe ) then
     if (my_task == master_task) then
        write(nu_diag,*) subname//'ERROR: number of particulate fe types exceeds icepack_max_fe '
     endif
     abort_flag = 117
  endif

@eclare108213
Copy link
Contributor Author

@apcraig I suggest to go ahead and merge, if you are happy with this so far, to get a new baseline since alt01 changed (right? That’s the only fail in my baseline tests yesterday). Then we can continue cleaning up in a new PR.

@apcraig apcraig merged commit 8b0ae03 into CICE-Consortium:master Nov 16, 2018
phil-blain added a commit to phil-blain/CICE that referenced this pull request Nov 6, 2020
This was forgotten back in 8b0ae03 (Refactor tracer initialization (CICE-Consortium#235), 2018-11-16)
phil-blain added a commit to phil-blain/CICE that referenced this pull request Dec 17, 2020
This was forgotten back in 8b0ae03 (Refactor tracer initialization (CICE-Consortium#235), 2018-11-16)
phil-blain added a commit to phil-blain/CICE that referenced this pull request Apr 28, 2021
This was forgotten back in 8b0ae03 (Refactor tracer initialization (CICE-Consortium#235), 2018-11-16)
phil-blain added a commit to phil-blain/CICE that referenced this pull request Jun 2, 2021
This was forgotten back in 8b0ae03 (Refactor tracer initialization (CICE-Consortium#235), 2018-11-16)
phil-blain added a commit to phil-blain/CICE that referenced this pull request Jun 2, 2021
This was forgotten back in 8b0ae03 (Refactor tracer initialization (CICE-Consortium#235), 2018-11-16)
apcraig pushed a commit that referenced this pull request Jun 10, 2021
* drivers/hadgem3: add missing 'subname' and use existing 'subname's

* drivers/hadgem3/CICE_InitMod: update 'init_lvl' call

Add the required 'iblk' argument.

* drivers/hadgem3/CICE_RunMod: remove uneeded 'dt' arguments

The subroutines 'prep_radiation', 'zsal_diags', 'bgc_diags' and 'hbrine_diags'
do not take a 'dt' argument anymore, so remove it.

* drivers/hadgem3/CICE_RunMod: get 'Lsub' from Icepack

* drivers/hadgem3/CICE_RunMod: remove 'da_state_update' subroutine

This subroutine is inside an 'ICE_DA' CPP, which is not used in
any configuration. Remove it.

* drivers/hadgem3/CICE_RunMod: remove stray '+'

This '+' sign was copy-pasted there in error in 29b99b6 (CICE: Floe size
distribution  (#382), 2019-12-07). Remove it.

* drivers/hadgem3: remove obsolete 'check_finished_file' subroutine

Remove the call to 'check_finished_file' as well as the definition
of the subroutine, as the 'hadgem3' driver is not used on machine 'bering'
and it's unclear if machine 'bering' still exists.

* drivers/hadgem3: fix cice_init so it calls 'count_tracers'

This was forgotten back in 8b0ae03 (Refactor tracer initialization (#235), 2018-11-16)

* drivers/hadgem3/CICE_RunMod: add call to 'save_init'

The hadgem3 driver was not updated when 'save_init' was added in 83686a3
(Implement box model test from 2001 JCP paper (#151), 2018-10-22). As
this subroutine is necessary to ensure proper initialization of the
model, add it now.

* drivers/hadgem3/CICE_RunMod: tweak loop indices in 'coupling_prep'

Other drivers use 'ilo,ihi' and 'jlo,jhi' here. Do the same.

* drivers/hagdem3: update driver to new time manager

* drivers/hadgem3: pass 'mpi_comm_opa' explicitely to init_communicate

In 066070e (Fix minor issues in documentation, key_ CPPs, bfbcomp return
codes (#532), 2020-11-23), the 'ice_communicate' module was updated to
remove CPP macros relating to the OASIS coupler (key_oasis*) and to the
NEMO ocean model (key_iomput). These CPPs were used to make the correct
MPI communicator accessible to the 'init_communicate' subroutine.
However, that subroutine already accepts an optional MPI communicator as
argument and it was deemed cleaner to require the driver layer to
explicitely pass the communicator instead of making it accessible
through 'use' statements.

Update the 'hadgem3' driver, used for coupling with NEMO, to explicitely
pass the NEMO MPI communicator 'mpi_comm_opa' to 'init_communicate'.

* drivers: add 'nemo_concepts' driver

Historically the 'hadgem3' driver has been used when compiling a single
NEMO-CICE executable at ECCC.

Going forward, all driver-level adjustements will be done in our own
driver, 'nemo_concepts', 'CONCEPTS' being the name of the
multi-departmental collaboration around using the NEMO ocean model.

Copy CICE_InitMod, CICE_RunMod and CICE_FinalMod from the 'hadgem3'
directory to a new 'nemo_concepts' directory under 'drivers/direct'.

The following commits will clean up this new driver and port over some
in-house adjustments.

* drivers/nemo_concepts: remove unused 'writeout_finished_file' subroutine

This subroutine was only called on machine 'bering', which is not an
ECCC machine and probably does not exist anymore anyway. Remove it.

* drivers/nemo_concepts: call 'scale_fluxes' with 'aice_init'

Since 'merge_fluxes' is called with aice_init, it is more consistent to
also call 'scale_fluxes', in 'coupling_prep' with 'aice_init'
instead of 'aice'.

Copy this in-house change to the new 'nemo_concepts' driver.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants