Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent arrays detected in write_incr.f90 when running in debug mode #256

Closed
RussTreadon-NOAA opened this issue Nov 17, 2021 · 9 comments
Assignees

Comments

@RussTreadon-NOAA
Copy link
Contributor

global_gsix built on WCOSS and WCOSS2 with various intel compiler debug flags turned on. write_incr.f90 was flagged with the following message:

forrtl: warning (406): fort: (33): Shape mismatch: The extent of dimension 2 of array OUT3D is 64 and the corresponding extent of array USM is 63

The line of code in question is

out3d(:,:,krev) = transpose(usm(j1:j2,:,k))

in write_incr.f90.

This issue is opened to document this warning and apply a code fix to remove the warning.

@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Nov 17, 2021
@RussTreadon-NOAA
Copy link
Contributor Author

Array indices (j1,j2) are not correct when working with tasks which include the south or north pole. As noted by Cory, the south and north pole are extra latitude rows in the gsi.

Array out3d is dimensioned grd%lat away from the poles. This is correct. We want all latitudes in *sm arrays to be copied into out3d. For tasks which include the southernmost row (ie, south pole), out3d is dimensioned grd%lat-1. This is correct. We do not want the south pole row in *sm arrays copied into out3d. The same is true for tasks which include the northernmost row (ie, north pole). out3d is dimensioned grd%lat-1.

When a task includes the south pole, the (j1,j2) index range should be (2,grd%lat). j=1 is the south pole row. When a task includes the north pole, the (j1,j2) index range should be (1,grd%lat-1). j=grd%lat is the north pole row.

write_incr.f90 is currently coded as

    j1 = 1
    j2 = grd%lat1
    ncstart = (/ jstart(mype+1), istart(mype+1)-1, 1 /)
    nccount = (/ grd%lon1, grd%lat1, grd%nsig /)
    if (istart(mype+1) == 1) then
      ncstart = (/ jstart(mype+1), 1, 1 /)
      nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
      j1 = 2
      j2 = grd%lat1-1
    else if (istart(mype+1)+grd%lat1 == grd%nlat+1) then
      nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
      j2 = grd%lat1-2
    end if
    call mpi_barrier(mpi_comm_world,ierror)
    allocate(out3d(nccount(1),nccount(2),grd%nsig))

We transpose *sm(j1,j2,:,k) into out3d. This represents (j2-j1+1) latitudes.

  • For tasks away from either pole, j2-j1+1 = grd%lat1 - 1 + 1 = grd%lat. This is correct. There are nccount(2) = grd%lat latitudes in out3d.
  • For tasks which include the south pole, j2-j1+1 = grd%lat-1 - 2 + 1 = grd%lat -2. However, nccount(2) = grd%lat-1. (j1,j2) is not consistent with the allocated size of out3d.
  • For tasks which include the north pole, j2-j1+1 = grd%lat-2 - 1 + 1 = grd%lat-2. However, nccount(2) = grd%lat-1. (j1,j2) is not consistent with the allocated size of out3d.

It seems the above code should read

    j1 = 1
    j2 = grd%lat1
    ncstart = (/ jstart(mype+1), istart(mype+1)-1, 1 /)
    nccount = (/ grd%lon1, grd%lat1, grd%nsig /)
    if (istart(mype+1) == 1) then
      ncstart = (/ jstart(mype+1), 1, 1 /)
      nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
      j1 = 2
    else if (istart(mype+1)+grd%lat1 == grd%nlat+1) then
      nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
      j2 = grd%lat1-1
    end if
    call mpi_barrier(mpi_comm_world,ierror)
    allocate(out3d(nccount(1),nccount(2),grd%nsig))

When a task includes the south pole row we start from j1=2. When a task includes the north pole row we end at grd%lat-1. With these (j1, j2), j2-j1+1 is the correct value at both poles and between the poles.

This change was made to write_incr.f90. The run time debug warning goes away since (j1:j2) maps into out3d. Puzzling, though, is the finding that the increment files, sigin*nc, created by the original and modified write_incr.f90 are identical. This change does not alter the increment files ... at least for the cases which have been run.. How can this be? How does the transpose intrinsic work?

@RussTreadon-NOAA
Copy link
Contributor Author

Tests using the modified write_incr.f90
Add integer j3 and array out3d_test to write_incr.f90. Load j3 with the j2 values in the original code. Let j2 be what is shown in the modified code above. Load out3d_test using transpose of usm over the range j1;j3. Write out3d and out3d_test on each task for all i,j,k. Also check the difference. The difference is 0.0 on all tasks at all points for usm.

Cory suggested running the same case with two task counts to change the mapping of tasks to geographic subdomains. This was done using 400 and 800 tasks. Comparison of the resulting netcdf increment files show no differences. The two sets of increments are identical.

These two sets of tests were run at operational resolution using a stand-alone GSI script. Tests were run for 00, 06, 12, and 18Z from 20211117. The global_gsi.x was built from the NOAA-EMC:master at f442020 .

These tests indicate that the suggested modifications to write_incr.f90 do not alter analysis increments. One may logically ask why modify the code. Examination of the current write_incr.f90 indicates that the j1 value at the south pole and j2 value at the north pole are incorrect. The proposed changes fix this.

@aerorahul
Copy link
Contributor

aerorahul commented Jul 29, 2022

@RussTreadon-NOAA
The discussion on the issue suggests that changes were proposed.
Were they merged? Can this issue be closed? I don't see any PR tagging this issue.

@RussTreadon-NOAA
Copy link
Contributor Author

@aerorahul , I looked at src/write_incr.f90 in develop The section of code in question is unaltered from what it was in the past

GSI/src/gsi/write_incr.f90

Lines 361 to 375 in e23204e

j1 = 1
j2 = grd%lat1
ncstart = (/ jstart(mype+1), istart(mype+1)-1, 1 /)
nccount = (/ grd%lon1, grd%lat1, grd%nsig /)
if (istart(mype+1) == 1) then
ncstart = (/ jstart(mype+1), 1, 1 /)
nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
j1 = 2
j2 = grd%lat1-1
else if (istart(mype+1)+grd%lat1 == grd%nlat+1) then
nccount = (/ grd%lon1, grd%lat1-1, grd%nsig /)
j2 = grd%lat1-2
end if
call mpi_barrier(mpi_comm_world,ierror)
allocate(out3d(nccount(1),nccount(2),grd%nsig))

@MichaelLueken-NOAA , are you aware of the changes mentioned in this issue ever entering develop?

@MichaelLueken
Copy link
Contributor

@RussTreadon-NOAA, looking at feature/gfsda.v16.1.5_wcoss2_port branch, I don't see any indication that these changes were applied. Also, I don't see any indication that these changes were applied to the develop branch via a PR. So, I don't think the changes described in this issue were ever applied to the authoritative develop branch.

@RussTreadon-NOAA
Copy link
Contributor Author

RussTreadon-NOAA commented Jul 29, 2022 via email

@aerorahul
Copy link
Contributor

Ok. Safe to assume that neither the bugfix nor the enhancement is included in GFSv16.3.0 candidate.

@RussTreadon-NOAA
Copy link
Contributor Author

Tests of release/gfsda.v16.3.0 built with -check all on WCOSS2 have replicated the problem reported in this issue. This is not surprising since we didn't resolve the issue last fall.

When gsi.x is built and run with -check all, a shape mismatch is detected in write_incr.f90. The same line as noted in this issue is identified as the line on which the mismatch occurs. I'll continue the investigation tomorrow.

@RussTreadon-NOAA
Copy link
Contributor Author

Corrected by PR #526

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants