Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated oasis get calls #858

Merged

Conversation

ukmo-juan-castillo
Copy link
Collaborator

Pull Request Summary

Bug fix for a problem when happens in certain circumstances (using driving data coming from oasis or from file in certain combinations), when oasis_get is called twice for the same field and time, causing OASIS to crash

Description

It was discovered that, when reading water levels from file and driving the wave model from winds received via coupling, and coupling at T+0, the oasis_get call for winds at time start zero happens twice, which causes OASIS to crash because as expected the model that sends the winds only sends the fields once at each coupling step.

This is caused by a series of things: water levels are read before winds, and they are not time interpolated. This causes that the time step is not updated the first time the input fields are updated, so the input fields are updated twice the first time step.

The fix here proposed makes sure that oasis_get is called only once if the input fields are updated twice for the first time step.

To reproduce this error, I modified one ww3_tp2.14 test. The reviewer is encouraged to run the develop branch with the modified regtest to demonstrate that a bug is actually present, and that the changes in this branch fixes it. Given that this particular regtest is changed, it is expected to cause a change in answers in just that test.

Labels that should be added: bug

Issue(s) addressed

Commit Message

Fix duplicated calls to oasis_get happening in some mixed forced/coupled configurations

Check list

Testing

  • How were these changes tested?
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@ukmo-ccbunney
Copy link
Collaborator

ukmo-ccbunney commented Dec 13, 2022

Hi @ukmo-juan-castillo - did you run the ww3_tp2.14 regtest with your fix prior to modifying the test? I just wanted to check that it was producing the same answers prior to your update.

I am going to run the full regtest suite today.

@JessicaMeixner-NOAA
Copy link
Collaborator

@ukmo-juan-castillo can you please merge in the noaa-emc/develop branch to your branch for this PR so that we can get the CI fixes here as well? Thanks!

@ukmo-ccbunney
Copy link
Collaborator

@ukmo-juan-castillo can you please merge in the noaa-emc/develop branch to your branch for this PR so that we can get the CI fixes here as well? Thanks!

I've done this now @JessicaMeixner-NOAA .

@ukmo-ccbunney
Copy link
Collaborator

@ukmo-juan-castillo - I am running the regression tests for this change and the model is hanging on the new ww3_tp2.14/OASACM6 regtest.

If I revert the change to that regtest (setting AIR_DENSITY="F") it still hangs.

@ukmo-ccbunney
Copy link
Collaborator

@ukmo-juan-castillo I've just pushed a small correction to the ww3_from_ftp.sh script - there was an incomplete destination path for the new density.nc file copied for input_oasacm6.

regtests/bin/matrix.base Outdated Show resolved Hide resolved
@ukmo-juan-castillo ukmo-juan-castillo marked this pull request as ready for review January 19, 2023 15:33
Copy link
Collaborator

@ukmo-ccbunney ukmo-ccbunney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukmo-juan-castillo
I've rerun the ww3_tp2.14 regression tests.
There are some differences in the nout.00000 files in all tests (see specific remarks below).

The OASACM6 test obviously has more difference due to the addition of the density field.
However, there are a couple of issues.

  • Firstly, the density.nc field used as input to ww3_prnc is for a different region (Adriatic Sea) so the input density.ww3 is all zero.
  • Secondly, in the output file ww3.201109.nc, the wind speeds are all zero (maybe a result of the density being zero?), so the resulting wave field is also zero.

I think we will need to add a new density.nc file for this regtest. It could be a small and very coarse resolution file with homogeneous values just to provide an input.

regtests/ww3_tp2.14/input_oasicm/namcouple Show resolved Hide resolved
@ukmo-ccbunney
Copy link
Collaborator

I have run the full regression test matrix and get the following expected differences to mww3_test_02

ww3_tp2.14/./work_OASACM                     (2 files differ)
ww3_tp2.14/./work_OASACM6                     (10 files differ)
ww3_tp2.14/./work_OASACM2                     (2 files differ)
ww3_tp2.14/./work_OASACM5                     (2 files differ)
ww3_tp2.14/./work_OASACM4                     (2 files differ)
ww3_tp2.14/./work_OASOCM                     (2 files differ)
ww3_tp2.14/./work_OASICM                     (2 files differ)

but there are also a few unexpected differences that remain even after rerunning:

  • mww3_test_02/./work_PR3_UQ_MPI_c_c - 2 log files differ (they are oddly empty in results from this branch)
  • mww3_test_07/./work_PR3_UQ - mod_def file is different

I am a bit suspicious that this is some oddity to our system, so I wait to see what the results from NOAA's regteests are . Certainly it makes no sense that the mod_def file should be different in mww3_test_07 as Juan has now changed anything that could have affected ww3_grid.

…acm6

Updated version number for ww3_from_ftp tar file
@MatthewMasarik-NOAA
Copy link
Collaborator

I am a bit suspicious that this is some oddity to our system, so I wait to see what the results from NOAA's regteests are . Certainly it makes no sense that the mod_def file should be different in mww3_test_07 as Juan has now changed anything that could have affected ww3_grid.

@ukmo-ccbunney I've submitted the regtests for this. We will report back on the differences you reported.

@JessicaMeixner-NOAA
Copy link
Collaborator

@ukmo-ccbunney can you update the version number to 7.14.1 instead of 7.12.7? The density.nc file should be there now in that file. Also, would you mind merging with the latest noaa-emc/develop? I'm starting regtests now (Matt is on leave).

@ukmo-ccbunney
Copy link
Collaborator

@ukmo-ccbunney can you update the version number to 7.14.1 instead of 7.12.7? The density.nc file should be there now in that file. Also, would you mind merging with the latest noaa-emc/develop? I'm starting regtests now (Matt is on leave).

Done! (online via github.dev, which was very exciting!)

@JessicaMeixner-NOAA
Copy link
Collaborator

@ukmo-ccbunney I ran the regtests yesterday and I didn't get either of the issues you mentioned. I do get mod_def diffs for unstructured grids on a regular basis (although not for tp2.7 oddly enough), so I think that's just the known issue that we really need to get fixed. I did just go and check the mww3_test_02/./work_PR3_UQ_MPI_c_c/ww3_multi.out file and it looks like we have some silent failures in that test that also exist in the develop branch. I'm getting: " integration stalled: num_subseg exceeded limit"
So perhaps something to look into separately but not something that should hold this PR up.

I'm going to re-run the regtests today just to do a sanity check on the tar-file, but otherwise think this will be ready to merge in this afternoon unless we need to wait for @mickaelaccensi to review and approve as well.

Copy link
Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (12 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (9 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (14 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.14/./work_OASACM6                     (10 files differ)
ww3_tp2.14/./work_OASACM5                     (2 files differ)
ww3_tp2.14/./work_OASACM4                     (2 files differ)
ww3_tp2.14/./work_OASACM                     (2 files differ)
ww3_tp2.14/./work_OASICM                     (2 files differ)
ww3_tp2.14/./work_OASACM2                     (2 files differ)
ww3_tp2.14/./work_OASOCM                     (2 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.17/./work_ma                     (1 files differ)
ww3_tp2.17/./work_ma1                     (1 files differ)
ww3_ufs1.3/./work_a                     (3 files differ)

matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt

@JessicaMeixner-NOAA
Copy link
Collaborator

@ukmo-ccbunney apologies for the slowness on the regtests on our end. The second time I ran this branch, I had a few " 0 diffs" but looking into a case or two, it doesn't seem to be anything concerning. @MatthewMasarik-NOAA or I will follow up with the mww3_test_02/./work_PR3_UQ_MPI_c_c test and make an issue to document this regression tests with the silent failure on our end and actual error on yours.

@mickaelaccensi --- if you'd like to continue to review this PR, please do and if you find any issues or have any concerns we'll address them when you raise them.

@JessicaMeixner-NOAA JessicaMeixner-NOAA merged commit 5ebed91 into NOAA-EMC:develop Jan 31, 2023
@ukmo-ccbunney
Copy link
Collaborator

@ukmo-ccbunney apologies for the slowness on the regtests on our end. The second time I ran this branch, I had a few " 0 diffs" but looking into a case or two, it doesn't seem to be anything concerning. @MatthewMasarik-NOAA or I will follow up with the mww3_test_02/./work_PR3_UQ_MPI_c_c test and make an issue to document this regression tests with the silent failure on our end and actual error on yours.

@mickaelaccensi --- if you'd like to continue to review this PR, please do and if you find any issues or have any concerns we'll address them when you raise them.

Great - Thanks Jessica.
Regarding the "0 diffs", I've had this before a few times
I think it is down to the existence or non-existence of ww3_prep.out files from the multigrid tests (if I remember rightly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Water levels time update issue
4 participants