Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/test prognostic run steppers #1033

Merged
merged 23 commits into from
Feb 25, 2021
Merged

Refactor/test prognostic run steppers #1033

merged 23 commits into from
Feb 25, 2021

Conversation

nbren12
Copy link
Contributor

@nbren12 nbren12 commented Feb 23, 2021

The steppers repeated code and were difficult to test. This PR refactors the nudging and machine learning steppers to enable more reuse, pushing the common/messy bits up to the TimeLoop class. This allows a better separation of concerns between the "Steppers" (see the docstrings of Stepper and TimeLoop for more information).

This more "pure" interface allowed regression testing more of the ML code. The nudging stepper is harder to test since it requires MPI and more data.

I tested each refactor using checksums, and fixed the non-determinism in the scikit-learn based tests.

Also added:

  • debugging conveniences
  • made DerivedFV3State an actual MutableMapping.

@@ -1 +1 @@
f66f3591c848109b47439bdf6a616eeb
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice that the keras checksum is not updated. this suggests the refactors didn't break modify the state updates by the ML code.

@@ -0,0 +1 @@
566da3dda9be0fd64285cec8f3b93bed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice that this checksum was introduce in 3a78a4b before any of the large refactors.

Copy link
Contributor

@oliverwm1 oliverwm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Definitely some more work to do in terms of unifying diagnostic calcs and perhaps generalizing how to know which tendencies should get applied to physics/dycore states. But this is a big step in the right direction I think.

Also we'll have to think about the right way to expand the number of "substeps" if we want to be able to apply updates between radiation/other physics computations. I could see the substeps getting a bit out of control.

)
tracer_names = set(v for v in fv3gfs.wrapper.get_tracer_metadata())
# see __getitem__
local_names = {"latent_heat_flux", "total_water"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay for now, but this does seem pretty bespoke. Should we have some kind of "diagnostics" table that these are a part of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrapper can list all the available diagnostics withfv3gfs.wrapper._get_diagnostic_info. We should probably make this public api.

for key in self.keys():
try:
data = self[key]
except: # noqa: E722
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not except KeyError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm deleting this method since it is not hard to copy paste where appropriate.

workflows/prognostic_c48_run/runtime/loop.py Outdated Show resolved Hide resolved
Each time step of the model evolutions proceeds like this::

step_dynamics,
compute_physics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably we will be splitting this up further if we want to apply radiation updates to the land surface model

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. This docstring is repetitive but it did help me think about the current implementation.

Comment on lines +371 to +374
try:
del diagnostics[TOTAL_PRECIP]
except KeyError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
del diagnostics[TOTAL_PRECIP]
except KeyError:
pass
diagnostics.pop(TOTAL_PRECIP, None)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try is a little longer but more explicit in my opinion.

@nbren12
Copy link
Contributor Author

nbren12 commented Feb 25, 2021

Definitely some more work to do in terms of unifying diagnostic calcs

For sure. I think "net_heating" and "total_water_path" are each computed in at least a couple places. We'll have to think more carefully about what "diagnostics" and ML scheme should save.

and perhaps generalizing how to know which tendencies should get applied to physics/dycore states.
Also we'll have to think about the right way to expand the number of "substeps" if we want to be able to apply updates between radiation/other physics computations. I could see the substeps getting a bit out of control.

I had some ideas about this. Generally speaking, these state updates are a DAG, where the edges are input/output variables and the nodes are individual schemes. If each scheme described it's inputs/outputs and gave a rough "priority" then the time looper could determine the precise order of computations.

Copy link
Contributor

@frodre frodre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @nbren12. Everything looks good and overall, a nice simplification. I also learned a bit about testing/regression testing, which is always a bonus.

Comment on lines +253 to +256
# TODO fix this bug in a follow-up non-refactor PR
# this logic replicates a bug in the previous nudged run
# the SSTs were never actually applied to the fv3gfs-wrapper state
# This bug could be scientifically significant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serious enough that we'd want to throw an error here until it's fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is executed in every nudged run. It's a trivial fix now with this refactor. This scheme just needs to return the dict above rather than an empty dict, but I didn't want to make any bit-incompatible changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #1037

@nbren12 nbren12 merged commit 58da2ce into master Feb 25, 2021
@nbren12 nbren12 deleted the krasnopolsky-stepper branch February 25, 2021 23:53
nbren12 added a commit that referenced this pull request Feb 26, 2021
The steppers repeated code and were difficult to test. This PR refactors the nudging and machine learning steppers to enable more reuse, pushing the common/messy bits up to the TimeLoop class. This allows a better separation of concerns between the "Steppers" (see the docstrings of `Stepper` and `TimeLoop` for more information).

This more "pure" interface allowed regression testing more of the ML code. The nudging stepper is harder to test since it requires MPI and more data.

I tested each refactor using checksums, and fixed the non-determinism in the scikit-learn based tests.

Also added:
- debugging conveniences
- made `DerivedFV3State` an actual MutableMapping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants