Correct `AveragedTimeInterval` to use actuations #3721

liuchihl · 2024-08-21T20:12:16Z

This is a cleaned-up version of the closed PR #3717.

hdrake · 2024-08-21T20:24:41Z

@liuchihl, thanks for cleaning up these changes by separating them from the background flux PR—it's much clearer now.

Consolidating @glwagner and @navidcy's earlier comments, it seems there are three things that need to be done before this can be merged:

1. Review the existing OutputWriter tests and verify that they still pass with the new implementation in this PR.
2. Create a new, more rigorous test that is capable of flagging the bizarre behavior you found in your issue but (hopefully) now passes thanks to the changes in this branch.
3. Add some warnings to let users know that TimeInterval and AveragedTimeInterval (and probably other diagnostic schedules) are currently broken and give incorrect results after picking up from a checkpoint whenever the checkpoint interval is not an integer multiple of the scheduled time interval.
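
To make the second and third points concrete, here is a minimal sketch of the kind of comparison such a test could make: run once without interruption, then run again with a stop and pickup, and check that the schedule fires at the same times. Everything in it (`ToySchedule`, `next_actuation_time`, `run!`) is a hypothetical stand-in written for illustration. It does not use or reproduce the Oceananigans types, and the "state lost on pickup" case is only one assumed way a mismatch could arise.

```julia
# Toy stand-in for a time-interval schedule that counts actuations.
# Hypothetical: this is NOT the Oceananigans TimeInterval type.
mutable struct ToySchedule
    interval::Float64
    first_actuation_time::Float64
    actuations::Int
end

ToySchedule(interval) = ToySchedule(interval, 0.0, 0)

# The next firing time is derived from the actuation count, not from "now".
next_actuation_time(s::ToySchedule) =
    s.first_actuation_time + (s.actuations + 1) * s.interval

# Step a clock from t_start to t_stop and record every time the schedule fires.
function run!(s::ToySchedule, t_start, t_stop, Δt)
    fired = Float64[]
    nsteps = round(Int, (t_stop - t_start) / Δt)
    for n in 1:nsteps
        t = t_start + n * Δt
        if t >= next_actuation_time(s) - 1e-9  # small tolerance for round-off
            push!(fired, t)
            s.actuations += 1
        end
    end
    return fired
end

# Reference: one uninterrupted run with an output interval of 3.
reference = run!(ToySchedule(3.0), 0.0, 12.0, 1.0)

# Interrupted run: stop at t = 4, which is not a multiple of 3, then resume.
s = ToySchedule(3.0)
part1 = run!(s, 0.0, 4.0, 1.0)

# Case A: the schedule state (actuation count) survives the restart.
resumed_with_state = vcat(part1, run!(s, 4.0, 12.0, 1.0))

# Case B (assumed failure mode): the state is lost and a fresh schedule
# is anchored at the pickup time instead.
fresh = ToySchedule(3.0, 4.0, 0)
resumed_without_state = vcat(part1, run!(fresh, 4.0, 12.0, 1.0))

println(reference)              # uninterrupted firing times
println(resumed_with_state)     # matches the reference
println(resumed_without_state)  # drifts away from the reference
```

In this toy, stopping at t = 6 instead of t = 4 makes Case B agree with the reference again, which is the "integer multiple" condition described in the third point.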

glwagner · 2024-08-22T15:33:29Z

> Add some warnings to let users know that TimeInterval and AveragedTimeInterval (and probably other diagnostic schedules) are currently broken and give incorrect results after picking up from a checkpoint whenever the checkpoint interval is not an integer multiple of the scheduled time interval.

Can we make it so this warning gets thrown only if one is using a Checkpointer? I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases. Maybe we should put the warning within the checkpointer constructor.

Checkpointing certainly needs love. I think it's only used for barebones stuff right now, not complicated simulations. To be fully featured we have to somehow have a system for checkpointing all callbacks. It's not just AveragedTimeInterval that would have a problem.

hdrake · 2024-08-22T15:59:33Z

> I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases.

I don't get this. How are people not using a Checkpointer? Is no one else limited by HPC wall times or running long simulations? It seems like one of the most fundamental capabilities of any time-stepped numerical model.

But yes, this warning should only be issued if both OutputWriter and Checkpointers are being used and if the checkpointer interval is not an integer multiple of the OutputWriter interval.
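
A minimal sketch of that condition, assuming the checkpoint interval and the output intervals are available as plain numbers. The helper names `is_integer_multiple` and `misaligned_intervals` are hypothetical and are not part of Oceananigans; where such a check would be wired in (for example the Checkpointer constructor) is left open here.

```julia
# Hypothetical helper: is `checkpoint_interval` an integer multiple of
# `output_interval`? A relative tolerance absorbs floating-point round-off,
# e.g. 0.3 / 0.1 evaluates to 2.9999999999999996 rather than 3.
function is_integer_multiple(checkpoint_interval, output_interval; rtol=1e-8)
    ratio = checkpoint_interval / output_interval
    nearest = round(ratio)
    return nearest >= 1 && isapprox(ratio, nearest; rtol=rtol)
end

# Hypothetical helper: warn once per misaligned output interval and return
# the list of offending intervals so callers (or tests) can inspect it.
function misaligned_intervals(checkpoint_interval, output_intervals)
    bad = [Δ for Δ in output_intervals if !is_integer_multiple(checkpoint_interval, Δ)]
    for Δ in bad
        msg = string("Checkpoint interval ", checkpoint_interval,
                     " is not an integer multiple of output interval ", Δ,
                     "; time-interval output may be incorrect after pickup.")
        @warn msg
    end
    return bad
end

# Example: a checkpoint every 4 time units with outputs every 1, 2, and 3.
misaligned_intervals(4.0, [1.0, 2.0, 3.0])  # warns about the interval 3.0 only
```

Returning the offending intervals rather than only logging keeps the check easy to test, and an empty result means no warning needs to be shown.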

glwagner · 2024-08-22T16:03:56Z

> > I think the majority of simulations do not use a Checkpointer so the warning would be irrelevant in most cases.
>
> I don't get this. How are people not using a Checkpointer? Is no one else limited by HPC wall times or running long simulations? It seems like one of the most fundamental capabilities of any time-stepped numerical model.
>
> But yes, this warning should only be issued if a Checkpointer is used (and maybe also when a simulation picks up from an existing Checkpoint).

I think for sophisticated research Checkpointing is common, but for simpler classroom and LES applications the checkpointer is used less. After all, probably most simulations are actually run in our examples on CI, and there are no examples with a checkpointer! (It would be nice to change that.)

I can't speak for others, but for boundary layer parameterization work the LES typically run in less than 24 hours of wall time. We also only utilize very simple diagnostics, like the horizontally-averaged solution at the final time step. So in those rare cases that we need a checkpointer (I have used one a handful of times), barebones checkpointing is sufficient.

Of course we are currently working on building an OMIP simulation and that will require much longer runs, so we will definitely need more sophisticated checkpointing very soon.

@simone-silvestri and @tomchor might have more to add. Or @sandreza, what do you use for the neverworld work?

I'm not saying we don't want to develop this, I'm just providing some context about why this hasn't been resolved / developed yet.

In an ideal world the simulations would run fast enough that we wouldn't need checkpointing, after all 😄

tomchor · 2024-08-22T16:21:39Z

> I think for sophisticated research Checkpointing is common, but for simpler classroom and LES applications the checkpointer is used less. After all, probably most simulations are actually run in our examples on CI, and there are no examples with a checkpointer! (It would be nice to change that.)

Agreed!

> I can't speak for others, but for boundary layer parameterization work the LES typically run in less than 24 hours of wall time. We also only utilize very simple diagnostics, like the horizontally-averaged solution at the final time step. So in those rare cases that we need a checkpointer (I have used one a handful of times), barebones checkpointing is sufficient.
>
> Of course we are currently working on building an OMIP simulation and that will require much longer runs, so we will definitely need more sophisticated checkpointing very soon.
>
> @simone-silvestri and @tomchor might have more to add. Or @sandreza, what do you use for the neverworld work?

For context, 100% of my simulations have used checkpoints. As far as I know, 100% of the simulations from others in my group also use checkpoints. The only exceptions in my case are very early scripts still in the development phase, with very coarse grids. As soon as I try to get more serious with it, I need checkpoints. So this is a PR I'm very much looking forward to seeing merged ;)

correct windowed_time_average.jl

a52812b
