Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance MTD to process time series with non-uniform time steps, such as monthly data #1971

Closed
14 of 20 tasks
JohnHalleyGotway opened this issue Nov 16, 2021 · 17 comments · Fixed by #2221
Closed
14 of 20 tasks
Assignees
Labels
MET: Object Verification Object-based feature Verification priority: medium Medium Priority requestor: Community General Community type: enhancement Improve something that it is currently doing
Milestone

Comments

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Nov 16, 2021

Describe the Enhancement

This issue arose via METplus Discussions: dtcenter/METplus#1249

The user is passing monthly data as input to MTD. However, since the number of days in a month do not remain constant, the interval between timesteps does not remain constant. This certainly is a reasonable use of MTD and should be supported.

Recommend providing the simplest fix, which is switching that ERROR message to a WARNING instead. However when making this change, be sure to check the code to see if/where the timestep is used. Make sure that the timestamps written to the output actually do match the timesteps read from the input.

Time Estimate

4 hours.

Sub-Issues

Consider breaking the enhancement down into sub-issues.
No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2700042

Define the Metadata

Assignee

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added type: enhancement Improve something that it is currently doing priority: medium Medium Priority requestor: Community General Community alert: NEED ACCOUNT KEY Need to assign an account key to this issue MET: Object Verification Object-based feature Verification labels Nov 16, 2021
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.1.0 milestone Nov 16, 2021
@TaraJensen TaraJensen modified the milestones: MET 10.1.0, MET 11.0.0 Dec 17, 2021
@davidalbo
Copy link
Contributor

@CPKalb I'm working on this today. A few questions:

Can you think of other circumstances where it would legitimately be an error, not this particular 'days per month' problem? If so maybe the code could check for values that are days per month (30,31,28,29), and go with 'warning' only if that is the case.

Would it be possible to set up a test to run this in debugger? (example data and config file)? Otherwise I'll have to examine the code carefully to see if the timestep stuff is used correctly with the change, which is a little riskier compared to using a debugger. I'd need hand holding on getting some data and setting a config.

Thanks
Dave

@davidalbo
Copy link
Contributor

davidalbo commented Jun 30, 2022

Note to self: A few lines like this could be problematic, not sure. delta_t is set constant in the object but in the case in question it is not constant.

  att_2.set_valid_time(fcst_obj.start_valid_time() + t*(fcst_obj.delta_t()));

att_2.set_valid_time(fcst_obj.start_valid_time() + t*(fcst_obj.delta_t()));

@davidalbo davidalbo moved this from To Do to In Progress in MET-11.0.0-beta2 (8/9/22) Jun 30, 2022
@CPKalb
Copy link
Contributor

CPKalb commented Jul 5, 2022

Hi Dave:
Apologies, I was out Thursday and Friday. I think there are situations where this would be an error. However, so long as a warning is triggered, then it would be up to the user to determine whether they need to fix it. I think a warning would be sufficient for this reason.

And good point about the delta t. I'm not sure how to best handle this.

@davidalbo
Copy link
Contributor

Hope you had a good time out. Handling that delta_t code when delta_t is not constant would be a fairly big deal, coding wise. I'm not sure how far to take this issue. Simply changing the code to a warning instead of error, and just keeping going instead of exiting would be very easy to implement. BUt... the output would (pretty sure) have constant delta_t, which would be wrong and not respond to John HG's initial comment.

Any example data/config out there to play with?

@CPKalb
Copy link
Contributor

CPKalb commented Jul 5, 2022

The data came from a user. My guess is that we don't still have it, but Iet me check the ftp and see if it survived.

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Jul 5, 2022

There are 2 pretty reasonably scenarios I can think of in which the non-const time step issue comes up.

  1. When processing monthly climate data where the number of days in a month is not constant (i.e. 28 vs 30 vs 31).
  2. When processing satellite or radar data with high temporal frequency. For example, sometimes the timestamp is 1 minute before or a couple minutes after the top of the hour... but we process it as being representative for that hour.

The goal here is to make MTD work in both of these cases rather than having it error out. We don't want to change the MTD algorithm in any substantive way.

In the original issue, I stated that we should attempt to write the ACTUAL timestamps of the input data to the MTD output files. For example, if processing monthly climo data, reporting the output times as the 1st (or maybe 15th) of each month seems desirable. The alternative of reporting Jan 1st + 30 days = Jan 31st seems clumsy and confusing.

So we'll need to store the actual input valid time for each timestep if the code is not already doing so.

As for how to pick which timestep to use in the algorithm, ideally, we'd construct a list of the input timesteps and select the mathematical mode of them. The NumArray class in MET has a "mode()" member function you could call. That will likely work for the number of days in a month issue but I'm less sure how to handle the satellite or radar data where the number of minutes would vary. Perhaps just stick with the NumArray::mode() option for now until a preferable solution presents itself.

The point here is that when the timesteps are "close enough" we still want MTD to run and produce output.

We could consider some maximum allowable variance in the timestep and still error out if the diffs are too large. But I'll leave that to @CPKalb to advise on. For example, you may want to process 30 days of data but have two days of missing data in the list. That'd result in 2 large jumps in time steps. Tina, in that case, is it better for MTD to error out or still run using the most common timestep of 1 day?

@davidalbo
Copy link
Contributor

Good ideas from @JohnHalleyGotway. Do we happen to have any data so I could run this in the debugger? The reason I keep asking for examples that is I'm still learning the MET code and how it is organized. I find I have to look around to find the source code every time app code goes into library code, which is tedious and slow. Data that does have regular time steps would be useful for understanding this app, especially how it creates its output data and writes it.

If I'm understanding, several problems need solving, one is running the algorithm with a fixed time step when the time step is variable, the other is to write the output so it preserves the actual time steps. A third problem is identifying actual errors that are too severe to run the algorithm.

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Jul 6, 2022

Sure @davidalbo, makes sense. To get started you could try running some of the monthly climo data from NCEP through MTD. Please take a look on seneca in:

cd /d1/projects/MET/MET_test_data/unit_test/climatology_data/NCEP_2.5deg
ls pgba_mean.1959*
pgba_mean.19590115  pgba_mean.19590315  pgba_mean.19590515  pgba_mean.19590715  pgba_mean.19590915  pgba_mean.19591115
pgba_mean.19590215  pgba_mean.19590415  pgba_mean.19590615  pgba_mean.19590815  pgba_mean.19591015  pgba_mean.19591215

These 12 GRIB files contain monthly climo means for a handful of variables. And for each variable there are 4 records, for 00, 06, 12, and 18Z. You could try running MTD on 500mb height. Please take a look at run_mtd.sh in:

cd /d1/personal/johnhg/MET/MET_development/MET-feature_1971
tail run_mtd.log

To see this error message:

ERROR  : 
ERROR  :   mtd_read_data() -> file time increments are not constant!
ERROR  : 

@davidalbo
Copy link
Contributor

Got it and copied to my area for testing. Is 1959 a year? (must be as I'm getting negative unix time values). That's almost as old as me. :)

Turning the error into a warning and just keeping it going yielded 3 output files here:
/d1/personal/dave/test-issue1971/out
mtd_19590115_000000V_2d.txt
mtd_19590115_000000V_3d_single_simple.txt
mtd_19590115_000000V_obj.nc

The file names go with the first oldest time.
The nc file does not seem to have any time information except as a dimension.
The 2d text has some columns that do assume constant delta time, I'd say, probably need changing.
The 3d text file not sure.

I should be able to latch onto the actual time steps and do the mode() like you suggest, if I can figure out where the result would be applied/used.

@JohnHalleyGotway
Copy link
Collaborator Author

@davidalbo I took a look in these output files and would advise...

  1. In mtd_19590115_000000V_2d.txt, update the contents of the FCST_VALID and OBS_VALID columns to match the actual valid times of the input data. The FCST_LEAD and OBS_LEAD columns are all 0 in this example. But that timing info should also be stored separately for each timestep so that it's reported accurately. Continuing to report the T_DELTA as 744 hours (= 31 days) is probably fine. I assume that's just the first timestep encountered (time 1 - time 0). It'd probably be best to report the "most common" (mode) timestemp.
  2. In mtd_19590115_000000V_3d.txt I believe the FCST_VALID and OBS_VALID columns report the beginning valid time of the 3D space/time object. In your output, that's the first timestamp. Just make sure that it reports that the timestamp reported there is the accurate timestamp for the starting timestep. e.g. If the object starts at timestep 5, it should report 19590515... and not 19590519 that I see in the 2d output file.
  3. Look in /d1/projects/MET/MET_regression/develop/NB20220707/MET-develop/test_output/mtd and note that more output files can be written:
mtd_BASIC_20100517_010000V_2d.txt
mtd_BASIC_20100517_010000V_3d_pair_cluster.txt
mtd_BASIC_20100517_010000V_3d_pair_simple.txt
mtd_BASIC_20100517_010000V_3d_single_cluster.txt
mtd_BASIC_20100517_010000V_3d_single_simple.txt
mtd_BASIC_20100517_010000V_obj.nc

The change to the timing info in 2. (above) also applies to the the 3d_pair_simple, 3d_pair_cluster, 3d_single_simple, and 3d_single_cluster outputs.

@davidalbo
Copy link
Contributor

davidalbo commented Jul 12, 2022

I've got it working as best I can tell, with NO checking for really crazy time steps, right now the code, no matter what, takes the mode of the time differences as 'the' time difference used everywhere. In the example @JohnHalleyGotway gave me this time difference shows up in the 2d and 3d text files as 7440000 i.e. 744 hours, 0 minutes, 0 seconds, which is 31 days as expected. I can't see that this time difference gets used in any way as part of an algorithm, maybe I'm missing that.

So @CPKalb any thoughts on @JohnHalleyGotway comments?:

**_The point here is that when the timesteps are "close enough" we still want MTD to run and produce output.

We could consider some maximum allowable variance in the timestep and still error out if the diffs are too large. But I'll leave that to @CPKalb to advise on. For example, you may want to process 30 days of data but have two days of missing data in the list. That'd result in 2 large jumps in time steps. Tina, in that case, is it better for MTD to error out or still run using the most common timestep of 1 day?_**

I could consider the issue done, but figure some other pathological case could make me want to test something and give an error exit in some cases.

@davidalbo
Copy link
Contributor

Pinging @CPKalb: I left this issue with a question for you, and actually have some time to wrap it up. My initial thought is I can easily test to see if this is the 'month' situation and deal with that appropriately, and continue to ERROR out if it's not that, or do some other kind of test for 'close enough'. Any opinion or other wisdom?

@CPKalb
Copy link
Contributor

CPKalb commented Jul 26, 2022

Hi Dave:
Sorry for the slow response on this. The only other thing I can think of is that someone may want to do a seasonal time step (for example, use an average over a season like December, January, February and then skip to the next year), so some combination of 3-4 months. I've done that in the past with statistics, although never used it with mtd. I think the concern here is that the time step would be pretty large (since it would be 9 months or even a year depending on how the season is time stamped).

I'm not sure what the best way to handle something like this would be. If the timestep isn't used, then maybe a warning is all that is needed followed by the calculation proceeding? I can't think of a situation where a time step would be larger than a year, but climate is also not my specialty so maybe there is something

@davidalbo
Copy link
Contributor

Hey @JohnHalleyGotway . Is there any way to set up another test to create these additional types of 3d files, even if the time steps are constant? I could then step through in a debugger and see what the code would do if the time steps were variable (i.e. make sure the changes I've made would apply in these cases).

Look in /d1/projects/MET/MET_regression/develop/NB20220707/MET-develop/test_output/mtd and note that more output files can be written:
mtd_BASIC_20100517_010000V_2d.txt
mtd_BASIC_20100517_010000V_3d_pair_cluster.txt
mtd_BASIC_20100517_010000V_3d_pair_simple.txt
mtd_BASIC_20100517_010000V_3d_single_cluster.txt
mtd_BASIC_20100517_010000V_3d_single_simple.txt
mtd_BASIC_20100517_010000V_obj.nc
The change to the timing info in 2. (above) also applies to the the 3d_pair_simple, 3d_pair_cluster, 3d_single_simple, and 3d_single_cluster outputs.

@davidalbo
Copy link
Contributor

@JohnHalleyGotway slightly confused here, in mtd_txt_output.cc, method do_2d_txt_output()

for (j=0; j<(fcst_simple_att.n()); ++j)  {

   ++r;

   t = fcst_simple_att.time_index(j);

   table.set_entry(r, fcst_lead_column, sec_to_hhmmss(fcst_simple_att.lead_time(j)));

   table.set_entry(r, fcst_valid_column, unix_to_yyyymmdd_hhmmss(fcst_simple_att.valid_time(j)));

   table.set_entry(r,  obs_valid_column, unix_to_yyyymmdd_hhmmss(obs_raw.valid_time(t)));

   table.set_entry(r,  obs_lead_column,  sec_to_hhmmss(obs_raw.lead_time(t)));

}

the obs_raw.valid_time() is incorrect, whereas the fcst_simple_att.valid_time() is correct. My solution is to pass in the actual valid times (obs and fcst) and use them instead of the raw incorrect times. Can you explain the use of time_index()? In the testing I'm doing time_index(j) = j always. If it were not the same, would it be an index to the correct actual obs valid time?
The same pattern is seen in several such for loops in this method.

@JohnHalleyGotway
Copy link
Collaborator Author

@davidalbo I always recommend starting with the existing unit tests. The first unit test for mtd does generate the 3D ascii output files.

You can run this on seneca by setting up the test environment and then running:

cd internal/test_unit
perl/unit.pl xml/unit_mtd.xml

Or add the --cmd command line option to just print the commands that'll be executed rather than actually executing them. That way you can copy/paste the command to be run via a debugger:

perl/unit.pl xml/unit_mtd.xml --cmd

I use the following testenv alias setup my environment:

alias testenv
alias testenv='export MET_TEST_BASE=`pwd`; export MET_BUILD_BASE=`pwd`/../..; export MET_BASE=`pwd`/../../share/met; printenv | egrep "MET_TEST_BASE|MET_BUILD_BASE|MET_BASE";'

I just run this from the test_unit directory:

cd internal/test_unit
testenv

You do also need to define $MET_TEST_OUTPUT to define where the output should be written.

@davidalbo
Copy link
Contributor

@JohnHalleyGotway useful! I ran the unit tests in the debugger to see what would happen in the places I made some changes. Everything worked as expected, meaning I didn't break something, at least not something common, so I'm going to do a pull request today.

@davidalbo davidalbo linked a pull request Aug 3, 2022 that will close this issue
15 tasks
JohnHalleyGotway added a commit that referenced this issue Aug 5, 2022
…vision history and reformatted some of the error messages.
Repository owner moved this from To Do to Done in MET-11.0.0-beta3 (9/21/22) Aug 5, 2022
@JohnHalleyGotway JohnHalleyGotway moved this from To Do to Done in MET-11.0.0-beta2 (8/9/22) Aug 9, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Switch MTD checking of a consistent time steps from an ERROR to a WARNING. Enhance MTD to process time series with non-uniform time steps, such as monthly data Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MET: Object Verification Object-based feature Verification priority: medium Medium Priority requestor: Community General Community type: enhancement Improve something that it is currently doing
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants