treat time of sample consistently #34

Open
jordansread opened this issue May 17, 2018 · 1 comment

@jordansread
Member

Some timeseries files have subdaily times that are dropped rather than used; others keep the time but then filter out every measurement that wasn't taken at noon. In the former case the earliest measurement per day is used (often a cold bias); in the latter a single time is used, but grab samples taken at other times are excluded.

I'd be in favor of treating them consistently, and also of deferring the downsampling behavior to a much later stage instead of handling it on a per-file basis. The reason is that we could choose to evaluate models on an hourly basis (and would therefore want hourly obs if we have them), or we could drop samples from certain times of day, or we could interpolate to a single time point during the day. We'd want that logic to live in one function and probably be applied closer to the end of the chain.
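
As a rough illustration of what "one function" could look like, here is a minimal Python sketch (the project itself is in R, so this is only to show the shape of the idea). The function name `downsample`, the `method` options, and the `(datetime, temperature)` pair layout are all assumptions for illustration, not anything that exists in the repo. Interpolation to a single time point is left out to keep it short.

```python
from collections import defaultdict
from datetime import datetime, time

METHODS = ("all", "nearest", "mean")


def downsample(obs, method="nearest", target=time(12, 0)):
    """Reduce (datetime, temperature) pairs according to one shared rule.

    Hypothetical helper: names and options are illustrative only.
      "all"     -> keep every subdaily sample (e.g. for hourly evaluation)
      "nearest" -> per day, keep the sample closest to `target` time of day
      "mean"    -> per day, average all samples and stamp them at `target`
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if method == "all":
        return sorted(obs)

    # Group samples by calendar day.
    by_day = defaultdict(list)
    for when, temp in obs:
        by_day[when.date()].append((when, temp))

    out = []
    for day, samples in sorted(by_day.items()):
        stamp = datetime.combine(day, target)
        if method == "nearest":
            out.append(min(samples, key=lambda s: abs(s[0] - stamp)))
        else:  # "mean"
            mean_temp = sum(t for _, t in samples) / len(samples)
            out.append((stamp, mean_temp))
    return out
```

Because the raw subdaily times are kept until this call, switching between hourly evaluation and a single daily value would be a one-argument change near the end of the chain rather than a per-file decision.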

@wdwatkins
Collaborator

My intention was to treat everything at the lowest common denominator of sampling frequency, which (I think) is generally biweekly hand measurements. If the sample time is ignored, that was probably an oversight on my part (definitely for the giant MPCA tsv).

I agree on dealing with sample frequency all at once. A universal method would need to account for single daily measurements and several different frequencies of automated collection: there are many files with measurements every 6 hours, and some hourly or every 15 minutes.
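
One way a universal method might tell these cases apart is to estimate each file's typical sampling interval from its timestamps before deciding how to downsample. The Python sketch below is only an assumption about how that could be done; `typical_interval` is a hypothetical name, and using the median gap is my choice here, not something settled in this thread.

```python
from datetime import datetime, timedelta
from statistics import median


def typical_interval(timestamps):
    """Median gap between consecutive distinct timestamps.

    Hypothetical helper. Returns a timedelta, or None when there are
    fewer than two distinct timestamps (e.g. a single grab sample).
    """
    ordered = sorted(set(timestamps))
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return None
    return timedelta(seconds=median(gaps))
```

The median is used rather than the mean so that a few long gaps (a logger pulled for the winter, say) don't hide an otherwise regular 15-minute, hourly, or 6-hourly record.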

Higher-resolution data are also available for the LTER sites; right now I am only downloading a daily summary of them, e.g. here: https://github.com/USGS-R/pgml_temperature_prediction/blob/master/1_all_raw_data.yml#L179
