-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to connect to local timeseries target data #77
base: main
Are you sure you want to change the base?
Conversation
Including removing a couple of unnecessary explicit retuns
This allows for correct schema determination when timeseries data is partitioned on a date column that does not correspond to valid date task ID column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I think the date_col
argument is a good compromise.
The only thing I would suggest is to use {gert} instead of {git2r}. There are a few advantages:
- 10x smaller than {git2r}
- {usethis} uses it on the backend, so devs will likely already have it installed
- it's more "batteries included" than {git2r} (https://docs.ropensci.org/gert/#differences-with-git2r)
|
||
structure(ts_data, | ||
class = c("target_timeseries", class(ts_data)), | ||
ts_path = as.character(fs::path_rel(ts_path, hub_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So glad you did this---adding the as.character()
is always something that I forget to do with fs paths and it always bites me in the end.
|
||
expect_error( | ||
create_timeseries_schema(tmp_hub_path), | ||
"No .*date.* type column found in .*time-series.*." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh this is a clever solution to getting around formatting issues!
This PR resolves #71
It's the first step in accessing target data and focuses on accessing local timeseries target as a first pass.
It also adds a function for creating an expected schema for the timeseries target data that is applied during read.
Validation is not tackled here nor is reading target data from the cloud.
The tests demonstrate that the function can be used to successfully read time-series files that are:
time-series.csv
format.time-series
directory (including with sub-directories)time-series
directoryNote that when partitioning on a column that does not match any date task IDs (e.g. date column in timeseries is called
date
and the data are partitioned on it but the equivalent task ID istarget_end_date
), the date type is not automatically detectable, even if the files are parquet files. As such, I've included adate_col
argument that allows us to specify that column explicitly. Eventually this should be configured via any target configuration file we decide on.Also removed a couple explicit returns to appease lintr