Add functionality to connect to local timeseries target data #77

annakrystalli · 2025-02-28T15:02:54Z

This PR resolves #71

It's the first step in accessing target data and, as a first pass, focuses on accessing local timeseries target data.

It also adds a function for creating an expected schema for the timeseries target data that is applied during read.

Validation is not tackled here, nor is reading target data from the cloud.

The tests demonstrate that the function can successfully read time-series data stored:

in a single time-series.csv file
in multiple complete files (i.e. all columns contained in each file) within a time-series directory, including sub-directories
in a hive-partitioned time-series directory

Note that when the data are partitioned on a column that does not match any date task ID (e.g. the date column in the timeseries data is called date and the data are partitioned on it, but the equivalent task ID is target_end_date), the date type cannot be detected automatically, even if the files are parquet files. As such, I've included a date_col argument that allows us to specify that column explicitly. Eventually this should be configured via whatever target configuration file we decide on.
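As a rough illustration of the problem (a base R sketch, not the hubData implementation and independent of how arrow handles partitions): in a hive-partitioned layout the partition value lives in the directory name, so it starts out as plain text. The `parse_hive_partition()` helper and the example path below are hypothetical.

```r
# Hypothetical helper: extract `key=value` partition segments from a path.
parse_hive_partition <- function(path) {
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- segments[grepl("=", segments, fixed = TRUE)]
  key_value <- strsplit(parts, "=", fixed = TRUE)
  values <- vapply(key_value, function(x) x[2], character(1))
  names(values) <- vapply(key_value, function(x) x[1], character(1))
  values
}

# Illustrative path only; the file name is an assumption.
path <- "target-data/time-series/date=2025-02-22/part-0.parquet"
partition <- parse_hive_partition(path)

# The partition value comes from the directory name as a character string...
is.character(partition[["date"]])

# ...so it only becomes a Date once the column is named explicitly,
# which is the role the date_col argument plays.
date_col <- "date"
as.Date(partition[[date_col]])
```

Nothing in the path says that date should be a Date rather than a string, which is why an explicit hint is needed when no task ID of the same name supplies the type.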

Also removed a couple of explicit returns to appease lintr.

github-actions · 2025-02-28T15:07:11Z

🚀 Deployed on https://67c1e2522617b739a9f61627--hubdata-pr-preview.netlify.app


zkamvar

Looks good to me! I think the date_col argument is a good compromise.

The only thing I would suggest is to use {gert} instead of {git2r}. There are a few advantages:

10x smaller than {git2r}
{usethis} uses it on the backend, so devs will likely already have it installed
it's more "batteries included" than {git2r} (https://docs.ropensci.org/gert/#differences-with-git2r)

zkamvar · 2025-03-03T18:45:36Z

R/connect_target_timeseries.R

+
+  structure(ts_data,
+    class = c("target_timeseries", class(ts_data)),
+    ts_path = as.character(fs::path_rel(ts_path, hub_path))


So glad you did this. Adding the as.character() is something I always forget to do with fs paths, and it always bites me in the end.
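For anyone reading along, a small sketch of what the conversion changes, assuming the fs package is installed. The hub layout here is made up for illustration.

```r
library(fs)

# Hypothetical hub layout, for illustration only.
hub_path <- "my-hub"
ts_path <- path(hub_path, "target-data", "time-series.csv")

# path_rel() returns an fs path object rather than a bare character vector.
rel <- path_rel(ts_path, hub_path)
class(rel)

# as.character() drops the fs class, leaving a plain string that is safe to
# store as an attribute and compare with identical().
rel_chr <- as.character(rel)
class(rel_chr)
```

Storing the plain string keeps the attribute independent of whether fs is loaded when the object is later inspected or compared in tests.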

zkamvar · 2025-03-03T18:57:44Z

tests/testthat/test-create_timeseries_schema.R

+
+  expect_error(
+    create_timeseries_schema(tmp_hub_path),
+    "No .*date.* type column found in .*time-series.*."


oh this is a clever solution to getting around formatting issues!
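A quick base R check of why the wildcards help. The two messages below are illustrative only; the exact wording and inline markup the package emits are assumptions.

```r
# The same message with and without inline markup around the key terms.
plain  <- "No date type column found in time-series."
marked <- "No `date` type column found in 'time-series'."

# Pattern from the test: `.*` absorbs whatever formatting surrounds the terms.
pattern <- "No .*date.* type column found in .*time-series.*."

grepl(pattern, plain)
grepl(pattern, marked)
```

Both calls match, so the expectation does not depend on how the message is styled when the test runs.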

annakrystalli added 6 commits February 28, 2025 16:42

Add target data utility functions

21a0621

Add create_timeseries_schema function

0309316

Add connect_target_timeseries function. Resolves #71

a51fad6

Add projectID

2821dd7

Remove experimental badge from create_timeseries_schema.

332a6ec

Update NEWS.md

dde4d4b

annakrystalli added 2 commits February 28, 2025 17:10

Appease lintr

c2bc01c

Including removing a couple of unnecessary explicit returns

Add extra tests

e6bbaa2

annakrystalli marked this pull request as ready for review February 28, 2025 15:16

annakrystalli requested review from zkamvar and elray1 February 28, 2025 15:17

annakrystalli marked this pull request as draft February 28, 2025 15:45

Add date_col argument.

0f5abed

This allows for correct schema determination when timeseries data is partitioned on a date column that does not correspond to valid date task ID column.

annakrystalli marked this pull request as ready for review February 28, 2025 16:18

zkamvar reviewed Mar 3, 2025
