This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

Merge pull request #4 from cct-datascience/problem_proposal
Problem definition and proposal sections
Aariq authored Mar 14, 2024
2 parents 45381b8 + 625a64d commit 79c0ede
Showing 3 changed files with 74 additions and 12 deletions.
23 changes: 19 additions & 4 deletions proposal/problemdefinition.Qmd
@@ -1,13 +1,11 @@
---
bibliography: references.bib
editor:
markdown:
wrap: sentence
---

# The Problem

```{=html}
<!--
Outlining the issue / weak point / problem to be solved by this proposal. This should be a compelling section that sets the reader up for the next section - the proposed solution!
@@ -18,5 +16,22 @@ It is important to cover:
- [ ] Have there been previous attempts to resolve the problem
- [ ] Why it should be tackled
-->
```
An example in-text citation [@wickham2016].

Geospatial computations in R are made possible by a group of R packages, such as `terra`, `sf`, and `stars`, that handle [spatial](https://cran.r-project.org/web/views/Spatial.html) and [spatio-temporal](https://cran.r-project.org/web/views/SpatioTemporal.html) data [@pebesma2023; @hijmans2024].
Computations on spatial or spatio-temporal data are often slow when the underlying data is large (e.g. high-resolution global rasters).
Depending on the data source and operation, run times can range from seconds to days or weeks.
Managing complex geospatial workflows can be confusing, and re-running an entire data pipeline is likely to be very time-consuming.
The `targets` R package aims to aid with confusing and long-running workflows by automatically detecting dependencies among steps and only re-running steps that need to be re-run [@landau2021].
This seems like a natural fit for complex geospatial workflows in R.
However, geospatial packages like `terra` and `sf` don't work well with `targets` without extensive customization.
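
For readers unfamiliar with `targets`, the dependency tracking described above can be sketched with a minimal, non-geospatial pipeline.
The target names (`raw_values`, `doubled`, `total`) are hypothetical and chosen only for illustration; in a real project this code would live in a `_targets.R` file and be run with `targets::tar_make()`.

```r
# _targets.R: a minimal sketch of a pipeline (all target names are hypothetical)
library(targets)

pipeline <- list(
  # Each target is a named step; `targets` infers that `doubled` depends on
  # `raw_values`, and `total` depends on `doubled`, from the commands themselves.
  tar_target(raw_values, c(1, 2, 3)),
  tar_target(doubled, raw_values * 2),
  tar_target(total, sum(doubled))
)

# A _targets.R file must end with the list of targets.
pipeline
```

If only the command for `total` changes, `targets` should skip `raw_values` and `doubled` on the next run because their inputs are unchanged.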

One notable difficulty is that `targets`, by default, saves R objects generated by computational steps, but R objects generated by the `terra` package may not actually contain the data itself but rather a C++ pointer to the data.
When one of these `terra` objects is saved (e.g. as a .rds) and read back into R, it loses information about the data it represents and no longer works.
To make these R objects portable and suitable for use with `targets`, they need to be "marshaled" before saving and "unmarshaled" after reading, which requires complicated code for a custom storage format.
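
The marshaling step can be sketched outside of `targets` with a simple round trip through an `.rds` file.
This is only an illustration, assuming a recent version of `terra` that provides both `wrap()` and `unwrap()`; the raster dimensions and values are arbitrary, and `base::saveRDS()` / `base::readRDS()` are called explicitly to mimic plain R serialization.

```r
library(terra)

# A small in-memory raster; the dimensions and values are arbitrary.
r <- rast(nrows = 10, ncols = 10, vals = 1:100)

# "Marshal": convert the pointer-backed SpatRaster into a self-contained object.
packed <- wrap(r)

# Serialize and deserialize the packed object with plain base R functions.
path <- tempfile(fileext = ".rds")
base::saveRDS(packed, path)

# "Unmarshal": rebuild a usable SpatRaster from the packed object.
restored <- unwrap(base::readRDS(path))
```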

A second obstacle is that geospatial data is often stored across multiple files.
For example, a shapefile is actually a collection of up to 12 files with different extensions.
This limits compatibility with `targets` because the intermediate objects stored in a `targets` pipeline are required to be single files with no file extension.
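
The multi-file nature of shapefiles is easy to see by writing one to an empty directory and listing what was created.
The sketch below uses `terra` and two made-up points; the directory and file names are hypothetical, and the exact set of sidecar files may vary with the data and the GDAL version.

```r
library(terra)

# Two arbitrary points with a single attribute column.
pts <- vect(
  cbind(x = c(0, 1), y = c(0, 1)),
  type = "points",
  atts = data.frame(id = 1:2),
  crs = "EPSG:4326"
)

# Write a shapefile into a fresh, empty temporary directory.
out_dir <- tempfile("shp_demo_")
dir.create(out_dir)
writeVector(pts, file.path(out_dir, "points.shp"))

# A single "shapefile" shows up as several files sharing one base name.
written <- list.files(out_dir)
written
```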

Both of these challenges (and others) have been solved in bespoke ways for individual projects, but to date these solutions have not been formalized and distributed as an R package.
Other packages exist as part of a "[targetopia](https://wlandau.github.io/targetopia/)" that extend `targets` to work for specialized needs.
For example, [stantargets](https://docs.ropensci.org/stantargets/) supports using `targets` with [Stan](https://mc-stan.org/) models.
We hope that `geotargets` can join the "targetopia" and simplify geospatial data analysis with `targets`.
We believe this will unlock a powerful workflow management tool for a large group of R users who have previously been unable (or unwilling) to use it because of these challenges.
26 changes: 25 additions & 1 deletion proposal/proposal.Qmd
@@ -4,17 +4,26 @@ editor:
markdown:
wrap: sentence
---

# The proposal

<!--
This is where the proposal should be outlined.
-->

## Overview

<!--
At a high-level address what your proposal is and how it will address the problem identified. Highlight any benefits to the R Community that follow from solving the problem. This should be your most compelling section.
-->

Our goal is to create a package that makes using `targets` for geospatial analysis in R as seamless as possible.
To that end, `geotargets` will provide custom functions for defining geospatial targets that take care of translating and saving R objects for the user.
In addition, we will create vignettes demonstrating how to use various geospatial R packages with `targets`.
Where appropriate, we will identify contributions to existing R packages to make them easier to use with `targets` and `geotargets`.

## Detail

<!--
Go into more detail about the specifics of the project and how it delivers against the problem.
@@ -23,4 +32,19 @@ Depending on project type the detail section should include:
- [ ] Minimum Viable Product
- [ ] Architecture
- [ ] Assumptions
-->

In the `targets` package, analysis steps, or "targets", are defined with the `tar_target()` function.
Targetopia packages provide additional `tar_*()` functions that extend `targets` primarily in two ways: by providing custom storage formats or by acting as "target factories" (functions that generate one or more targets from a single call).
The main contribution of `geotargets` will be a series of alternative `tar_*()` functions that create targets with pre-defined formats.
These formats take care of the details of how geospatial R objects are written out and read back in by downstream targets.
For example, to write a target that creates a raster using the `terra` package, one would use `geotargets::tar_terra_rast(name, command)`.
`tar_terra_rast()` would provide a pre-defined format created with `targets::tar_format()` with functions for marshaling, writing, reading, and unmarshaling `terra` `SpatRaster` objects.
In this case, marshaling/unmarshaling involves running `terra::wrap()` and `terra::unwrap()`, respectively, to make the R object "self-contained" rather than just containing a C++ pointer to the data.
This is especially necessary for parallel computing with `targets` since `SpatRaster` objects don't work outside of the R session they were created in without `wrap()`ing them first.
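
A custom format along these lines might look roughly like the sketch below.
It is not the `geotargets` implementation: `terra_rast_format` and the `elevation` target are hypothetical names, GeoTIFF is assumed as the on-disk file type, and the code only defines the format and a target without running a pipeline.

```r
library(targets)

# A hypothetical custom format for terra SpatRaster objects.
# Each function is self-contained and refers to terra with `::`.
terra_rast_format <- tar_format(
  # Read the stored file back in as a SpatRaster.
  read = function(path) terra::rast(path),
  # Write to the extensionless path that `targets` supplies, so the
  # file type has to be stated explicitly (GeoTIFF is an assumption here).
  write = function(object, path) {
    terra::writeRaster(object, path, filetype = "GTiff", overwrite = TRUE)
  },
  # Make the object self-contained before it is sent to another R process.
  marshal = function(object) terra::wrap(object),
  # Rebuild a usable SpatRaster once it arrives.
  unmarshal = function(object) terra::unwrap(object)
)

# A target that uses the custom format; the command is an arbitrary example.
example_target <- tar_target(
  elevation,
  terra::rast(nrows = 10, ncols = 10, vals = 1:100),
  format = terra_rast_format
)
```

A wrapper such as the proposed `tar_terra_rast()` would presumably hide the `format` argument so that users only supply a name and a command.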

As a minimum viable product, we hope to deliver an R package, hosted on GitHub, supporting raster and vector data objects from the `terra` and `sf` packages with custom target functions.
Support for additional geospatial packages will be added based on feedback from the user community and through consultation with geospatial specialists.
In initial development we will choose sensible defaults for what file types targets will be stored as (e.g. GeoTIFF for raster data).
In the future we will develop a `filetype` argument for each `tar_*` function, since there are many options for how geospatial data can be stored on disk by these packages.
Examples include "netCDF", "HEIF", and "BMP", along with 161 other options listed in the [GDAL raster driver documentation](https://gdal.org/drivers/raster/index.html).
This will offer flexibility in light of trade-offs between file size, read/write speed, and dependency requirements, similar to the existing options for how objects are stored by the `targets` package (i.e. default '.rds' with options for faster/smaller file types).
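
One way a `filetype` option could behave is sketched below.
The helper `write_rast_as()` is hypothetical and not part of `terra`, `targets`, or `geotargets`; it assumes that `terra::gdal(drivers = TRUE)` returns a data frame with a `name` column of driver short names, and that the GeoTIFF driver (`"GTiff"`) is available in the local GDAL build.

```r
library(terra)

# Hypothetical helper: write a raster with a user-chosen GDAL driver,
# failing early if the local GDAL build does not provide that driver.
write_rast_as <- function(x, path, filetype = "GTiff") {
  available <- terra::gdal(drivers = TRUE)$name
  if (!filetype %in% available) {
    stop("GDAL driver not available in this installation: ", filetype)
  }
  terra::writeRaster(x, path, filetype = filetype, overwrite = TRUE)
  invisible(path)
}

# An arbitrary small raster written to a path with no file extension,
# similar to how `targets` stores intermediate objects.
r <- rast(nrows = 10, ncols = 10, vals = 1:100)
path <- tempfile()
write_rast_as(r, path, filetype = "GTiff")

# GDAL identifies the format from the file contents when reading.
restored <- rast(path)
```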
37 changes: 30 additions & 7 deletions references.bib
@@ -1,9 +1,32 @@

@book{wickham2016,
title = {ggplot2: Elegant graphics for data analysis},
author = {Wickham, Hadley},
year = {2016},
date = {2016},
publisher = {Springer-Verlag New York},
url = {https://ggplot2.tidyverse.org}
}

@article{landau2021,
title = {The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing},
author = {Landau, William},
year = {2021},
month = {01},
date = {2021-01-15},
journal = {Journal of Open Source Software},
pages = {2959},
volume = {6},
number = {57},
doi = {10.21105/joss.02959},
url = {http://dx.doi.org/10.21105/joss.02959}
}

@book{pebesma2023,
title = {{Spatial Data Science: With applications in R}},
author = {Pebesma, Edzer and Bivand, Roger},
year = {2023},
date = {2023},
pages = {352},
doi = {10.1201/9780429459016},
url = {https://r-spatial.org/book/}
}

@manual{hijmans2024,
title = {terra: Spatial Data Analysis},
author = {Hijmans, Robert J.},
year = {2024},
date = {2024},
url = {https://CRAN.R-project.org/package=terra}
}
