Handle all location types: geopoint, geotrace, geoshape #69

Closed
5 tasks done
florianm opened this issue May 13, 2020 · 1 comment

florianm commented May 13, 2020

Spatial musings

ODK Collect can capture points (geopoints), lines (geotrace), and polygons (geoshape).
ODK Central 0.7 exports geopoints in OData submissions either as GeoJSON (the ODK Central and ruODK default) or as WKT.
ODK Central 0.7 always exports geotraces and geoshapes in OData submissions as GeoJSON.
Since v0.8, GeoJSON and WKT export seems to work across all geo field types.

What does a data analyst want to do with each location type?
How much parsing should ruODK provide?
As a data analyst, I need to:

  • extract simple lon/lat columns from each geo type - easy for points, harder for trace/shape: centroid lon/lat, or first point lon/lat
  • export data as spreadsheets to share with minimally tech savvy users - requires lon/lat in separate columns for each geo type
  • create a map (leaflet) and put geopoints, geotraces, geoshapes on it - requires a format leaflet can map plus working examples in a new vignette "spatial is not special"

Which formats should ruODK output:

  • sf::sfc_POLYGON via sf::st_as_sfc() (note: sf's GitHub Actions build is currently broken, so introducing sf to ruODK would break ruODK's GHA too)
  • sp::SpatialPolygons via rgeos::readWKT()
  • data model considerations: an ODK form can have multiple location columns, whereas an R spatial object works with exactly one (active) geometry column
  • performance considerations when creating spatial types (by attribute or by feature?)

Implementation

Current behaviour

  • A geopoint exported as WKT is split into _{longitude, latitude, altitude}
  • A geopoint exported as GeoJSON is split during rectangling into anonymous fields _{11,12,13} (or other numbers)
  • Geotraces and geoshapes are not handled and remain GeoJSON
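
To make the WKT case concrete, here is a minimal base R sketch of splitting a WKT geopoint into named coordinate columns. The helper `split_wkt_point()`, the column `location`, and the sample values are made up for illustration and are not ruODK's actual implementation.

```r
# Hypothetical helper (not part of ruODK): split a WKT point string into
# numeric longitude, latitude, and altitude.
split_wkt_point <- function(wkt) {
  # Keep only the text between the parentheses, e.g. "115.88 -31.99 20.5"
  inner <- sub("^[^(]*\\(([^)]*)\\).*$", "\\1", wkt)
  parts <- as.numeric(strsplit(trimws(inner), "[[:space:]]+")[[1]])
  # Pad with NA so that points without altitude still yield three values
  length(parts) <- 3
  stats::setNames(as.list(parts), c("longitude", "latitude", "altitude"))
}

# Made-up submissions with one geopoint field called "location"
submissions <- data.frame(
  id = c("uuid:a", "uuid:b"),
  location = c("POINT (115.88 -31.99 20.5)", "POINT (116.01 -32.10)"),
  stringsAsFactors = FALSE
)

# One row of coordinates per submission, prefixed with the field name
coords <- do.call(rbind, lapply(submissions$location, function(x) {
  as.data.frame(split_wkt_point(x))
}))
names(coords) <- paste0("location_", names(coords))

# The original WKT column is kept next to the extracted coordinates
result <- cbind(submissions, coords)
```

Keeping the original `location` column alongside the split columns means the WKT can still be handed to a spatial package later.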

Suggested new behaviour (discuss)

  • Retain the original column to allow parsing into spatially enabled formats (st_point, sfg and friends)
  • Annotate fields with _{longitude, latitude, altitude, accuracy}
  • For geotraces and geoshapes, which point should be extracted? Options: first point, centroid. I'd go for first point, as that's actually part of the shape.
  • An obvious scope boundary is introducing new dependencies. If we just extract point coordinates, we don't need sf, sp, rgeos, or rgdal, which can be a pain to install. We could provide examples for going full spatial as a starting point.
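
The "first point or centroid" question can be illustrated without any spatial dependencies. The sketch below is base R only; the helper names and the U-shaped test polygon are invented for this example, and lon/lat are treated as planar coordinates, which is a simplification. It computes both candidates for a shape whose centroid falls outside the shape itself.

```r
# Hypothetical helper: parse the outer ring of a WKT POLYGON into a matrix.
wkt_polygon_coords <- function(wkt) {
  inner <- sub("^[^(]*\\(\\(([^)]*)\\).*$", "\\1", wkt)
  pts <- strsplit(trimws(strsplit(inner, ",")[[1]]), "[[:space:]]+")
  m <- do.call(rbind, lapply(pts, function(p) as.numeric(p[1:2])))
  colnames(m) <- c("lon", "lat")
  m
}

# Area-weighted centroid of a closed ring (shoelace formula).
# Assumes planar coordinates and a ring whose last point repeats the first.
polygon_centroid <- function(m) {
  n <- nrow(m)
  x0 <- m[-n, "lon"]
  y0 <- m[-n, "lat"]
  x1 <- m[-1, "lon"]
  y1 <- m[-1, "lat"]
  cross <- x0 * y1 - x1 * y0
  area <- sum(cross) / 2
  c(
    lon = sum((x0 + x1) * cross) / (6 * area),
    lat = sum((y0 + y1) * cross) / (6 * area)
  )
}

# A U-shaped geoshape: the notch spans lon 1 to 2 and lat 1 to 3
u_shape <- "POLYGON ((0 0, 3 0, 3 3, 2 3, 2 1, 1 1, 1 3, 0 3, 0 0))"

ring <- wkt_polygon_coords(u_shape)
first_point <- ring[1, ]            # a vertex, so always part of the shape
centroid <- polygon_centroid(ring)  # lands inside the notch of the U
```

For this shape the centroid sits in the notch, so it is not part of the polygon, whereas the first point is by construction a vertex of it.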

Probably out of scope

  • Do more with the geopoint - turn into sfg (simple feature geometry) in addition to splitting out coordinates (great to have for leaflet maps)
  • Parse geotrace and geoshape into sfg (keep GeoJSON? always parse?) First cut here.

Challenges

odata_submission_rectangle needs to exempt geo{point, trace, shape} from unnesting

odata_submission_rectangle will blindly (as it has no form introspection ...yet) unnest any list column. This works OK-ish for GeoJSON points as these have a set length. Unnesting geotraces and geoshapes will inevitably end in tears over their variable number of points.

This means that odata_submission_get(parse = TRUE, wkt = FALSE) will fail with geotraces or geoshapes.

Therefore odata_submission_rectangle needs a parameter form_schema = NULL to pass an optional form schema, extract field names of geo{point, trace, shape}s, and exempt those from rectangling. This would preserve GeoJSON and parse GeoJSON into list columns, which then could be further parsed into spatial classes.
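
A rough base R sketch of the idea follows. The record structure, the schema columns `name` and `type`, and the helper `flatten_record()` are assumptions made for illustration; this is a stand-in for the rectangling step, not the actual `odata_submission_rectangle` code.

```r
# Two parsed submissions; nested lists mimic parsed JSON.
# The geotrace "track" has a different number of points per submission.
records <- list(
  list(
    id = "uuid:a",
    observer = list(name = "Ann", team = "north"),
    track = list(
      type = "LineString",
      coordinates = list(list(115.1, -31.1, 0), list(115.2, -31.2, 0))
    )
  ),
  list(
    id = "uuid:b",
    observer = list(name = "Bob", team = "south"),
    track = list(
      type = "LineString",
      coordinates = list(
        list(116.1, -32.1, 0), list(116.2, -32.2, 0), list(116.3, -32.3, 0)
      )
    )
  )
)

# Assumed minimal form schema: field name and ODK field type
form_schema <- data.frame(
  name = c("id", "observer", "track"),
  type = c("string", "structure", "geotrace"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for rectangling: flatten nested lists one level into
# prefixed fields, but leave fields listed in `exempt` untouched.
flatten_record <- function(rec, exempt = character(0)) {
  out <- list()
  for (field in names(rec)) {
    value <- rec[[field]]
    if (is.list(value) && !(field %in% exempt)) {
      for (child in names(value)) {
        out[[paste(field, child, sep = "_")]] <- value[[child]]
      }
    } else {
      out[[field]] <- value
    }
  }
  out
}

# Infer the geo fields from the schema and exempt them from flattening
geo_types <- c("geopoint", "geotrace", "geoshape")
geo_fields <- form_schema$name[form_schema$type %in% geo_types]
flat <- lapply(records, flatten_record, exempt = geo_fields)

# Scalar fields become ordinary columns, geo fields become a list column
rectangled <- data.frame(
  id = vapply(flat, function(r) r[["id"]], character(1)),
  observer_name = vapply(flat, function(r) r[["observer_name"]], character(1)),
  stringsAsFactors = FALSE
)
rectangled$track <- lapply(flat, function(r) r[["track"]])
```

Because `track` stays a list column, the differing number of points per geotrace no longer matters for the rectangular part of the table.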

Getting too funky with spatial classes will add dependencies

Downside: sf, sp, rgdal, and rgeos dependencies could be a pain for users to install.

Upside: Adding some nice helpers and working examples (mapping, spatial operations, coordinate extraction) would be immensely useful to less spatially versed R users.

Testing

Test form: https://sandbox.central.getodk.org/#/projects/14/forms/build_Locations_1589344221/submissions

Added to CONTRIBUTING.md / Test:

ODKC_TEST_FID_WKT="build_Locations_1589344221"

As DBCA still runs a production server on ODK Central v0.6, and the spatial output has changed since then, I will address this issue after migrating our server. ETA mid 2020.

Checklist

Once spatial parsing has changed, make sure to update all the docs and examples.

  • Add example data containing all geofield types for each option: GeoJSON / WKT, parsed / raw, plus form_schema. Use in tests and examples. Keep example data up to date with package functions.
  • Add source for geofield example form to inst/extdata
  • Function examples (odata_submission_get, _rectangle, handle_ru_geo*)
  • Update vignettes
  • Update skeleton.Rmd
@florianm florianm added feature a feature request or enhancement help wanted ❤️ we'd love your help! labels May 13, 2020
@florianm florianm added this to the Release 1.0 milestone May 13, 2020
@florianm florianm self-assigned this May 13, 2020
florianm pushed a commit that referenced this issue May 14, 2020
* add new test form `fid_wkt` containing geopoints, geotraces, geoshapes with each widget appearance
* add test form to settings, settings tests, instructions to contributing
* odata_submission_rectangle now takes a form_schema (and passes it on to unnest_all) to exempt geo{point, trace, shape} from unnesting
* geo fields GeoJSON become nested lists, geo fields WKT remain character strings
* TODO: add handlers to value-add geo fields
florianm pushed a commit that referenced this issue May 14, 2020
This was referenced May 15, 2020
florianm pushed a commit that referenced this issue May 18, 2020
* odata_submission_rectangle takes param form_schema to prevent unnesting of GeoJSON (nested list of fixed (geopoint) or arbitrary (geotrace, -shape) length == rectangling drama)
* odata_submission_get handballs all geopoint business to handle_ru_geopoints
* handle_ru_geopoints takes param form_schema to infer geopoint fields and wkt to tell split_geopoint whether to expect GeoJSON or WKT
* split_geopoint takes param wkt from handle_ru_geopoints
* geopoint tests refactored

Remaining work: geotrace and geoshape
florianm pushed a commit that referenced this issue May 18, 2020
florianm pushed a commit that referenced this issue May 18, 2020
florianm pushed a commit that referenced this issue May 19, 2020
* handle_ru_geotrace() retains original format (GeoJSON or WKT) and extracts lon, lat, alt from the first point of the geotrace.
* Add package data with example geofields, use in tests and examples, document
* Add source for geolocation test form as odkbuild and xml
* Add ODKC_TEST_FID_WKT to GHA
* RMD skeleton updated with comments re geofields
* Bump dev version
florianm pushed a commit that referenced this issue May 19, 2020
* bump version
* handle_ru_geoshape, split_geoshape work very similarly to geotrace equivalents
* refactor tests
* update packaged geofields test data with parsed geoshape
* make_data: bugfix geo_fs: set form_schema

florianm commented May 19, 2020

Current status:

  • GHA test suite passes
  • odata_submission_rectangle excludes spatial fields from unnesting if given a form_schema
  • ruODK handles geopoint, geotrace, geoshape
  • original values are parsed but retained (GeoJSON > nested list, WKT > text), additionally, point coords (using first point for trace and shape) are extracted into lon, lat, alt, and acc (where given)
  • location test data included
  • vignettes are updated (WIP - good enough for now but added comments with ideas for improvement)
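
For readers who want a feel for the resulting shape of the data, here is a simplified base R sketch of the GeoJSON case: the parsed geometry is retained as a list column and the first point is extracted into separate columns. The helper `first_point()`, the field name `geo`, and the sample geometries are made up, and the location of accuracy under `properties$accuracy` is an assumption about the GeoJSON layout. The real logic lives in the handle_ru_geo* functions and may differ.

```r
# Hypothetical helper: first point of a parsed GeoJSON geometry as
# c(longitude, latitude, altitude), padded with NA where altitude is missing.
first_point <- function(geom) {
  coords <- geom$coordinates
  pt <- switch(geom$type,
    Point = coords,
    LineString = coords[[1]],
    Polygon = coords[[1]][[1]],
    stop("Unsupported geometry type: ", geom$type)
  )
  pt <- as.numeric(unlist(pt))
  length(pt) <- 3
  stats::setNames(pt, c("longitude", "latitude", "altitude"))
}

# Made-up parsed GeoJSON values (nested lists) for a point, a trace, a shape
geoms <- list(
  list(
    type = "Point",
    coordinates = list(115.88, -31.99, 20.5),
    properties = list(accuracy = 4.2) # assumed location of accuracy
  ),
  list(
    type = "LineString",
    coordinates = list(list(115.1, -31.1, 10), list(115.2, -31.2, 12))
  ),
  list(
    type = "Polygon",
    coordinates = list(list(list(0, 0), list(1, 0), list(1, 1), list(0, 0)))
  )
)

# Accuracy where given, NA otherwise
accuracy <- vapply(geoms, function(g) {
  a <- g$properties$accuracy
  if (is.null(a)) NA_real_ else as.numeric(a)
}, numeric(1))

# One row per geometry with extracted coordinates
pts <- t(vapply(geoms, first_point, numeric(3)))
annotated <- data.frame(id = c("a", "b", "c"), pts, stringsAsFactors = FALSE)
names(annotated)[-1] <- paste0("geo_", names(annotated)[-1])
annotated$geo_accuracy <- accuracy

# Retain the original parsed value as a list column
annotated$geo <- geoms
```

The plain numeric columns can go straight into a spreadsheet, while the retained `geo` list column stays available for conversion into spatial classes.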

This issue could make it into ropensci/software-review#335

florianm pushed a commit that referenced this issue May 19, 2020
* streamline spatial bits
* insert comments on how to split up and simplify vignettes