
How API Elements could all tie in together around a geospatial data concept #17

Open · jerstlouis opened this issue Apr 3, 2019 · 11 comments
Labels: general approach · Guide · Resources of Collections type (Issues related to the /collections path)


@jerstlouis (Member) commented Apr 3, 2019

Datasets
I am proposing that what ties all the different elements of the OGC API together is the abstract concept of a geospatial dataset (which I see as blending into the 'collection' concept, with a dataset being either a collection or a collection of collections). Such a dataset could have the following characteristics (see the sketch after this list):

  • A dataset can be a vector feature collection, a coverage/imagery, or a collection of datasets (and therefore potentially a mix of any of these things)
  • A dataset has associated metadata, including some essential information:
    • Information about what type of dataset it is: vector features (including the feature geometry type, e.g. polygons, lines or points, if limited to one type), a coverage (values) / imagery (pixels), or sub-datasets (i.e. more than one of those things)
    • A textual identifier (e.g. one that appears in the resource path)
    • A title (short name / description)
    • Access point for the dataset (could be hosted locally or remote)
    • Geospatial & temporal extent
    • Resolution/scale
    • Units/Range/Bit-Depth/Channels/Dimensions etc. for imagery/coverages
    • A description of queryables, if applicable
  • Keywords/tags and longer descriptions are also commonly useful pieces of metadata
  • Any other ISO 19115 metadata fields can also be associated with the dataset, but they are nowhere near as essential to discovering and using geospatial data as those mentioned above. Metadata containing at minimum those essential elements can always be retrieved in ISO 19115 and potentially other formats.
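
To make this concrete, here is a minimal sketch of what those essential elements could look like for a single dataset, written as a JSON-style Python dict; every field name and URL is an illustrative assumption on my part, not something taken from a finalized specification:

```python
# A minimal sketch of "essential" dataset metadata as a JSON-style dict.
# All field names and values here are illustrative assumptions.
dataset_description = {
    "id": "blueMarble",                      # textual identifier used in the resource path
    "title": "Blue Marble Next Generation",  # short name / description
    "dataType": "coverage",                  # "features", "coverage", or "datasets" (sub-datasets)
    "extent": {
        "spatial": [-180.0, -90.0, 180.0, 90.0],   # WGS84 bounding box
        "temporal": ["2004-01-01", "2004-12-31"],
    },
    "resolution": {"x": 0.0083, "y": 0.0083},      # degrees per grid cell
    "rangeType": {                                 # units/bit-depth/channels for imagery/coverages
        "channels": ["red", "green", "blue"],
        "bitDepth": 8,
    },
    "links": [                                     # access point(s), local or remote
        {"rel": "data", "href": "https://example.com/ogcapi/collections/blueMarble"}
    ],
    "keywords": ["imagery", "global", "NASA"],
}
```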

Processes
Processes take one or more datasets and some parameters as input, and produce one or more datasets as output. This ties processes together with the data delivery services on both ends.
So far I have suggested three kinds of processes, all of which can run on the server where the data lives:

  1. Complex processes built as a container or executable, as is typical of WPS
  2. Process description languages such as WCPS
  3. Pre-defined named processes such as 'vectorization', 'buffering', 'rasterization' or 'rendering of a styled map'

All of these kinds of processes could share common aspects, such as taking an OGC API dataset as input, producing output usable as an OGC API dataset (for direct access and/or asynchronous delivery), and supporting multiple data partitioning/access mechanisms, estimate/billing elements, and so on.
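
As a sketch of this shared pattern, here is what executing a pre-defined named process against an OGC API dataset could look like; the endpoint, process name, and request fields below are hypothetical assumptions (loosely echoing what the Processes API drafts explore), not a definitive interface:

```python
import requests

# Hypothetical execution request for a pre-defined named process (kind 3).
# All paths and field names are illustrative assumptions.
execute_request = {
    "inputs": {
        # Input passed by reference to an OGC API dataset (collection)
        "features": {"href": "https://example.com/ogcapi/collections/rivers"},
        # Plain parameters alongside dataset inputs
        "distance": 100.0,  # buffer distance in metres
    },
    "outputs": {
        # Ask for the result to be exposed as an OGC API dataset itself,
        # so it can feed directly into the next step of a workflow.
        "result": {"transmissionMode": "collection"},
    },
}

response = requests.post(
    "https://example.com/ogcapi/processes/buffering/execution",
    json=execute_request,
)
# The response could then link to a new collection, usable like any other dataset.
result_collection = response.json()["links"][0]["href"]
```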

Server-side rendering
Highlighting here that rendering a styled map, based on multiple source datasets and a style passed as a parameter, and outputting a styled 'rendered map' imagery dataset as a result, fits the description of a process of kind (3) perfectly well.
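
Under that framing, a map rendering request is just another execution of a pre-defined process, with the style passed as a parameter; again, every name in this sketch is an illustrative assumption:

```python
# Hypothetical 'renderMap' process (kind 3): datasets in, imagery dataset out.
render_request = {
    "inputs": {
        "layers": [
            {"href": "https://example.com/ogcapi/collections/rivers"},
            {"href": "https://example.com/ogcapi/collections/elevation"},
        ],
        "style": "topographic",  # the style is a plain parameter
    },
    # The rendered map comes back as an imagery dataset, like any process output.
    "outputs": {"map": {"transmissionMode": "collection"}},
}
```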

Data partitioning
For all data access/exchange mechanisms (e.g. retrieving coverages or vector features directly, a process accessing its input, retrieving the output from a process, or passing data throughout a daisy chain of processes), there are a variety of ways in which data can be partitioned for efficient access. The most efficient way most likely depends on the overall workflow and on the implementations at both ends. I am suggesting that most of the OGC API should be agnostic of this and support many such ways, so that both ends of a connection, or a workflow manager, can negotiate the best approach. Examples of different ways to partition/access data include the following (see the sketch after the list):

  • Bounding boxes
  • Tiles
  • n-dimensional sub-setting
  • DGGS cells
  • Single point value
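
Here is the sketch: the same collection retrieved through several of these partitioning schemes. All paths and parameters are illustrative assumptions, loosely echoing the patterns of OGC API - Features, Tiles, EDR and a hypothetical DGGS API:

```python
import requests

base = "https://example.com/ogcapi/collections/elevation"  # hypothetical collection

# 1. Bounding box
bbox = requests.get(f"{base}/coverage", params={"bbox": "-71.5,41.0,-70.5,42.0"})

# 2. A tile from a pre-defined tiling scheme
tile = requests.get(f"{base}/tiles/WebMercatorQuad/4/6/5")

# 3. n-dimensional sub-setting (spatial + temporal slice)
subset = requests.get(
    f"{base}/coverage",
    params={"subset": "Lat(41:42),Lon(-71.5:-70.5),time(2020-01-01)"},
)

# 4. A DGGS cell (hypothetical zone identifier)
cell = requests.get(f"{base}/dggs/ISEA3H/zones/A4-5B/data")

# 5. A single point value
point = requests.get(f"{base}/position", params={"coords": "POINT(-71 41.5)"})
```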

This is the overall diagram of these ideas that I presented a couple of weeks back at the OWS Common telecon and had the chance to explain in more detail in person to some of you:
[Figure: OGCAPI overview diagram]

You might notice that the diagram is color-coded based on where these API blocks originated in the classic services:

  • Orange: Coverages (WCS)
  • Red: Server-side rendering (WMS)
  • Blue: Processes (WPS)
  • Green: Vector features (WFS)
  • Very light blue: Catalogs (CSW)
  • Violet: Tiles (WMTS)

@cmheazel (Contributor)

A version of this proposal has been added to the Best Practices.

@dblodgett-usgs

Too bad this proposal didn't get much attention till now.

I wonder if this can't be satisfied with a building-block approach where the abstract dataset description is contained in an API implementing the OGC-API Records spec, which points to the APIs available for the data held in each record.
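
A record in such a catalogue might look something like this sketch, where each record simply points at the landing page of a stand-alone, single-dataset implementation (every field and URL is illustrative):

```python
# Sketch of a catalogue record pointing at a purpose-built, per-dataset API.
# All fields and URLs are illustrative, not taken from the Records draft.
record = {
    "id": "nwis-streamflow",
    "title": "NWIS Streamflow Observations",
    "links": [
        {"rel": "item-api", "href": "https://example.com/streamflow/ogcapi/"},
    ],
}
```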

Will need to ponder some more, but it seems like, from a service developer's point of view, forcing everything (dataset and service hierarchy) under one end-point introduces some tough requirements. It could lead to monolithic or highly-integrated systems where we actually want the ability to flexibly break things apart into many purpose-built implementations.

The idea that a single OGC-API conformant endpoint is only ever about one dataset also completely solves this problem and allows people to solve it one level higher. Thinking in terms of a server developer again, I would want that flexibility (or have to craft it).

@joanma747 (Contributor)

> OGC-API Records spec that points to APIs available for the data held in each record.
In principle this is nice. In practice this is not going to work.

I'm saying that because that is actually what we have today: CSW catalogues full of outdated ISO 19115 records, each one pointing to dataset distributions and dataset services that return a 404.

Separating the dataset description from the services and distributions is the main practical problem of SDIs. You may say it is only an implementation problem, but it is a very real one.

@jerstlouis (Member, Author)

> Too bad this proposal didn't get much attention till now.

And it is not for lack of trying! ;)

> from a service developer's point of view, forcing everything (dataset and service hierarchy) under one end-point introduces some tough requirements. It could lead to monolithic or highly-integrated systems where we actually want the ability to flexibly break things apart into many purpose-built implementations.

I don't see how this is the case. Nothing forces everything under one end-point.

You could still write services distributing only one dataset each, and then implement an OGC API - Records catalogue linking to them (if that is appropriate).
Or you could write another service implementing /collections with hierarchies and search, which links to your individual micro-services for the representations of those collections.
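
For instance, the /collections response of such a catalog-type service could simply link out to the micro-services (a sketch; all identifiers and URLs are illustrative):

```python
# Sketch of a /collections response whose entries link out to separate
# micro-services; identifiers and URLs are illustrative assumptions.
collections_response = {
    "collections": [
        {
            "id": "rivers",
            "title": "Rivers",
            "links": [
                # The data itself lives on a separate single-dataset service.
                {"rel": "data", "href": "https://rivers.example.org/ogcapi/collections/rivers"},
            ],
        },
    ],
}
```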

@dblodgett-usgs

@joanma747 --

I would argue that your point is a bit of a non sequitur. "What we have today" is not bad because of its architecture. It's bad because the development community didn't adopt XML (editorial).

The alternative architecture is monolithic, inflexible, complex, and potentially fragile.

Look, let's not let the pendulum swing too far in either direction. We all know there are trade-offs here. If we want to support service modularity at the specification AND deployment levels, which I think we do, then we need to keep complexity down.

The DXWG has been hard at work engineering solutions to this problem. #111 (comment) is a really good point that we should not overlook.

@jerstlouis --

It does force everything under one end-point though: by allowing it, you are telling people implementing software to expect it and support it. By disallowing it, you keep that complexity where it belongs, in a metadata layer where the semantics and complexity can be handled on their own terms. By disallowing it, you also allow implementers to make huge simplifying assumptions and :shipit: sooner.

@jerstlouis (Member, Author) commented Mar 16, 2020

@dblodgett-usgs --

I am assuming we are talking about a Collection of Collections, and a Collections-level Search capability.

From a services point of view, they can still provide micro-services. They have the option to implement the Collection of Collections and/or Collections-level Search conformance classes in a separate catalog-type service linking to the micro-services' representations.

From a client point of view, it should still be optional to support either of these; otherwise, a flat list of collections would be returned.

The benefit I see is for users to simply connect to one simple OGC API service and have their GIS client automatically offer them powerful search and discovery capabilities for the datasets organized on that particular service (which could itself be a catalog of multiple services), as well as the ability to use that data in many interoperable ways, experimenting with and running processes that combine the various datasets locally available on that service (and potentially on federated or other open data services as well).

The same could apply to a web front-end at the landing page of that OGC API.

@rob-metalinkage

This takes me back to 1999 - some of you will know what I'm talking about :-)

Agreeing with @joanma747 that "out of band" metadata is a problem with the "socio-technical architecture", even if not with the technical architecture. The big problem is that the people responsible for implementing data, services, data metadata and services metadata are unlikely to be the same; they involve different skills and happen at different times.

Making services self-describing, and responsible for describing the data too, means that cataloguing is not an out-of-band social problem but a solvable technical one. Making data more self-describing helps that process, but in the short term it's likely to be a retrofitting process, where services describe the data better than it has been described so far (from a FAIR perspective).

Incrementalism (allowing evolution towards better metadata) can be supported by API Common provided we don't try to over-specify that metadata too early, but instead focus on the mechanism by which it is attached, the ability to identify the particular flavour of metadata available, and allowing multiple flavours to co-exist as they evolve, or different flavours to be provided for different uses. That's a technical requirement for API Common right there!

The original proposal here, whilst good, does mix both the mechanism and the data model into one hard-to-digest chunk. Arguing about the metadata model might mean throwing away the concept of discoverability of the metadata. IMHO it needs to be split into two parts, with the data model having a separate specification lifecycle, i.e. pushed to a profile of OGC API for FAIR (aka SDI) use (OGC API does not need to support FAIR; it can be used in ad-hoc, non-reusable forms in private service architectures, but we need the FAIR version too...).

@cmheazel cmheazel added the Collections Applicable to Collections (consider to use Part 2 instead) label May 11, 2020
@dblodgett-usgs

Based on where we've landed in #140, I think we can close this issue as agreed.

@jerstlouis (Member, Author)

The processing aspect of this approach is being investigated in the Modular OGC API Workflows (MOAW) project, and detailed in opengeospatial/ogcapi-processes#47 (comment) .

@jerstlouis jerstlouis changed the title How API Elements could all tie in together around a flexible Dataset concept How API Elements could all tie in together around a flexible geospatial data concept Jul 28, 2020
@jerstlouis (Member, Author) commented Jul 28, 2020

At this point, I believe the idea of a dataset being either a collection, or a collection of collections has been rejected, and the distinction between a dataset (DCAT definition / data being published by a single entity) and a collection (data layer) made in OGC API - Features is kept.

Apart from this, most of the rest of the modular approach described in this issue is still valid.

As with Features, OGC API - Common - Part 2: Geospatial Data would define a dataset as having one or more collections of geospatial data.
Separate types of API landing pages are needed as a result, both offering APIs conforming to OGC API - Common - Part 1: Core (and potentially other OGC API standards):

  • A landing page not associated with any particular dataset, e.g. a landing page for a service offering multiple datasets (and potentially processing capabilities on them), or only processing. {serviceAPI} could be used to identify the root of such a service
  • A landing page specifically associated with a dataset, whose root could be identified as {datasetAPI}, and which may or may not be part of a bigger {serviceAPI}; e.g. {serviceAPI}/datasets/{datasetId} could be a {datasetAPI} landing page, as demonstrated at https://eratosthenes.pvretano.com/Projects/tb16/ogcapi/datastores.html (see the sketch below)
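
Here is the sketch: how a client could walk from a {serviceAPI} root down to the individual {datasetAPI} landing pages. The /datasets path and the response fields are assumptions following the example above, not a settled design:

```python
import requests

service_api = "https://example.com/ogcapi"  # a landing page not tied to one dataset

# Hypothetical: enumerate the datasets offered by this service...
datasets = requests.get(f"{service_api}/datasets").json()["datasets"]

for d in datasets:
    # ...each of which has its own {datasetAPI} landing page, itself
    # conforming to OGC API - Common - Part 1: Core.
    dataset_api = f"{service_api}/datasets/{d['id']}"
    landing = requests.get(dataset_api).json()
    print(d["id"], landing.get("title"))
```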

@jerstlouis jerstlouis changed the title How API Elements could all tie in together around a flexible geospatial data concept How API Elements could all tie in together around a geospatial data concept Jul 28, 2020
@cmheazel cmheazel added Guide and removed Collections Applicable to Collections (consider to use Part 2 instead) labels Aug 17, 2020
@cmheazel (Contributor)

The content provided by @jerstlouis has been included in the Users Guide along with a table of resources and paths (URIs). This content can be updated as we mature the underlying architecture for OGC APIs.
This issue will be kept open for now to track the discussion and lessons learned.
