
How API Elements could all tie in together around a geospatial data concept #17

Open · jerstlouis opened this issue Apr 3, 2019 · 11 comments
Labels: general approach · Guide · Resources of Collections type (Issues related to the /collections path)


@jerstlouis (Member) commented Apr 3, 2019

Datasets
I am proposing that what ties all the different elements of the OGC API together is the abstract concept of a geospatial dataset (which I see as blending into the 'collection' concept, with a dataset being either a collection or a collection of collections). Such a dataset could have the following characteristics (see the sketch after this list):

  • A dataset can be a vector feature collection, a coverage/imagery, or a collection of datasets (and therefore potentially a mix of any of these things)
  • A dataset has associated metadata, including some essential information:
    • Information about what type of dataset it is: vector features (including the feature geometry type, e.g. polygons, lines or points, if limited to one type), a coverage (values) / imagery (pixels), or sub-datasets (i.e. more than one of those things)
    • A textual identifier (e.g. one that appears in the resource path)
    • A title (short name / description)
    • Access point for the dataset (could be hosted locally or remote)
    • Geospatial & temporal extent
    • Resolution/scale
    • Units/Range/Bit-Depth/Channels/Dimensions etc. for imagery/coverages
    • A description of queryables, if applicable
  • Keywords/tags and longer descriptions are also commonly useful pieces of metadata
  • Any other ISO 19115 metadata fields can also be associated with the dataset, but they are nowhere near as essential to discovering and using geospatial data as those mentioned above. Metadata containing at minimum those essential elements can always be retrieved in ISO 19115 and potentially other formats.
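
To make this concrete, here is a minimal sketch of what those essential elements could look like for a single dataset, written as a JSON-style Python dict; every field name and URL is an illustrative assumption on my part, not something taken from a finalized specification:

```python
# A minimal sketch of "essential" dataset metadata as a JSON-style dict.
# All field names and values here are illustrative assumptions.
dataset_description = {
    "id": "blueMarble",                      # textual identifier used in the resource path
    "title": "Blue Marble Next Generation",  # short name / description
    "dataType": "coverage",                  # "features", "coverage", or "datasets" (sub-datasets)
    "extent": {
        "spatial": [-180.0, -90.0, 180.0, 90.0],   # WGS84 bounding box
        "temporal": ["2004-01-01", "2004-12-31"],
    },
    "resolution": {"x": 0.0083, "y": 0.0083},      # degrees per grid cell
    "rangeType": {                                 # units/bit-depth/channels for imagery/coverages
        "channels": ["red", "green", "blue"],
        "bitDepth": 8,
    },
    "links": [                                     # access point(s), local or remote
        {"rel": "data", "href": "https://example.com/ogcapi/collections/blueMarble"}
    ],
    "keywords": ["imagery", "global", "NASA"],
}
```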

Processes
Processes take one or more datasets and some parameters as input, and produce one or more datasets as output. This ties processes together with the data delivery services on both ends.
So far I have suggested three kinds of processes, all of which can run on the server where the data lives:

  1. Complex processes built as a container or executable, as is typical of WPS
  2. Process description languages such as WCPS
  3. Pre-defined named processes such as 'vectorization', 'buffering', 'rasterization' or 'rendering of a styled map'

All of these kinds of processes could share common aspects, such as taking an OGC API dataset as input, producing output usable as an OGC API dataset (for direct access and/or asynchronous delivery), and supporting multiple data partitioning/access mechanisms, estimate/billing elements, and so on.
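
As a sketch of this shared pattern, here is what executing a pre-defined named process against an OGC API dataset could look like; the endpoint, process name, and request fields below are hypothetical assumptions (loosely echoing what the Processes API drafts explore), not a definitive interface:

```python
import requests

# Hypothetical execution request for a pre-defined named process (kind 3).
# All paths and field names are illustrative assumptions.
execute_request = {
    "inputs": {
        # Input passed by reference to an OGC API dataset (collection)
        "features": {"href": "https://example.com/ogcapi/collections/rivers"},
        # Plain parameters alongside dataset inputs
        "distance": 100.0,  # buffer distance in metres
    },
    "outputs": {
        # Ask for the result to be exposed as an OGC API dataset itself,
        # so it can feed directly into the next step of a workflow.
        "result": {"transmissionMode": "collection"},
    },
}

response = requests.post(
    "https://example.com/ogcapi/processes/buffering/execution",
    json=execute_request,
)
# The response could then link to a new collection, usable like any other dataset.
result_collection = response.json()["links"][0]["href"]
```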

Server-side rendering
Highlighting here that rendering a styled map, based on multiple source datasets and a style passed as a parameter, and outputting a styled 'rendered map' imagery dataset as a result, fits the description of a process of kind (3) perfectly well.
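
Under that framing, a map rendering request is just another execution of a pre-defined process, with the style passed as a parameter; again, every name in this sketch is an illustrative assumption:

```python
# Hypothetical 'renderMap' process (kind 3): datasets in, imagery dataset out.
render_request = {
    "inputs": {
        "layers": [
            {"href": "https://example.com/ogcapi/collections/rivers"},
            {"href": "https://example.com/ogcapi/collections/elevation"},
        ],
        "style": "topographic",  # the style is a plain parameter
    },
    # The rendered map comes back as an imagery dataset, like any process output.
    "outputs": {"map": {"transmissionMode": "collection"}},
}
```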

Data partitioning
For all data access/exchange mechanisms (e.g. retrieving coverages or vector features directly, a process accessing its input, retrieving the output from a process, or passing data throughout a daisy chain of processes), there are a variety of ways in which data can be partitioned for efficient access. The most efficient way most likely depends on the overall workflow and on the implementations at both ends. I am suggesting that most of the OGC API should be agnostic of this and support many such ways, so that both ends of a connection, or a workflow manager, can negotiate the best approach. Examples of different ways to partition/access data include the following (see the sketch after the list):

  • Bounding boxes
  • Tiles
  • n-dimensional sub-setting
  • DGGS cells
  • Single point value
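
Here is the sketch: the same collection retrieved through several of these partitioning schemes. All paths and parameters are illustrative assumptions, loosely echoing the patterns of OGC API - Features, Tiles, EDR and a hypothetical DGGS API:

```python
import requests

base = "https://example.com/ogcapi/collections/elevation"  # hypothetical collection

# 1. Bounding box
bbox = requests.get(f"{base}/coverage", params={"bbox": "-71.5,41.0,-70.5,42.0"})

# 2. A tile from a pre-defined tiling scheme
tile = requests.get(f"{base}/tiles/WebMercatorQuad/4/6/5")

# 3. n-dimensional sub-setting (spatial + temporal slice)
subset = requests.get(
    f"{base}/coverage",
    params={"subset": "Lat(41:42),Lon(-71.5:-70.5),time(2020-01-01)"},
)

# 4. A DGGS cell (hypothetical zone identifier)
cell = requests.get(f"{base}/dggs/ISEA3H/zones/A4-5B/data")

# 5. A single point value
point = requests.get(f"{base}/position", params={"coords": "POINT(-71 41.5)"})
```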

This is the overall diagram of these ideas that I presented a couple of weeks back at the OWS Common telecon and had the chance to explain in more detail in person to some of you:
[Figure: OGCAPI overview diagram]

You might notice that the diagram is color-coded based on where these API blocks originated in the classic services:

  • Orange: Coverages (WCS)
  • Red: Server-side rendering (WMS)
  • Blue: Processes (WPS)
  • Green: Vector features (WFS)
  • Very light blue: Catalogs (CSW)
  • Violet: Tiles (WMTS)

@cmheazel (Contributor)

A version of this proposal has been added to the Best Practices.

@dblodgett-usgs

Too bad this proposal didn't get much attention till now.

I wonder if this can't be satisfied with a building-block approach where the abstract dataset description is contained in an API implementing the OGC-API Records spec, which points to the APIs available for the data held in each record.
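
A record in such a catalogue might look something like this sketch, where each record simply points at the landing page of a stand-alone, single-dataset implementation (every field and URL is illustrative):

```python
# Sketch of a catalogue record pointing at a purpose-built, per-dataset API.
# All fields and URLs are illustrative, not taken from the Records draft.
record = {
    "id": "nwis-streamflow",
    "title": "NWIS Streamflow Observations",
    "links": [
        {"rel": "item-api", "href": "https://example.com/streamflow/ogcapi/"},
    ],
}
```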

Will need to ponder some more, but it seems like, from a service developer's point of view, forcing everything (dataset and service hierarchy) under one end-point introduces some tough requirements. It could lead to monolithic or highly-integrated systems where we actually want the ability to flexibly break things apart into many purpose-built implementations.

The idea that a single OGC-API conformant endpoint is only ever about one dataset also completely solves this problem and allows people to solve it one level higher. Thinking in terms of a server developer again, I would want that flexibility (or have to craft it).

@joanma747 (Contributor)

> OGC-API Records spec that points to APIs available for the data held in each record.
In principle this is nice. In practice this is not going to work.

I'm saying that because that is actually what we have today: CSW catalogues full of outdated ISO 19115 records, each one pointing to dataset distributions and dataset services that return a 404.

Separating the dataset description from the services and distributions is the main practical problem of SDIs. You may say it is only an implementation problem, but it is a very real one.

@jerstlouis (Member, Author)

> Too bad this proposal didn't get much attention till now.

And it is not for lack of trying! ;)

> from a service developer's point of view, forcing everything (dataset and service hierarchy) under one end-point introduces some tough requirements. It could lead to monolithic or highly-integrated systems where we actually want the ability to flexibly break things apart into many purpose-built implementations.

I don't see how this is the case. Nothing forces everything under one end-point.

You could still write services distributing only one dataset each, and then implement an OGC API - Records catalogue linking to them (if that is appropriate).
Or you could write another service implementing /collections with hierarchies and search, which links to your individual micro-services for the representations of those collections.
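
For instance, the /collections response of such a catalog-type service could simply link out to the micro-services (a sketch; all identifiers and URLs are illustrative):

```python
# Sketch of a /collections response whose entries link out to separate
# micro-services; identifiers and URLs are illustrative assumptions.
collections_response = {
    "collections": [
        {
            "id": "rivers",
            "title": "Rivers",
            "links": [
                # The data itself lives on a separate single-dataset service.
                {"rel": "data", "href": "https://rivers.example.org/ogcapi/collections/rivers"},
            ],
        },
    ],
}
```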

@dblodgett-usgs

@joanma747 --

I would argue that your point is a bit of a non sequitur. "What we have today" is not bad because of its architecture. It's bad because the development community didn't adopt XML (editorial).

The alternative architecture is monolithic, inflexible, complex, and potentially fragile.

Look, let's not let the pendulum swing too far in either direction. We all know there are trade-offs here. If we want to support service modularity at the specification AND deployment levels, which I think we do, then we need to keep complexity down.

The DXWG has been hard at work engineering solutions to this problem. #111 (comment) is a really good point that we should not overlook.

@jerstlouis --

It does force everything under one end-point though: by allowing it, you are telling people implementing software to expect it and support it. By disallowing it, you keep that complexity where it belongs, in a metadata layer where the semantics and complexity can be handled on their own terms. By disallowing it, you also allow implementers to make huge simplifying assumptions and :shipit: sooner.

@jerstlouis (Member, Author) commented Mar 16, 2020

@dblodgett-usgs --

I am assuming we are talking about a Collection of Collections, and a Collections-level Search capability.

From a services point of view, they can still provide micro-services. They have the option to implement the Collection of Collections and/or Collections-level Search conformance classes in a separate catalog-type service linking to the micro-services' representations.

From a client point of view, it should still be optional to support either of these; otherwise, a flat list of collections would be returned.

The benefit I see is for users to simply connect to one simple OGC API service and have their GIS client automatically offer them powerful search and discovery capabilities for the datasets organized on that particular service (which could itself be a catalog of multiple services), as well as the ability to use that data in many interoperable ways, experimenting with and running processes that combine the various datasets locally available on that service (and potentially on federated or other open data services as well).

The same could apply to a web front-end at the landing page of that OGC API.

@rob-metalinkage

This takes me back to 1999 - some of you will know what I'm talking about :-)

Agreeing with @joanma747 that "out of band" metadata is a problem with the "socio-technical architecture", even if not with the technical architecture. The big problem is that the people responsible for implementing data, services, data metadata and services metadata are unlikely to be the same; they involve different skills and happen at different times.

Making services self-describing, and responsible for describing the data too, means that cataloguing is not an out-of-band social problem but a solvable technical one. Making data more self-describing helps that process, but in the short term it's likely to be a retrofitting process, where services describe the data better than it has been described so far (from a FAIR perspective).

Incrementalism (allowing evolution towards better metadata) can be supported by API Common provided we don't try to over-specify that metadata too early, but instead focus on the mechanism by which it is attached, the ability to identify the particular flavour of metadata available, and allowing multiple flavours to co-exist as they evolve, or different flavours to be provided for different uses. That's a technical requirement for API Common right there!

The original proposal here, whilst good, does mix both the mechanism and the data model into one hard-to-digest chunk. Arguing about the metadata model might mean throwing away the concept of discoverability of the metadata. IMHO it needs to be split into two parts, with the data model having a separate specification lifecycle, i.e. pushed to a profile of OGC API for FAIR (aka SDI) use (OGC API does not need to support FAIR; it can be used in ad-hoc, non-reusable forms in private service architectures, but we need the FAIR version too...).

@cmheazel cmheazel added the Collections Applicable to Collections (consider to use Part 2 instead) label May 11, 2020
@dblodgett-usgs

Based on where we've landed in #140, I think we can close this issue as agreed.

@jerstlouis (Member, Author)

The processing aspect of this approach is being investigated in the Modular OGC API Workflows (MOAW) project, and detailed in opengeospatial/ogcapi-processes#47 (comment) .

@jerstlouis jerstlouis changed the title How API Elements could all tie in together around a flexible Dataset concept How API Elements could all tie in together around a flexible geospatial data concept Jul 28, 2020
@jerstlouis (Member, Author) commented Jul 28, 2020

At this point, I believe the idea of a dataset being either a collection, or a collection of collections has been rejected, and the distinction between a dataset (DCAT definition / data being published by a single entity) and a collection (data layer) made in OGC API - Features is kept.

Apart from this, most of the rest of the modular approach described in this issue is still valid.

As with Features, OGC API - Common - Part 2: Geospatial Data would define a dataset as having one or more collections of geospatial data.
Separate types of API landing pages are needed as a result, both offering APIs conforming to OGC API - Common - Part 1: Core (and potentially other OGC API standards):

  • A landing page not associated with any particular dataset, e.g. a landing page for a service offering multiple datasets (and potentially processing capabilities on them), or only processing. {serviceAPI} could be used to identify the root of such a service
  • A landing page specifically associated with a dataset, whose root could be identified as {datasetAPI}, and which may or may not be part of a bigger {serviceAPI}; e.g. {serviceAPI}/datasets/{datasetId} could be a {datasetAPI} landing page, as demonstrated at https://eratosthenes.pvretano.com/Projects/tb16/ogcapi/datastores.html (see the sketch below)
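
Here is the sketch: how a client could walk from a {serviceAPI} root down to the individual {datasetAPI} landing pages. The /datasets path and the response fields are assumptions following the example above, not a settled design:

```python
import requests

service_api = "https://example.com/ogcapi"  # a landing page not tied to one dataset

# Hypothetical: enumerate the datasets offered by this service...
datasets = requests.get(f"{service_api}/datasets").json()["datasets"]

for d in datasets:
    # ...each of which has its own {datasetAPI} landing page, itself
    # conforming to OGC API - Common - Part 1: Core.
    dataset_api = f"{service_api}/datasets/{d['id']}"
    landing = requests.get(dataset_api).json()
    print(d["id"], landing.get("title"))
```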

@jerstlouis jerstlouis changed the title How API Elements could all tie in together around a flexible geospatial data concept How API Elements could all tie in together around a geospatial data concept Jul 28, 2020
@cmheazel cmheazel added Guide and removed Collections Applicable to Collections (consider to use Part 2 instead) labels Aug 17, 2020
@cmheazel (Contributor)

The content provided by @jerstlouis has been included in the Users Guide along with a table of resources and paths (URIs). This content can be updated as we mature the underlying architecture for OGC APIs.
This issue will be kept open for now to track the discussion and lessons learned.
