How API Elements could all tie in together around a geospatial data concept #17
A version of this proposal has been added to the Best Practices.
Too bad this proposal didn't get much attention until now. I wonder if this can't be satisfied with a building-block approach where the abstract dataset description is contained in an API implementing the OGC API - Records spec that points to the APIs available for the data held in each record. Will need to ponder some more, but it seems like, from a service developer's point of view, forcing everything (dataset and service hierarchy) under one endpoint introduces some tough requirements. It could lead to monolithic or highly integrated systems where we actually want the ability to flexibly break things apart into many purpose-built implementations. The idea that a single OGC API conformant endpoint is only ever about one dataset would also solve this problem and allow people to solve it one level higher. Thinking in terms of a server developer again, I would want that flexibility (or have to craft it).
I'm saying that because that is actually what we have today: CSW catalogues full of outdated 19115 records, each one pointing to dataset distributions and dataset services that return a 404. Separating the dataset description from the services and distributions is the main practical problem of SDIs. You may say it is only an implementation problem, but it is a very real one.
And it is not for lack of trying! ;)
I don't see how this is the case. Nothing is forcing everything under one endpoint. You could still write services distributing only one dataset, and then implement an OGC API - Records catalog linking to them (if that is appropriate).
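To make that building-block idea concrete, here is a minimal client-side sketch of discovering single-dataset APIs through a separate OGC API - Records catalog. The catalog URL and the link rel values checked here are assumptions, not normative parts of the spec:

```python
import requests

CATALOG_ITEMS = "https://example.org/catalog/collections/services/items"  # hypothetical URL

def discover_dataset_apis(limit: int = 50) -> list[str]:
    """Return landing-page URLs of the dataset APIs referenced by catalog records."""
    response = requests.get(CATALOG_ITEMS, params={"limit": limit},
                            headers={"Accept": "application/geo+json"})
    response.raise_for_status()
    endpoints = []
    for record in response.json().get("features", []):
        # Records builds on Features, so each record is a GeoJSON feature whose
        # links point at the resource it describes (rel values are assumptions).
        for link in record.get("links", []):
            if link.get("rel") in ("item", "describes", "service"):
                endpoints.append(link["href"])
    return endpoints

for url in discover_dataset_apis():
    print(url)
```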
@joanma747 -- I would argue that your point is a bit of a non sequitur. "What we have today" is not bad because of its architecture; it's bad because the development community didn't adopt XML (editorial). The alternative architecture is monolithic, inflexible, complex, and potentially fragile. Look, let's not let the pendulum swing too far in either direction. We all know there are trade-offs here. If we want to support service modularity at the specification AND deployment levels, which I think we do, then we need to keep complexity down. The DXWG has been hard at work engineering solutions to this problem. #111 (comment) is a really good point that we should not overlook.

@jerstlouis -- It does force everything under one, though: by allowing it, you are telling people implementing software to expect it and support it. By disallowing it, you keep that complexity where it belongs, in a metadata layer where the semantics and complexity can be handled on their face. By disallowing it, you allow implementers to make huge simplifying assumptions and
I am assuming we are talking about a Collection of Collections, and a Collections-level Search capability. From a services point of view, providers can still offer micro-services. They have the option of implementing the Collection of Collections and/or Collections-level Search conformance classes in a separate catalog-type service linking to the micro-services. From a client point of view, supporting either of these should still be optional. The benefit I see is that users can simply connect to one OGC API service and have their GIS client automatically offer them powerful search and discovery capabilities for the datasets organized on that particular service (which itself could be a catalog of multiple services), along with the ability to use that data in many interoperable ways, experimenting with and running processes that combine the various datasets locally available on that service (and potentially on federated or other open data services as well). The same could apply to a web front-end at the landing page of that OGC API.
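As an illustration of that client's view, a small sketch that connects to one OGC API landing page and enumerates its collections recursively. The endpoint is hypothetical, and the rel="child" nesting convention is borrowed from STAC as one possible way to express a collection of collections:

```python
import requests

def list_collections(api_root: str, depth: int = 0) -> None:
    """Print the collection tree reachable from one OGC API landing page."""
    r = requests.get(f"{api_root.rstrip('/')}/collections",
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    for coll in r.json().get("collections", []):
        print("  " * depth + coll.get("id", "?"))
        for link in coll.get("links", []):
            # Recurse if this collection is itself a catalog of collections.
            if link.get("rel") == "child":
                list_collections(link["href"], depth + 1)

list_collections("https://example.org/ogcapi")  # hypothetical endpoint
```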
This takes me back to 1999 - some of you will know what I'm talking about :-) Agreeing with @joanma747 that "out of band" metadata is a problem with the "socio-technical architecture", even if not the technical architecture. The big problem is that the people responsible for data, services, data metadata, and services metadata implementation are unlikely to be the same; they involve different skills and happen at different times. Making services self-describing - and responsible for describing the data too - means that cataloguing is not an out-of-band social problem but a solvable technical one. Making data more self-describing helps that process, but in the short term it's likely to be a retrofitting process, where services describe the data better than it has been described (from a FAIR perspective).

Incrementalism (allowing evolution of better metadata) can be supported by API - Common provided we don't try to overspecify that metadata too early, but instead focus on the mechanism by which it is attached, the ability to identify the particular flavour of metadata available, and allowing multiple flavours to co-exist as they evolve, or different flavours to be provided for different uses. That's a technical requirement for API - Common right there!

The original proposal here, whilst good, does mix both the mechanism and the data model into one hard-to-digest chunk. Arguing about the metadata model might mean throwing away the concept of discoverability of the metadata. IMHO it needs to be split into two parts, with the data model having a separate specification lifecycle - i.e. push it to a profile of OGC API for FAIR (aka SDI) (i.e. OGC API does not need to support FAIR - it can be used in ad-hoc, non-reusable forms in private service architectures - but we need the FAIR version too...)
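A sketch of that mechanism-over-model idea: a collection advertising multiple co-existing metadata flavours through links, so clients pick whichever flavour they understand. The rel and media type values are plausible examples, not normative requirements:

```python
collection = {
    "id": "elevation",
    "links": [
        {"rel": "describedby", "type": "application/xml",
         "href": "https://example.org/meta/elevation-iso19115.xml"},  # ISO 19115 flavour
        {"rel": "describedby", "type": "application/ld+json",
         "href": "https://example.org/meta/elevation-dcat.jsonld"},   # DCAT flavour
    ],
}

def pick_metadata(coll: dict, preferred_type: str) -> str | None:
    """Return the metadata link of the flavour this client understands, if any."""
    for link in coll["links"]:
        if link["rel"] == "describedby" and link["type"] == preferred_type:
            return link["href"]
    return None

print(pick_metadata(collection, "application/ld+json"))
```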
Based on where we've landed in #140, I think we can close this issue as agreed. |
The processing aspect of this approach is being investigated in the Modular OGC API Workflows (MOAW) project, and detailed in opengeospatial/ogcapi-processes#47 (comment).
At this point, I believe the idea of a dataset being either a collection or a collection of collections has been rejected, and the distinction made in OGC API - Features between a dataset (DCAT definition / data being published by a single entity) and a collection (data layer) is kept. Apart from this, most of the rest of the modular approach described in this issue is still valid. As with Features, OGC API - Common - Part 2: Geospatial Data would define a dataset as having one or more collections of geospatial data.
The content provided by @jerstlouis has been included in the Users Guide along with a table of resources and paths (URIs). This content can be updated as we mature the underlying architecture for OGC APIs. |
Datasets
I am proposing that what ties all the different elements of the OGC API together is the abstract concept of a geospatial dataset (which I see as blending into the 'collection' concept, with a dataset being either a collection or a collection of collections). Such a dataset could have the following characteristics:
Processes
Processes take one or more datasets as input, along with parameters, and produce one or more datasets as output. This ties processes together with the data delivery services on both ends.
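As a hedged sketch of this datasets-in, datasets-out pattern, an OGC API - Processes style execute request; the /processes/{id}/execution path is from Processes - Part 1, while the process id, input names, and URLs here are hypothetical:

```python
import requests

execute_request = {
    "inputs": {
        # A dataset on this (or a federated) service, referenced rather than embedded.
        "data": {"href": "https://example.org/ogcapi/collections/landcover"},
        "threshold": 0.5,  # a plain parameter
    },
    "outputs": {"result": {"format": {"mediaType": "application/geo+json"}}},
}

r = requests.post("https://example.org/ogcapi/processes/classify/execution",
                  json=execute_request)
r.raise_for_status()
print(r.json())  # the output could in turn be exposed as a new collection
```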
So far I have suggested three kinds of processes, all of which can run on the server where the data lives:
All of these kinds of processes could share common aspects: taking an OGC API dataset as input, producing output usable as an OGC API dataset (for direct access and/or asynchronous delivery), and supporting multiple data partitioning/access mechanisms, estimate/billing elements, and so on.
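For the asynchronous-delivery aspect, a sketch using the Prefer: respond-async header and job polling from OGC API - Processes - Part 1 (endpoint URLs and the process id remain hypothetical):

```python
import time
import requests

execute_request = {
    "inputs": {"data": {"href": "https://example.org/ogcapi/collections/landcover"}},
}

# Ask for asynchronous execution; the server answers 201 Created with a job URL.
r = requests.post("https://example.org/ogcapi/processes/classify/execution",
                  json=execute_request,
                  headers={"Prefer": "respond-async"})
r.raise_for_status()
job_url = r.headers["Location"]

# Poll the job until it reaches a terminal status.
while True:
    status = requests.get(job_url).json()
    if status["status"] in ("successful", "failed", "dismissed"):
        break
    time.sleep(5)

if status["status"] == "successful":
    results = requests.get(f"{job_url}/results").json()
    print(results)  # usable in turn as input to the next process in a chain
```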
Server-side rendering
Highlighting here that rendering a styled map from multiple source datasets, with a style as a parameter and a styled 'rendered map' imagery dataset as output, fits the description of a process (3) perfectly well.
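Under that reading, the rendering process could be invoked with an execute payload along these lines; all ids, input names, and URLs are hypothetical:

```python
render_request = {
    "inputs": {
        "layers": [  # multiple source datasets
            {"href": "https://example.org/ogcapi/collections/roads"},
            {"href": "https://example.org/ogcapi/collections/buildings"},
        ],
        "style": {"href": "https://example.org/styles/night.sld"},  # style as a parameter
    },
    # The output is itself an imagery dataset: a styled 'rendered map'.
    "outputs": {"map": {"format": {"mediaType": "image/png"}}},
}
# POSTed to a hypothetical /processes/render-map/execution endpoint, the result
# would be addressable like any other dataset on the service.
```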
Data partitioning
For all data access/exchange mechanisms (e.g. retrieving coverages or vector features directly, a process accessing its input, retrieving the output from a process, or passing data along a daisy chain of processes), there are a variety of ways in which data can be partitioned for efficient access. The most efficient way most likely depends on the overall workflow and on the implementations at both ends. I am suggesting that most of the OGC API should be agnostic of this and support many such ways, so that both ends of such a connection, or a workflow manager, can negotiate the best approach. Examples of different ways to partition/access data include:
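As two illustrative examples (hypothetical service URL; paths following OGC API - Features and the draft OGC API - Tiles), here is the same collection requested under two different partitioning schemes:

```python
import requests

BASE = "https://example.org/ogcapi/collections/buildings"  # hypothetical collection

# 1. Spatial subset: the server filters by bounding box (Features - Part 1: Core).
subset = requests.get(f"{BASE}/items",
                      params={"bbox": "5.8,45.8,10.5,47.8", "limit": 1000})

# 2. Pre-partitioned tiles: a fixed tile of a tile matrix set
#    (/tiles/{tileMatrixSetId}/{tileMatrix}/{tileRow}/{tileCol}).
tile = requests.get(f"{BASE}/tiles/WebMercatorQuad/10/356/512",
                    headers={"Accept": "application/vnd.mapbox-vector-tile"})

# Same data either way, partitioned differently; both ends of a workflow can
# negotiate whichever scheme they support best.
```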
This is the overall diagram of these ideas that I presented a couple of weeks back at the OWS Common telecon and got the chance to explain in more detail in person to some of you:

You might notice that this is color-coded based on where these API building blocks originated in the classic services:
Orange: Coverages (WCS)
Red: Server-side rendering (WMS)
Blue: Processes (WPS)
Green: Vector features (WFS)
Very light blue: Catalogs (CSW)
Violet: Tiles (WMTS)