Discovery spec in first draft

TREEcg · Oct 1, 2024 · 830aff5
1 parent 534c750
commit 830aff5
Showing 1 changed file with 73 additions and 48 deletions.
diff --git a/03-discovery-specification.bs b/03-discovery-specification.bs
@@ -11,89 +11,114 @@ Mailing List: public-treecg@w3.org
 Mailing List Archives: https://lists.w3.org/Archives/Public/public-treecg/
 Editor: Pieter Colpaert, https://pietercolpaert.be
 Abstract:
-    This specification defines how a client can find specific search trees of interest, as well as list the context information.
+    This specification defines how a client selects a specific dataset and search tree, as well as extracts relevant context information.
 </pre>
 
-# The overview # {#overview}
+# Definitions # {#overview}
 
-A <code>tree:Collection</code> is a subclass of <code>dcat:Dataset</code> ([[!vocab-dcat-3]]).
+A `tree:Collection` is a subclass of `dcat:Dataset` ([[!vocab-dcat-3]]).
 The specialization being that this particular dataset is a collection of _members_.
 
-A <code>tree:SearchTree</code> is a subClassOf <code>dcat:Distribution</code>.
+A `tree:SearchTree` is a subClassOf `dcat:Distribution`.
 The specialization being that it uses the main TREE specification to publish a search tree.
 
-A node from which all other nodes can be found is a `tree:RootNode`, which MAY be explicitely typed as such.
+A node from which all other nodes can be found is a `tree:RootNode`.
 
 Note: The `tree:SearchTree` and the `tree:RootNode` MAY be identified by the same IRI when no disambiguation is needed.
 
 A TREE client MUST be provided with a URL to start from, which we call the _entrypoint_.
 
 # Initializing a client with a url # {#starting-from}
 
-The goal of the client is to understand what `tree:Collection` it is using, and to find a `tree:RootNode` or search form to start the traversal phase from.
+The goal of the client is to understand what `tree:Collection` it is using, and to find a `tree:RootNode` to start the traversal phase from.
+This discovery specification extends the initialization step in the TREE specification for the cases in which multiple options are possible.
 
-```
-IN: E: a URL of the entrypoint
-OUT: N: tree:RootNode IRI and/or S: search form
- ```
+The client MUST dereference the URL, which will result in a set of quads. The client MUST then first perform the initialization step from the main specification.
+If that step does not return a result, then the client MUST check whether the URL before redirects (`E`) has been used in one of the following discovery patterns, described in the subsections:
+ 1. `E` is a `tree:Collection`: then the client needs to [select the right search tree](#tree-search-trees)
+ 2. `E` is a `dcat:Dataset`: then the client needs to [select the right distribution or dataservice from a catalog](#dcat-dataset)
+ 3. `E` is an `ldes:EventStream`: then the client MAY take into account [LDES specific properties](#ldes)
+ 4. `E` is a `dcat:Distribution`: then the client needs to [process it accordingly](#dcat-distribution)
+ 5. `E` is a `dcat:DataService`: then the client needs to [process it accordingly](#dcat-dataservice)
+ 6. `E` is a catalog or is not explicitly mentioned: then the client needs to select a dataset based on [shape information](#tree-collection-shapes) and [DCAT Catalog information](#dcat-catalog)
 
-The client MUST dereference the URL, which will result in a set of quads.
-When the URL given to the TREE client, after all redirects, is used in a triple <code>ex:C tree:view <> .</code>, a client MUST assume the URL after redirects (`E'`) is an identifier of the intended `tree:RootNode` of the collection `ex:C`.
-The client MUST check for this `tree:view` property and return the result of the discovery algorithm with `<> → N`.
+## Selecting a collection via shapes ## {#tree-collection-shapes}
 
-If there is no such triple, then the client MUST check whether the URL before redirects (`E`)  has been used in one of the following patterns:
- * `E tree:view ?N.` where there’s exactly one `?N`, then the algorithm MUST return `?N → N`.
- * `E tree:rootNode ?N ; tree:search ?S .` then the algorithm MUST return `?N → N` and `?S → S`.
- * `?DS dcat:servesDataset E ; dcat:endpointURL ?U` or `E dcat:endpointURL ?U`, then the algorithm MUST repeat the algorithm with `?U` as the entrypoint.
+When a client finds multiple collections, it can choose to prune them based on the `tree:shape` property.
+The `tree:shape` property refers to an `sh:NodeShape` that serves as the starting point of the shape description.
+A collection MAY be pruned when its shape has no overlap with the properties the client needs.
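+
+As a non-normative sketch, assume a client that only needs the hypothetical property `ex:temperature`. All `ex:` identifiers below are assumptions made for illustration, and the precise pruning algorithm is not defined by this draft. Under those assumptions the client could prune `ex:Addresses`, because its shape mentions none of the properties the client needs.
+
+<div class="example">
+```turtle
+@prefix tree: <https://w3id.org/tree#> .
+@prefix sh:   <http://www.w3.org/ns/shacl#> .
+@prefix ex:   <http://example.org/> .
+
+# Hypothetical collection whose shape overlaps with what the client needs
+ex:Sensors a tree:Collection ;
+  tree:shape [
+    a sh:NodeShape ;
+    sh:property [ sh:path ex:temperature ] , [ sh:path ex:observedAt ]
+  ] .
+
+# Hypothetical collection without overlap: a candidate for pruning
+ex:Addresses a tree:Collection ;
+  tree:shape [
+    a sh:NodeShape ;
+    sh:property [ sh:path ex:streetName ]
+  ] .
+```
+</div>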
 
-Note: When data about the dataset, data service or search tree is found, it is a good idea to also pass this on to the client.
+Issue: Will we document the precise algorithm to use? Should we extend shapes with cardinality approximations as well?
 
-## tree:Collection ## {#collection}
+## Selecting a collection via a catalog ## {#dcat-catalog}
 
-In order to prioritize a specific view link, the relations and search forms in the entry nodes can be studied for their relation types, path or remaining items.
-The class <code>tree:ViewDescription</code> indicates a specific TREE structure on a <code>tree:Collection</code>.
-Through the property <code>tree:viewDescription</code> a <code>tree:Node</code> can link to an entity that describes the view, and can be reused in data portals as the <code>dcat:DataService</code>.
+A DCAT Catalog is an overview of datasets, data services and distributions.
+As TREE clients first need to select a dataset, and then a search tree to use, this aligns well with how DCAT-AP works.
+DCAT discovery extends the previous section, in which a collection or dataset can be selected based on the `tree:shape` property.
 
-<div class="example">
-    ```turtle
-    ## What can be found in a tree:Node
-    ex:N1 a tree:Node ;
-      tree:viewDescription ex:View1 .
-
-    ex:C1 a tree:Collection ;
-      tree:view ex:N1 .
-
-    ## What can be found on a data portal
-    ex:C1 a dcat:Dataset .
-    ex:View1 a tree:ViewDescription, dcat:DataService ;
-      dcat:endpointURL ex:N1 ; # The entry point that can be advertised in a data portal
-      dcat:servesDataset ex:C1 .
-    ```
-</div>
+For now, we will assume the DCAT information is available in subject pages.
+
+Issue: Do we need more text on how to handle different types of DCAT interfaces?
+
+The dataset descriptions can be used for filtering the datasets available in a catalog to a list of datasets that can be useful for the client.
+Such properties may include the spatial extent, the time extent, or how it is possibly a part of another `dcat:Dataset`.
+
+Issue: How precise do we need to be in this specification?
+
+When the `dcat:Dataset` is a `tree:Collection`, the DCAT catalog is going to contain a `dct:type` property with `https://w3id.org/tree#Collection` or `https://w3id.org/ldes#EventStream` as the object.
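+
+A minimal, non-normative sketch of such a catalog follows. The `ex:` identifiers are hypothetical, and linking the datasets through `dcat:dataset` is an assumption about how the catalog is laid out.
+
+<div class="example">
+```turtle
+@prefix dcat: <http://www.w3.org/ns/dcat#> .
+@prefix dct:  <http://purl.org/dc/terms/> .
+@prefix ex:   <http://example.org/> .
+
+# Hypothetical catalog listing two datasets
+ex:Catalog a dcat:Catalog ;
+  dcat:dataset ex:C1 , ex:C2 .
+
+# A dataset that is also a tree:Collection
+ex:C1 a dcat:Dataset ;
+  dct:type <https://w3id.org/tree#Collection> .
+
+# A dataset that is also an ldes:EventStream
+ex:C2 a dcat:Dataset ;
+  dct:type <https://w3id.org/ldes#EventStream> .
+```
+</div>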
+
+## Choosing from multiple SearchTrees with TREE ## {#tree-search-trees}
+
+Issue: This is yet to be done
+
+## Selecting a search tree via a DCAT dataset ## {#dcat-dataset}
+
+There are two ways in which a client can find a search tree from a dataset: via the distributions and via the data services. Both need to be tested.
+Selecting a distribution or data service when multiple are available needs to be done based on [the search tree description](#tree-search-trees).
+If no such description is available, all of them need to be tested by processing them as exemplified in the next subsections.
 
-When there is no <code>tree:viewDescription</code> property in this page, a client either already discovered the description of this view in an earlier <code>tree:Node</code>, either the current <code>tree:Node</code> is implicitly the ViewDescription. Therefore, when the property path <code>tree:view → tree:viewDescription</code> does not yield a result, the view properties MUST be extracted from the object of the <code>tree:view</code> triple.
-A <code>tree:Node</code> can also be double typed as the <code>tree:ViewDescription</code>. A client must thus check for ViewDescriptions on both the current node without the <code>tree:viewDescription</code> qualification, as on the current node with the <code>tree:viewDescription</code> link.
+### Selecting a search tree via DCAT Distribution ### {#dcat-distribution}
 
-## dcat:Catalog ## {#collection}
+When the pattern `E dcat:distribution ?D . ?D dcat:downloadURL ?N .` matches, then `?N` is a root node of `E`.
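+
+As a non-normative illustration of this pattern, with hypothetical `ex:` identifiers standing in for `E`, `?D` and `?N`:
+
+<div class="example">
+```turtle
+@prefix dcat: <http://www.w3.org/ns/dcat#> .
+@prefix ex:   <http://example.org/> .
+
+# ex:C1 plays the role of the entrypoint E
+ex:C1 a dcat:Dataset ;
+  dcat:distribution ex:D1 .
+
+# ex:D1 matches ?D, and ex:N1 matches ?N: a root node of ex:C1
+ex:D1 a dcat:Distribution ;
+  dcat:downloadURL ex:N1 .
+```
+</div>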
 
-When multiple collections are found by a client, it can choose to prune the collections based on the <code>tree:shape</code> property.
-Therefore a data publisher SHOULD annotate a <code>tree:Collection</code> instance with a SHACL shape.
-The <code>tree:shape</code> points to a SHACL description of the shape (<code>sh:NodeShape</code>).
+Issue: This is yet to be done
 
-Note: the shape can be a blank node, or a named node on which you should follow your nose when it is defined at a different HTTP URL.
+### Selecting a search tree from a DCAT data service ### {#dcat-dataservice}
 
-# Context data # {#context}
+ * When `?DS dcat:servesDataset E ; dcat:endpointURL ?U` or `E dcat:endpointURL ?U` matches, the client MUST repeat the discovery algorithm with `?U` as the entrypoint.
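+
+A non-normative sketch of the first variant of this pattern, with hypothetical `ex:` identifiers standing in for `E`, `?DS` and `?U`:
+
+<div class="example">
+```turtle
+@prefix dcat: <http://www.w3.org/ns/dcat#> .
+@prefix ex:   <http://example.org/> .
+
+# ex:C1 plays the role of the entrypoint E
+ex:C1 a dcat:Dataset .
+
+# ex:Service1 matches ?DS; discovery is repeated with ex:N1 (?U) as the entrypoint
+ex:Service1 a dcat:DataService ;
+  dcat:servesDataset ex:C1 ;
+  dcat:endpointURL ex:N1 .
+```
+</div>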
+
+Issue: This is yet to be done
+
+## Linked Data Event Streams ## {#ldes}
+
+When the client is not built for query answering, but only for setting up a replication and synchronization system, a special type can indicate that the search tree is made for this purpose: the `ldes:EventSource`.
+Clients that want to prioritize taking a _full_ copy MAY give full priority to this server hint.
+
+<div class="example">
+```turtle
+E a ldes:EventSource ;
+  tree:rootNode|dcat:downloadURL </node1> .
+```
+</div>
+
+# Extracting context information # {#context}
 
-Context information is important to understand who the creator of a certain dataset is, when it was last changed, what other datasets it was derived from, etc.
+Issue: This is yet to be done
 
-TODO
+Context information enables a client to understand who the creator of a certain dataset is, when it was last changed, what other datasets it was derived from, etc.
 
 ## DCAT and dcterms ## {#context-dcat}
 
+Issue: This is yet to be done
+
 ## Provenance ## {#context-prov}
 
+Issue: This is yet to be done
+
 ## Linked Data Event Streams ## {#context-ldes}
 
+Issue: This is yet to be done
+
 LDES (https://w3id.org/ldes/specification) is a way to evolve search trees in a consistent way. It defines every member as immutable, and a collection as append-only.
 Therefore, one can make sure to only process each member once.
 Extra terms are added, such as the concept of an EventStream, retention policies and a timestampPath.