Two identifiers are currently in use: the PID URI and the Base URI. This chapter covers the customer's requirements and the use cases in which the identifiers are required and relevant. It also describes the technical implementation and points out possible errors.
Both identifiers are used in the two systems COLID and Data Marketplace. As identifiers, both must be unique; two identifiers may be identical only under specific rules.
When creating an entry in COLID, the identifiers must be filled in and cannot be changed afterwards.
The PID URI has a special position in COLID. It is used for all CRUD operations, which makes it the central identifier: it serves as the ID to create, store and delete an entry.
The BaseUri is used primarily to resolve the URL, which makes it a central component of the proxy. The BaseUri can point to a created endpoint. If a TargetUrl is stored, the proxy resolves to this URL.
Identifiers are treated as a separate property of a resource, so the UI and the backend must handle them separately during all CRUD operations.
The identifiers must be filled in to create a resource. The user can either enter an identifier manually or use a PID URI Template. Despite the name, these templates are not specific to the PID URI; they are used for all identifiers. An identifier is modelled as a nested entity of the resource, so the value of the property follows the basic structure of an entity. An entity always consists of an id and the corresponding properties. The properties are a list of key-value pairs, where a value can itself be an entity.
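The entity structure described above can be sketched as follows. This is a minimal illustration in Python, not the COLID implementation: the class names Entity and Property and the helper to_dict are hypothetical, and only the id/properties shape and the property keys are taken from this chapter.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class Property:
    key: str
    # A value is either a plain literal/URI or a list of nested entities.
    value: Union[str, List["Entity"]]


@dataclass
class Entity:
    # None stands for an id that has not been generated yet.
    id: Optional[str]
    properties: List[Property] = field(default_factory=list)


def to_dict(entity: Entity) -> Dict[str, Any]:
    """Serialise an entity into the id/properties shape (hypothetical helper)."""
    return {
        "id": entity.id,
        "properties": [
            {
                "key": p.key,
                "value": p.value
                if isinstance(p.value, str)
                else [to_dict(child) for child in p.value],
            }
            for p in entity.properties
        ],
    }


# An identifier entity nested as the value of the hasPID property.
pid_uri = Entity(
    id=None,
    properties=[
        Property(
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "https://pid.bayer.com/kos/19050/Identifier",
        )
    ],
)
resource = Entity(
    id=None,
    properties=[Property("http://pid.bayer.com/kos/19014/hasPID", [pid_uri])],
)
```

Because a value can hold entities, the same two classes describe both the resource and its identifiers.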
The following example shows an identifier with the key http://pid.bayer.com/kos/19014/hasPID. This identifier was generated from a template, which can be seen in its properties: the value of the hasUriTemplate property is the identifier of the PID URI Template. Where to find the templates and how to create one is described in chapter xxx. The data model is the same for requests and responses.
"properties":[
  {
    "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "value":"http://pid.bayer.com/kos/19014/Ontology"
  },
  {
    "key":"http://pid.bayer.com/kos/19014/hasPID",
    "value":[
      {
        "id":"https://dev-pid.bayer.com/a01bf7a8-ac7d-4ecb-a855-f22577ee6caf/",
        "properties":[
          {
            "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "value":"https://pid.bayer.com/kos/19050/Identifier"
          },
          {
            "key":"https://pid.bayer.com/kos/19050/hasUriTemplate",
            "value":"https://pid.bayer.com/kos/19050#13cd004a-a410-4af5-a8fc-eecf9436b58b"
          }
        ]
      }
    ]
  }
]
If a new resource is created, the id is empty or null, so the backend generates an identifier from the attached template. The response to the following request contains a filled id.
"properties":[
  {
    "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "value":"http://pid.bayer.com/kos/19014/Ontology"
  },
  {
    "key":"http://pid.bayer.com/kos/19014/hasPID",
    "value":[
      {
        "id": null,
        "properties":[
          {
            "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "value":"https://pid.bayer.com/kos/19050/Identifier"
          },
          {
            "key":"https://pid.bayer.com/kos/19050/hasUriTemplate",
            "value":"https://pid.bayer.com/kos/19050#13cd004a-a410-4af5-a8fc-eecf9436b58b"
          }
        ]
      }
    ]
  }
]
If no template is used and an individual identifier is entered, the request looks as follows.
"properties":[
  {
    "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "value":"http://pid.bayer.com/kos/19014/Ontology"
  },
  {
    "key":"http://pid.bayer.com/kos/19014/hasPID",
    "value":[
      {
        "id": "https://dev-pid.bayer.com/custom-identifier-1",
        "properties":[
          {
            "key":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "value":"https://pid.bayer.com/kos/19050/Identifier"
          }
        ]
      }
    ]
  }
]
In the user interface, a single component is responsible for editing and creating identifiers. The FormItemInputPidUriComponent is used for all identifiers. As already described in chapter xxx, the component is part of the generic form of the frontend and is currently selected by the key of the property.
One of the two identifiers, the PID URI, has a special position in the COLID application because it is used as the identifier of a resource. It is therefore used in the following classes:
- ResourceController
- (I)ResourceService
- (I)ResourcePreprocessService
- (I)ResourceLinkingService
- (I)ResourceRepository
A special topic is the check whether identifiers may be duplicated between resources. The implemented rules are briefly defined below. In all cases not covered by these rules, identifiers must not be identical, and creating or editing the resource is prevented. Every identifier must also satisfy the following basic rules:
- The identifier must be a valid absolute URI
- The identifier must use the http or https scheme
- Each instance has its own host for the identifiers:
  - Dev: dev-pid.bayer.com
  - QA: qa-pid.bayer.com
  - Production: pid.bayer.com
- The identifier must not consist of the above-mentioned host alone
- The host must not appear more than once in the identifier
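The basic rules listed above can be sketched as a small validation function. This is an illustration only and not the backend code: the function name check_basic_rules, the environment keys and the error texts are hypothetical, and the sketch assumes that an identifier must use exactly the host of its instance.

```python
from typing import List
from urllib.parse import urlparse

# Hosts from the list above; the dictionary keys are illustrative.
HOSTS = {
    "dev": "dev-pid.bayer.com",
    "qa": "qa-pid.bayer.com",
    "production": "pid.bayer.com",
}


def check_basic_rules(identifier: str, environment: str) -> List[str]:
    """Return a list of rule violations; an empty list means the identifier passes."""
    host = HOSTS[environment]
    parsed = urlparse(identifier)

    # Rule: absolute URI (here: a scheme and a host must be present).
    if not parsed.scheme or not parsed.netloc:
        return ["identifier must be an absolute URI with a host"]

    errors: List[str] = []
    # Rule: only http and https are allowed.
    if parsed.scheme not in ("http", "https"):
        errors.append("scheme must be http or https")
    # Assumption: the identifier must use the host of the current instance.
    if parsed.netloc != host:
        errors.append("identifier must use the host " + host)
    # Rule: the identifier must contain more than the host.
    if parsed.path in ("", "/") and not parsed.query and not parsed.fragment:
        errors.append("identifier must not consist of the host alone")
    # Rule: the host must not be repeated inside the identifier.
    if identifier.count(host) > 1:
        errors.append("host must not appear more than once")
    return errors
```

For example, check_basic_rules("https://dev-pid.bayer.com/custom-identifier-1", "dev") returns an empty list, while "https://dev-pid.bayer.com/" is rejected because it consists of the host alone.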
PID URIs may be identical only under the following conditions.
- The two resources are connected by the edge "https://pid.bayer.com/kos/19050/hasDraft".
- The PID URI of a DistributionEndpoint may be identical to the PID URI of another DistributionEndpoint if the resources to which the endpoints are attached are connected by the edge https://pid.bayer.com/kos/19050/hasDraft and the endpoints are of the same type.
BaseUris may be identical only under the following conditions.
- The two resources are connected by the edge "https://pid.bayer.com/kos/19050/hasDraft".
- A resource can have a new version, so that a version chain develops between resources. Two adjacent resources in this chain may have the same BaseUri.
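The exceptions for PID URIs and BaseUris can be expressed as simple predicates. The following sketch is not taken from the COLID code base: all function names are hypothetical, resources are represented by plain string ids, the hasDraft edge is treated as undirected, and the version chain is assumed to be an ordered list of resource ids.

```python
from typing import FrozenSet, List, Set

# Each hasDraft edge is stored as an unordered pair of resource ids (assumption).
DraftEdges = Set[FrozenSet[str]]


def connected_by_draft(a: str, b: str, draft_edges: DraftEdges) -> bool:
    """True if the two resources are linked by a hasDraft edge."""
    return frozenset((a, b)) in draft_edges


def adjacent_versions(a: str, b: str, version_chain: List[str]) -> bool:
    """True if the two resources are direct neighbours in the version chain."""
    if a not in version_chain or b not in version_chain:
        return False
    return abs(version_chain.index(a) - version_chain.index(b)) == 1


def may_share_pid_uri(a: str, b: str, draft_edges: DraftEdges) -> bool:
    return connected_by_draft(a, b, draft_edges)


def may_share_endpoint_pid_uri(
    resource_a: str,
    resource_b: str,
    endpoint_type_a: str,
    endpoint_type_b: str,
    draft_edges: DraftEdges,
) -> bool:
    # Endpoints need the same type and parent resources linked by hasDraft.
    return endpoint_type_a == endpoint_type_b and connected_by_draft(
        resource_a, resource_b, draft_edges
    )


def may_share_base_uri(
    a: str, b: str, draft_edges: DraftEdges, version_chain: List[str]
) -> bool:
    return connected_by_draft(a, b, draft_edges) or adjacent_versions(
        a, b, version_chain
    )
```

In every case where these predicates return False, an identical identifier counts as a duplicate.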
TargetUris may generally be identical, since they are not identifiers but are treated similarly. If a TargetUri is already in use, a warning is issued, but this does not block the action on the resource.
The component FormItemInputPidUriComponent has been implemented in the user interface. It handles all interaction for properties of the type "Identifier". A preselected PID URI Template and a list of PID URI Templates can be passed to the component as parameters. The templates are available to the user and can be used to generate an identifier. If a template has been selected for the resource, it is stored in the ResourceFormComponent and preselected for all other identifiers of the same type, which makes the UI easier to use.
How the templates are structured can be found in chapter xxx.
If the identifier field is changed, either by selecting a template or by the user manually entering an identifier, the FormItemInputPidUriComponent triggers an event with a delay of 300 ms. All parent components pass this event on to the ResourceFormComponent. This is the handleFormEvent, which is described in more detail elsewhere (chapter xxx). If the event concerns a PID URI or Base URI, the DuplicateCheck is triggered. It checks the current entry: the request is structured as in a CRUD operation and sent to the backend, which checks whether the rules for the individual identifiers have been adhered to.
Apart from this, the identifier is treated as a normal property of a resource and is handled according to the normal data types. For further information please refer to chapter xxx.
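The delayed event can be pictured as a debounce: only the latest input is forwarded once no newer input has arrived for 300 ms. The sketch below illustrates the idea in Python with explicit timestamps so that it stays deterministic. It does not reproduce the frontend code, and the class Debouncer and its methods are hypothetical.

```python
from typing import Callable, Optional, Tuple


class Debouncer:
    """Forward only the latest value after a quiet period of delay_ms."""

    def __init__(self, delay_ms: int, emit: Callable[[str], None]) -> None:
        self.delay_ms = delay_ms
        self.emit = emit
        # Pending value together with the time at which it becomes due.
        self._pending: Optional[Tuple[str, int]] = None

    def push(self, value: str, now_ms: int) -> None:
        """Register a new input; a newer input replaces a pending one."""
        self.flush(now_ms)  # emit an older value whose quiet period has elapsed
        self._pending = (value, now_ms + self.delay_ms)

    def flush(self, now_ms: int) -> None:
        """Emit the pending value if its quiet period is over."""
        if self._pending is not None and now_ms >= self._pending[1]:
            value, _ = self._pending
            self._pending = None
            self.emit(value)


# Usage: two quick keystrokes lead to a single duplicate check request.
checked = []
debouncer = Debouncer(300, checked.append)
debouncer.push("https://dev-pid.bayer.com/cust", 0)
debouncer.push("https://dev-pid.bayer.com/custom-identifier-1", 100)
debouncer.flush(400)
```

The delay keeps the number of DuplicateCheck requests low while the user is still typing.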
A central function in the backend handles the identifiers. It first checks whether the identifier to be stored has a subject, that is, an individual or already generated identifier, and whether a template exists. If the identifier has neither and only carries other properties, all of its properties are deleted. This causes a critical error in the SHACL validation, so the resource cannot be stored.
The identifier is then generated. Since the identifier cannot be changed after it has first been saved, it is only generated during Create.
Once an identifier has been generated from a template, the newly generated identifier, or the identifier entered by the user, is checked against the basic rules. The rules are listed in chapter 3.5 Special Topics and have been implemented in the following code. The DuplicateCheck has been moved into a separate function.
The identifier is generated in the function GenerateIdentifierTemplate. This function checks whether the entered identifier uses a template. If a template is in use and no individual or generated identifier has been entered yet, a new identifier is generated from the existing template. How an identifier is generated from a template is described in chapter xxx.
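The decision logic described above can be sketched as follows. This is not the GenerateIdentifierTemplate implementation: the function name generate_identifier and its parameters are hypothetical, and the sketch simply assumes that a template yields the instance host followed by a random UUID, as in the generated identifier shown earlier in this chapter. The actual template rules are described in chapter xxx.

```python
import uuid
from typing import Optional


def generate_identifier(
    current_id: Optional[str],
    template_id: Optional[str],
    host: str = "dev-pid.bayer.com",
) -> Optional[str]:
    """Return the identifier to store, generating one only when needed."""
    # An individual or already generated identifier is never overwritten.
    if current_id:
        return current_id
    # Without a template there is nothing to generate from.
    if template_id is None:
        return None
    # Assumption: the template produces "<host>/<uuid>/".
    return "https://{}/{}/".format(host, uuid.uuid4())
```

A call with an empty id and a template returns a fresh identifier, while a call with an existing id returns that id unchanged.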
One of the most important functions in COLID is the duplicate check for identifiers. It is a core function of the application: if it fails or works incorrectly, significant errors can occur in the application. The check is currently used for all identifiers and for TargetUri. The rules of the DuplicateCheck are described in chapter Special Topics.
The DuplicateCheck starts in the CheckDuplicates function of the ResourcePreProcessService class. Two validations are distinguished: validation within a resource, where an identifier may occur only once, and validation between resources.
The second part is performed by the CheckDuplicatesInRepository function. The first part is handled in CheckDuplicates itself: all identifiers, including their property keys, are collected in a list. Each list element is a tuple of the property key, the id of the entry that owns the identifier, and the identifier itself. If duplicate entries are found, a critical error message is generated for identifiers and a non-critical error message for TargetUri.
Since the check of the BaseUri must also work across versions, all versions of the resource are retrieved first. A list of all identifiers that have the same value as the current identifier is then retrieved from the repository. Each element of this list consists of the id of the entry, the corresponding draft entry and the type of the entry. Based on this, the rules mentioned in chapter Special Topics are checked. Since distribution endpoints may also have identifiers, the same check is performed for each endpoint at the end. The result of the check is a list of ValidationResults.
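The check within a single resource can be sketched as a grouping over the described tuples. The code below is an illustration and not the CheckDuplicates implementation: the names IdentifierEntry, ValidationResult and check_duplicates_within_resource are hypothetical, and the set of property keys that count as TargetUri is passed in as a parameter because the concrete key is not given in this chapter.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class IdentifierEntry(NamedTuple):
    property_key: str  # key of the property that holds the identifier
    entry_id: str      # id of the entry that owns the identifier
    identifier: str    # the identifier value itself


class ValidationResult(NamedTuple):
    identifier: str
    property_key: str
    critical: bool


def check_duplicates_within_resource(
    entries: List[IdentifierEntry], target_uri_keys: Set[str]
) -> List[ValidationResult]:
    """Report every identifier value that occurs more than once in a resource."""
    by_value: Dict[str, List[IdentifierEntry]] = defaultdict(list)
    for entry in entries:
        by_value[entry.identifier].append(entry)

    results: List[ValidationResult] = []
    for value, group in by_value.items():
        if len(group) < 2:
            continue
        for entry in group:
            # Duplicated identifiers are critical, duplicated TargetUris are not.
            critical = entry.property_key not in target_uri_keys
            results.append(ValidationResult(value, entry.property_key, critical))
    return results
```

A non-critical result corresponds to the warning for TargetUri, while a critical result prevents the resource from being saved.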