dataset.md

HDR UK Dataset Schema Schema

https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json

HDR UK Dataset Metadata JSONSchema

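| Abstract | Extensible | Status | Identifiable | Custom Properties | Additional Properties | Access Restrictions | Defined In |
| :------- | :--------- | :----- | :----------- | :---------------- | :-------------------- | :------------------ | :--------- |
| Can be instantiated | Yes | Unknown status | No | Forbidden | Forbidden | none | dataset.schema.json |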

HDR UK Dataset Schema Type

object (HDR UK Dataset Schema)

HDR UK Dataset Schema Properties

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| identifier | Merged | Required | cannot be null | HDR UK Dataset Schema |
| version | string | Required | cannot be null | HDR UK Dataset Schema |
| revisions | array | Required | cannot be null | HDR UK Dataset Schema |
| issued | string | Required | cannot be null | HDR UK Dataset Schema |
| modified | string | Required | cannot be null | HDR UK Dataset Schema |
| summary | Not specified | Required | cannot be null | HDR UK Dataset Schema |
| documentation | Not specified | Optional | cannot be null | HDR UK Dataset Schema |
| coverage | Not specified | Optional | cannot be null | HDR UK Dataset Schema |
| provenance | Not specified | Optional | cannot be null | HDR UK Dataset Schema |
| accessibility | Not specified | Required | cannot be null | HDR UK Dataset Schema |
| enrichmentAndLinkage | Not specified | Optional | cannot be null | HDR UK Dataset Schema |
| observations | array | Required | cannot be null | HDR UK Dataset Schema |
| structuralMetadata | array | Optional | cannot be null | HDR UK Dataset Schema |
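
As a rough illustration of how the Required column might be applied before full schema validation, the sketch below lists the required top-level properties and reports any that are absent or null. The function name `missing_required` and the `draft` record are hypothetical and not part of the schema; a real pipeline would more likely validate against the published JSON Schema itself.

```python
# Required top-level properties, copied from the properties table.
REQUIRED_PROPERTIES = (
    "identifier", "version", "revisions", "issued",
    "modified", "summary", "accessibility", "observations",
)


def missing_required(record: dict) -> list:
    """Return required property names that are absent from, or null in, `record`."""
    return [name for name in REQUIRED_PROPERTIES if record.get(name) is None]


# Hypothetical, deliberately incomplete record.
draft = {
    "identifier": "226fb3f1-4471-400a-8c39-2b66d46a39b6",
    "version": "1.1.0",
}
print(missing_required(draft))
```

This only checks presence; it says nothing about whether each value has the right type or shape.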

identifier

System dataset identifier

http://purl.org/dc/terms/identifier

identifier

identifier Type

merged type (Dataset identifier)

any of

identifier Examples

[
  "226fb3f1-4471-400a-8c39-2b66d46a39b6",
  "https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6"
]
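
The two examples suggest that an identifier may be either a bare UUID or a URL. The following sketch classifies a value on that assumption; the exact branches of the `any of` are not shown in this document, so the function `classify_identifier` and its return labels are illustrative only.

```python
import uuid
from urllib.parse import urlparse


def classify_identifier(value: str) -> str:
    """Label a value as 'uuid', 'url' or 'unknown' (assumed identifier forms)."""
    try:
        uuid.UUID(value)
        return "uuid"
    except ValueError:
        pass
    parts = urlparse(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    return "unknown"


print(classify_identifier("226fb3f1-4471-400a-8c39-2b66d46a39b6"))
```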

version

Dataset metadata version

version

version Type

string (Dataset Version)

version Constraints

pattern: the string must match the following regular expression:

^([0-9]+)\.([0-9]+)\.([0-9]+)$
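
A minimal sketch of checking a version string against this pattern in Python. The helper name `parse_version` is hypothetical; `fullmatch` is used so that a trailing newline is rejected, which is slightly stricter than a plain `match` with `$`.

```python
import re

# The pattern from the version constraint, unchanged.
VERSION_PATTERN = re.compile(r"^([0-9]+)\.([0-9]+)\.([0-9]+)$")


def parse_version(value: str) -> tuple:
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = VERSION_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a three-part numeric version: {value!r}")
    return tuple(int(part) for part in match.groups())


print(parse_version("1.1.0"))
```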

version Examples

"1.1.0"

revisions

Revisions of Dataset metadata

revisions

revisions Type

an array of merged types (Details)

issued

Dataset Metadata Creation Date

dcat:issued

issued

issued Type

string (Creation Date)

issued Constraints

date time: the string must be a date time string, according to RFC 3339, section 5.6

modified

Dataset Metadata Modification Date

dcat:modified

modified

modified Type

string (Modification Date)

modified Constraints

date time: the string must be a date time string, according to RFC 3339, section 5.6
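
Both `issued` and `modified` must be RFC 3339 date-time strings. The sketch below shows one way to parse such values with the Python standard library; the helper name `parse_rfc3339` and the sample timestamps are assumptions for illustration. It is not a complete RFC 3339 validator (for example, older Python versions accept only 3 or 6 fractional-second digits).

```python
from datetime import datetime


def parse_rfc3339(value: str) -> datetime:
    """Parse a date-time string with a mandatory UTC offset."""
    # RFC 3339 allows "Z" for UTC; older fromisoformat() needs a numeric offset.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("an RFC 3339 date-time requires a UTC offset")
    return parsed


# Hypothetical values for the two properties.
issued = parse_rfc3339("2020-03-31T12:00:00Z")
modified = parse_rfc3339("2020-04-20T09:30:00+01:00")
print(issued <= modified)
```

Comparing the parsed values, as in the last line, is a simple way to confirm that a record was not modified before it was issued.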

summary

Summary metadata must be completed by Data Custodians onboarding metadata into the Innovation Gateway MVP.

summary

summary Type

unknown (Summary)

documentation

Documentation can include a rich text description of the dataset or links to media such as documents, images, presentations, videos or links to data dictionaries, profiles or dashboards. Organisations are required to confirm that they have permission to distribute any additional media.

documentation

documentation Type

unknown (Documentation)

coverage

This information includes attributes for geographical and temporal coverage, cohort details etc. to enable a deeper understanding of the dataset content so that researchers can make decisions about the relevance of the underlying data.

coverage

coverage Type

unknown (Coverage)

provenance

Provenance information allows researchers to understand data within the context of its origins and can be an indicator of quality, authenticity and timeliness.

provenance

provenance Type

unknown (Provenance)

accessibility

Accessibility information allows researchers to understand access, usage, limitations, formats, standards and linkage or interoperability with toolsets.

accessibility

accessibility Type

unknown (Accessibility)

enrichmentAndLinkage

This section includes information about related datasets that may have previously been linked, as well as indicating if there is the opportunity to link to other datasets in the future. If a dataset has been enriched and/or derivations, scores and existing tools are available this section allows providers to indicate this to researchers.

enrichmentAndLinkage

enrichmentAndLinkage Type

unknown (Enrichment and Linkage)

observations

Multiple observations about the dataset may be provided, and users are expected to provide at least one observation (1..*). We will be supporting the schema.org observation model (https://schema.org/Observation) with default values. Users will be encouraged to provide their own statistical populations as the project progresses.

Example:

Statistical Population 1
type: StatisticalPopulation
populationType: Persons
numConstraints: 0

Statistical Population 2
type: StatisticalPopulation
populationType: Events
numConstraints: 0

Statistical Population 3
type: StatisticalPopulation
populationType: Findings
numConstraints: 0

typeOf: Observation
observedNode: Statistical Population 1
measuredProperty: count
measuredValue: 32937
observationDate: “2017”

https://schema.org/observation

observations

observations Type

an array of merged types (Details)
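
The example in the description can be sketched as plain Python data. The key names below simply mirror the labels used in that example; the property names actually accepted by the schema are not listed in this section, so treat them as assumptions.

```python
import json

# Statistical populations from the description, keyed by their labels.
populations = {
    "Statistical Population 1": {
        "type": "StatisticalPopulation", "populationType": "Persons", "numConstraints": 0,
    },
    "Statistical Population 2": {
        "type": "StatisticalPopulation", "populationType": "Events", "numConstraints": 0,
    },
    "Statistical Population 3": {
        "type": "StatisticalPopulation", "populationType": "Findings", "numConstraints": 0,
    },
}

# At least one observation is expected (1..*).
observations = [
    {
        "typeOf": "Observation",
        "observedNode": "Statistical Population 1",
        "measuredProperty": "count",
        "measuredValue": 32937,
        "observationDate": "2017",
    }
]

print(json.dumps(observations, indent=2))
```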

structuralMetadata

Descriptions of all tables and data elements that can be included in the dataset.

The first phase includes only column-level metadata; future versions will include value-level attributes.

structuralMetadata

structuralMetadata Type

an array of merged types (Details)

HDR UK Dataset Schema Definitions

Definitions group revision

Reference this group by using

{"$ref":"#/definitions/revision#/definitions/revision"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| version | Not specified | Required | cannot be null | HDR UK Dataset Schema |
| url | Not specified | Required | cannot be null | HDR UK Dataset Schema |

version

Semantic Version

version

version Type

unknown

url

URL endpoint to obtain the version

url

url Type

unknown

Definitions group summary

Reference this group by using

{"$ref":"#/definitions/summary#/definitions/summary"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| title | Merged | Required | cannot be null | HDR UK Dataset Schema |
| abstract | Merged | Required | cannot be null | HDR UK Dataset Schema |
| publisher | Merged | Required | cannot be null | HDR UK Dataset Schema |
| contactPoint | Merged | Required | cannot be null | HDR UK Dataset Schema |
| keywords | Merged | Required | cannot be null | HDR UK Dataset Schema |
| alternateIdentifiers | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| doiName | Merged | Optional | cannot be null | HDR UK Dataset Schema |

title

Title of the dataset limited to 80 characters. It should provide a short description of the dataset and be unique across the gateway. If your title is not unique, please add a prefix with your organisation name or identifier to differentiate it from other datasets within the Gateway. Please avoid acronyms wherever possible. Good titles should summarise the content of the dataset and if relevant, the region the dataset covers.

dct:title. title is a reserved word in JSON Schema.

title

title Type

merged type (Title)

all of

title Examples

[
  "North West London COVID-19 Patient Level Situation Report"
]

abstract

Provide a clear and brief descriptive signpost for researchers who are searching for data that may be relevant to their research. The abstract should allow the reader to determine the scope of the data collection and accurately summarise its content. The optimal length is one paragraph (limited to 255 characters), and effective abstracts should avoid long sentences and abbreviations where possible.

dct:abstract

abstract

abstract Type

merged type (Dataset Abstract)

all of
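
The descriptions of `title` and `abstract` state length limits of 80 and 255 characters respectively. A minimal sketch of checking both limits follows; the function name `length_problems` is hypothetical, and the limits are taken from the prose rather than from a constraint shown in this section.

```python
TITLE_MAX_CHARS = 80      # from the title description
ABSTRACT_MAX_CHARS = 255  # from the abstract description


def length_problems(summary: dict) -> list:
    """Return one message per field that exceeds its stated length limit."""
    problems = []
    for field, limit in (("title", TITLE_MAX_CHARS), ("abstract", ABSTRACT_MAX_CHARS)):
        value = summary.get(field, "")
        if len(value) > limit:
            problems.append(f"{field} is {len(value)} characters; limit is {limit}")
    return problems


print(length_problems({
    "title": "North West London COVID-19 Patient Level Situation Report",
    "abstract": "A short abstract.",
}))
```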

abstract Examples

"CPRD Aurum contains primary care data contributed by General Practitioner (GP) practices using EMIS Web® including patient registration information and all care events that GPs have chosen to record as part of their usual medical practice."

publisher

This is the organisation responsible for running or supporting the data access request process, as well as publishing and maintaining the metadata. In most cases this will be the same as the HDR UK Organisation (Hub or Alliance Member). However, in some cases it will be different, e.g. Tissue Directory is an HDR UK Gateway organisation but coordinates activities across a number of data publishers, such as Cambridge Blood and Stem Cell Biobank.

Conforms to spec, but this MAY be an object of type organisation. https://schema.org/publisher

publisher

publisher Type

merged type (Dataset publisher)

all of

contactPoint

Please provide a valid email address that can be used to coordinate data access requests with the publisher. Organisations are expected to provide a dedicated email address associated with the data access request process. Note: An employee's email address can only be provided on a temporary basis and, if one is provided, explicit consent must be obtained for this purpose.

dcat:contactPoint

contactPoint

contactPoint Type

merged type (Contact Point)

all of

contactPoint Default Value

The default value is:

"Defaulted to the contact point of the primary organisation of the user however, can be overridden for specific datasets"

contactPoint Examples

"SAILDatabank@swansea.ac.uk"

keywords

Please provide relevant and specific keywords that can improve the SEO of your dataset as a comma-separated list. Note: The onboarding portal will suggest keywords based on title, abstract and description. We are compiling a standardised list of keywords and synonyms across datasets to make filtering easier for users.

dcat:keyword. May be an array of strings or a comma-separated list.

keywords

keywords Type

merged type (Keywords)

any of
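
Because keywords may arrive either as an array of strings or as a comma-separated list, a consumer will usually want to normalise both forms. The sketch below shows one way to do that; `normalise_keywords` is a hypothetical helper, not part of any HDR UK tooling.

```python
def normalise_keywords(value) -> list:
    """Accept a comma-separated string or an iterable of strings; return a clean list."""
    parts = value.split(",") if isinstance(value, str) else list(value)
    # Trim surrounding whitespace and drop empty entries such as a trailing comma.
    return [part.strip() for part in parts if part.strip()]


print(normalise_keywords("COVID-19, primary care , "))
print(normalise_keywords(["COVID-19", "primary care"]))
```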

alternateIdentifiers

Alternate dataset identifiers or local identifiers

DATA-CITE alternate-identifiers used. Note, will support comma separated list for backwards compatibility with other systems

alternateIdentifiers

alternateIdentifiers Type

merged type (Alternate dataset identifiers)

any of

doiName

All HDR UK registered datasets should either have a Digital Object Identifier (DOI) or be working towards obtaining one. If a DOI is available, please provide the DOI.

Vocabulary: DOI Data Dictionary

doiName

doiName Type

merged type (Digital Object Identifier)

all of

doiName Examples

"10.3399/bjgp17X692645"

Definitions group organisation

Reference this group by using

{"$ref":"#/definitions/organisation#/definitions/organisation"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| identifier | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| name | Merged | Required | cannot be null | HDR UK Dataset Schema |
| logo | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| description | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| contactPoint | Merged | Required | cannot be null | HDR UK Dataset Schema |
| memberOf | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| accessRights | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| deliveryLeadTime | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| accessService | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| accessRequestCost | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| dataUseLimitation | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| dataUseRequirements | Merged | Optional | cannot be null | HDR UK Dataset Schema |

identifier

Please provide a Grid.ac identifier (see https://www.grid.ac/institutes) for your organisation. If your organisation does not have a Grid.ac identifier, please use the “suggest an institute” function here: https://www.grid.ac/institutes#

https://schema.org/identifier

identifier

identifier Type

merged type (Organisation Identifier)

all of

name

Name of the organisation

https://schema.org/name

name

name Type

merged type (Organisation Name)

all of

logo

Please provide a logo associated with the Gateway Organisation using a valid URL. The following formats will be accepted: .jpg, .png or .svg.

https://schema.org/logo

logo

logo Type

merged type (Organisation Logo)

all of

description

Please provide a URL that describes the organisation.

https://schema.org/description

description

description Type

merged type (Organisation Description)

all of

contactPoint

Organisation contact point(s)

https://schema.org/contactPoint

contactPoint

contactPoint Type

merged type (Organisation Contact Point)

any of

memberOf

Please indicate if the organisation is an Alliance Member or a Hub.

https://schema.org/memberOf

memberOf

memberOf Type

merged type (Organisation Membership)

all of

accessRights

The URL of a webpage where the data access request process and/or guidance is provided. If there is more than one access process, e.g. industry vs academic, please provide both.

dct:access_rights

accessRights

accessRights Type

merged type (Organisation Default Access Rights)

any of

deliveryLeadTime

Please provide an indication of the typical processing times based on the types of requests typically received. Note: This value will be used as default access request duration for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.

https://schema.org/deliveryLeadTime

deliveryLeadTime

deliveryLeadTime Type

merged type (Access Request Duration)

all of

accessService

Please provide a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, please indicate the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.

dcat:accessService

accessService

accessService Type

merged type (Organisation Access Service)

all of

accessService Examples

"https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide"

accessRequestCost

Please provide link(s) to a webpage or a short description detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.

No standard identified

accessRequestCost

accessRequestCost Type

merged type (Organisation Access Request Cost)

any of

dataUseLimitation

Please provide an indication of consent permissions for datasets and/or materials. This relates to the purposes for which datasets and/or material might be removed, stored or used. Notes: Where there are existing data-sharing arrangements, such as the HDR UK HUB data sharing agreement or the NIHR HIC data sharing agreement, this should be indicated within access rights. This value will be used as terms for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.

https://www.ebi.ac.uk/ols/ontologies/duo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDUO_0000001

dataUseLimitation

dataUseLimitation Type

merged type (Data Use Limitation)

any of

dataUseRequirements

Please indicate if there are any additional conditions set for use. Multiple requirements may be provided. Please ensure that these restrictions are documented in access rights information.

https://www.ebi.ac.uk/ols/ontologies/duo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDUO_0000001

dataUseRequirements

dataUseRequirements Type

merged type (Data Use Requirements)

any of

Definitions group documentation

Reference this group by using

{"$ref":"#/definitions/documentation#/definitions/documentation"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| description | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| associatedMedia | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| isPartOf | Merged | Optional | cannot be null | HDR UK Dataset Schema |

description

A free-text description of the record.

dc:description, https://schema.org/description

description

description Type

merged type (Description)

all of

associatedMedia

Please provide any media associated with the Gateway Organisation using a valid URI for the content. This is an opportunity to provide additional context that could be useful for researchers wanting to understand more about the dataset and its relevance to their research question. The following formats will be accepted: .jpg, .png, .svg, .pdf, .xlsx or .docx. Note: Media assets can be hosted by the organisation or uploaded using the onboarding portal.

https://schema.org/associatedMedia

associatedMedia

associatedMedia Type

merged type (Associated Media)

any of

associatedMedia Examples

"PDF Document that describes study protocol"

isPartOf

Please complete only if the dataset is part of a group or family.

https://schema.org/isPartOf NOTE: we may make Groups first-class citizens so they are navigable

isPartOf

isPartOf Type

merged type (Group)

any of

isPartOf Default Value

The default value is:

"NOT APPLICABLE"

isPartOf Examples

"Hospital Episodes Statistics datasets (A&E, APC, OP, AC MSDS)."

Definitions group coverage

Reference this group by using

{"$ref":"#/definitions/coverage#/definitions/coverage"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| spatial | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| typicalAgeRange | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| physicalSampleAvailability | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| followup | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| pathway | Merged | Optional | cannot be null | HDR UK Dataset Schema |

spatial

The geographical area covered by the dataset. It is recommended that links are to entries in a well-maintained gazetteer such as https://www.geonames.org/ or https://what3words.com/daring.lion.race.

dct:spatial

spatial

spatial Type

merged type (Geographic Coverage)

any of

spatial Examples

"https://www.geonames.org/2635167/united-kingdom-of-great-britain-and-northern-ireland.html"

typicalAgeRange

Please indicate the age range in whole years of participants in the dataset. Please provide range in the following format '[min age] – [max age]' where both the minimum and maximum are whole numbers (integers).

https://schema.org/typicalAgeRange

typicalAgeRange

typicalAgeRange Type

merged type (Age Range)

all of
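
The age range format described for `typicalAgeRange` ('[min age] – [max age]', whole numbers) can be sketched as a small parser. The exact pattern enforced by the schema is not shown in this section, so the regular expression below is an assumption: it accepts either a hyphen or an en dash, with optional spaces, and up to three digits per bound. `parse_age_range` is a hypothetical helper.

```python
import re

# Assumed shape: digits, a hyphen or en dash (U+2013), digits.
AGE_RANGE = re.compile(r"([0-9]{1,3})\s*[-\u2013]\s*([0-9]{1,3})")


def parse_age_range(value: str) -> tuple:
    """Return (min_age, max_age) in whole years, or raise ValueError."""
    match = AGE_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an age range: {value!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError("minimum age exceeds maximum age")
    return low, high


print(parse_age_range("18-65"))
```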

physicalSampleAvailability

Availability of physical samples associated with the dataset. If samples are available, please indicate the types of samples that are available. More than one type may be provided. If samples are not yet available, please provide “AVAILABILITY TO BE CONFIRMED”. If samples are not available, then please provide “NOT AVAILABLE”.

No standard identified. Used enumeration from the UK Tissue Directory.

physicalSampleAvailability

physicalSampleAvailability Type

merged type (Physical Sample Availability)

any of

physicalSampleAvailability Examples

"BONE MARROW"

followup

If known, what is the typical time span that a patient appears in the dataset (follow-up period)?

No standard identified

followup

followup Type

merged type (Followup)

all of

followup Default Value

The default value is:

"UNKNOWN"

pathway

Please indicate if the dataset is representative of the patient pathway and any limitations the dataset may have with respect to pathway coverage. This could include if the dataset is from a single speciality or area, a single tier of care, linked across two tiers (e.g. primary and secondary care), or an integrated care record covering the whole patient pathway.

No standard identified

pathway

pathway Type

merged type (Pathway)

all of

Definitions group provenance

Reference this group by using

{"$ref":"#/definitions/provenance#/definitions/provenance"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| origin | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| temporal | Merged | Required | cannot be null | HDR UK Dataset Schema |

origin

origin

origin Type

merged type (Details)

all of

temporal

temporal

temporal Type

merged type (Details)

all of

Definitions group origin

Reference this group by using

{"$ref":"#/definitions/origin#/definitions/origin"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| purpose | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| source | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| collectionSituation | Merged | Optional | cannot be null | HDR UK Dataset Schema |

purpose

Please indicate the purpose(s) for which the dataset was collected.

https://ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/FieldLevelDocumentation/

purpose

purpose Type

merged type (Purpose)

any of

source

Please indicate the source of the data extraction.

https://dublincore.org/specifications/dublin-core/dcmi-terms/#source

source

source Type

merged type (Source)

any of

collectionSituation

Please indicate the setting(s) where data was collected. Multiple settings may be provided.

https://ddialliance.org/Specification/DDI-Lifecycle/3.2/XMLSchema/FieldLevelDocumentation/

collectionSituation

collectionSituation Type

merged type (Setting)

any of

Definitions group temporal

Reference this group by using

{"$ref":"#/definitions/temporal#/definitions/temporal"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| accrualPeriodicity | Merged | Required | cannot be null | HDR UK Dataset Schema |
| distributionReleaseDate | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| startDate | Merged | Required | cannot be null | HDR UK Dataset Schema |
| endDate | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| timeLag | Merged | Required | cannot be null | HDR UK Dataset Schema |

accrualPeriodicity

Please indicate the frequency of distribution release. If a dataset is distributed regularly, please choose a distribution release periodicity from the constrained list and indicate the next release date. When the release date becomes historical, a new release date will be calculated based on the publishing periodicity. If a dataset has been published and will remain static, please indicate that it is static and indicate when it was released. If a dataset is released on an irregular basis or “on-demand”, please indicate that it is irregular and leave the release date as null. If a dataset can be published in real-time or near-real-time, please indicate that it is continuous and leave the release date as null. Notes: see https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/

dct:accrualPeriodicity

accrualPeriodicity

accrualPeriodicity Type

merged type (Periodicity)

all of

distributionReleaseDate

Date of the latest release of the dataset. If this is a regular release, e.g. quarterly, or this is a static dataset, please complete this alongside Periodicity. If this is irregular or continuously released, please leave this blank. Notes: Periodicity and release date will be used to determine when the next release is expected. E.g. if the release date is documented as 01/01/2020 and it is now 20/04/2020 and there is a quarterly release schedule, the latest release will be calculated as 01/04/2020.

dcat:distribution_release_date

distributionReleaseDate

distributionReleaseDate Type

merged type (Release Date)

any of
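
The worked example in the `distributionReleaseDate` description (documented release 01/01/2020, quarterly schedule, current date 20/04/2020, latest release 01/04/2020) can be reproduced with a short calculation. This is only a sketch of one plausible rule, stepping forward in whole periods from the documented date; the Gateway's actual algorithm is not described here, and `add_months` and `latest_release` are hypothetical helpers.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(start.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def latest_release(documented: date, today: date, period_months: int) -> date:
    """Most recent scheduled release on or before `today` (assumed rule)."""
    if period_months <= 0:
        raise ValueError("period_months must be positive")
    steps = 0
    while add_months(documented, (steps + 1) * period_months) <= today:
        steps += 1
    return add_months(documented, steps * period_months)


# Quarterly schedule, i.e. a period of three months.
print(latest_release(date(2020, 1, 1), date(2020, 4, 20), 3))
```

Stepping from the documented date each time, rather than from the previous result, avoids drift when the documented day falls at the end of a month.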

startDate

The start of the time period that the dataset provides coverage for. If there are multiple cohorts in the dataset with varying start dates, please provide the earliest date and use the description or the media attribute to provide more information.

dcat:startDate

startDate

startDate Type

merged type (Start Date)

any of

endDate

The end of the time period that the dataset provides coverage for. If the dataset is “Continuous” and has no known end date, please state continuous. If there are multiple cohorts in the dataset with varying end dates, please provide the latest date and use the description or the media attribute to provide more information.

dcat:endDate

endDate

endDate Type

merged type (End Date)

any of

timeLag

Please indicate the typical time lag between an event and the data for that event appearing in the dataset.

No standard identified

timeLag

timeLag Type

merged type (Time Lag)

all of

Definitions group accessibility

Reference this group by using

{"$ref":"#/definitions/accessibility#/definitions/accessibility"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| usage | Not specified | Optional | cannot be null | HDR UK Dataset Schema |
| access | Not specified | Required | cannot be null | HDR UK Dataset Schema |
| formatAndStandards | Not specified | Optional | cannot be null | HDR UK Dataset Schema |

usage

This section includes information about how the data can be used and how it is currently being used.

usage

usage Type

unknown (Usage)

access

This section includes information about data access.

access

access Type

unknown (Access)

formatAndStandards

This section includes technical attributes for language, vocabularies, sizes etc. and gives researchers facts about the underlying data in the dataset and how to process it.

formatAndStandards

formatAndStandards Type

unknown (Format and Standards)

Definitions group usage

Reference this group by using

{"$ref":"#/definitions/usage#/definitions/usage"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| dataUseLimitation | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| dataUseRequirements | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| resourceCreator | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| investigations | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| isReferencedBy | Merged | Optional | cannot be null | HDR UK Dataset Schema |

dataUseLimitation

Please provide an indication of consent permissions for datasets and/or materials. This relates to the purposes for which datasets and/or material might be removed, stored or used. NOTE: we have extended the DUO to include a value for NO LINKAGE.

https://www.ebi.ac.uk/ols/ontologies/duo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDUO_0000001

dataUseLimitation

dataUseLimitation Type

merged type (Data Use Limitation)

any of

dataUseRequirements

Please indicate if there are any additional conditions set for use. Multiple requirements may be provided. Please ensure that these restrictions are documented in access rights information.

https://www.ebi.ac.uk/ols/ontologies/duo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDUO_0000001

dataUseRequirements

dataUseRequirements Type

merged type (Data Use Requirements)

any of

resourceCreator

Please provide the text that you would like included as part of any citation that credits this dataset. This is typically just the name of the publisher. No employee details should be provided.

dct:creator

resourceCreator

resourceCreator Type

merged type (Citation Requirements)

any of

investigations

No standard identified

investigations

investigations Type

merged type (Investigations)

any of

isReferencedBy

Please provide the keystone paper associated with the dataset. Also include a list of known citations, if available; these should be links to existing resources where the dataset has been used or referenced. Please provide multiple entries or, if you are using a CSV upload, please provide them as a tab-separated list.

dct:isReferencedBy

isReferencedBy

isReferencedBy Type

merged type (Citations)

any of

Definitions group access

Reference this group by using

{"$ref":"#/definitions/access#/definitions/access"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| accessRights | Merged | Required | cannot be null | HDR UK Dataset Schema |
| accessService | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| accessRequestCost | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| deliveryLeadTime | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| jurisdiction | Merged | Required | cannot be null | HDR UK Dataset Schema |
| dataController | Merged | Required | cannot be null | HDR UK Dataset Schema |
| dataProcessor | Merged | Optional | cannot be null | HDR UK Dataset Schema |

accessRights

dct:access_rights NOTE: need to ensure that this is consistent across the organisation info and the dataset info

accessRights

accessRights Type

merged type (Access Rights)

any of

accessService

Please provide a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, please indicate the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.

dcat:accessService

accessService

accessService Type

merged type (Access Service)

all of

accessService Examples

"https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide"

accessRequestCost

Please provide link(s) to a webpage detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.

No standard identified

accessRequestCost

accessRequestCost Type

merged type (Organisation Access Request Cost)

any of

deliveryLeadTime

Please provide an indication of the typical processing times based on the types of requests typically received.

https://schema.org/deliveryLeadTime

deliveryLeadTime

deliveryLeadTime Type

merged type (Access Request Duration)

all of

jurisdiction

Please use country code from ISO 3166-1 country codes and the associated ISO 3166-2 for regions, cities, states etc. for the country/state under whose laws the data subjects' data is collected, processed and stored.

http://purl.org/dc/terms/Jurisdiction FIXME: Add ISO 3166-2 Subdivision code pattern

jurisdiction

jurisdiction Type

merged type (Jurisdiction)

any of
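
The comment above notes that an ISO 3166-2 subdivision pattern has yet to be added. As a sketch of what a shape check could look like, the pattern below accepts a two-letter ISO 3166-1 alpha-2 code, optionally followed by a hyphen and one to three alphanumeric characters, which is the general ISO 3166-2 form. It is an assumption rather than the schema's own pattern, it does not confirm that a code is actually assigned, and `looks_like_jurisdiction` is a hypothetical helper.

```python
import re

# Shape only: "GB" or "GB-ENG"; real validation needs the ISO code lists.
JURISDICTION_SHAPE = re.compile(r"[A-Z]{2}(-[A-Z0-9]{1,3})?")


def looks_like_jurisdiction(code: str) -> bool:
    """True if `code` has the shape of an ISO 3166-1 or ISO 3166-2 code."""
    return JURISDICTION_SHAPE.fullmatch(code) is not None


print(looks_like_jurisdiction("GB-ENG"))
```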

jurisdiction Default Value

The default value is:

"GB-ENG"

dataController

Data Controller means a person/entity who (either alone or jointly or in common with other persons/entities) determines the purposes for which and the way in which any Data Subject data, specifically personal data, are, or are to be, processed.

dpv:DataController

dataController

dataController Type

merged type (Data Controller)

all of

dataProcessor

A Data Processor, in relation to any Data Subject data, specifically personal data, means any person/entity (other than an employee of the data controller) who processes the data on behalf of the data controller.

dpv:DataProcessor

dataProcessor

dataProcessor Type

merged type (Data Processor)

all of

Definitions group formatAndStandards

Reference this group by using

{"$ref":"#/definitions/formatAndStandards#/definitions/formatAndStandards"}

| Property | Type | Required | Nullable | Defined by |
| :------- | :--- | :------- | :------- | :--------- |
| vocabularyEncodingScheme | Merged | Required | cannot be null | HDR UK Dataset Schema |
| conformsTo | Merged | Required | cannot be null | HDR UK Dataset Schema |
| language | Merged | Required | cannot be null | HDR UK Dataset Schema |
| format | Merged | Required | cannot be null | HDR UK Dataset Schema |

vocabularyEncodingScheme

List any relevant terminologies / ontologies / controlled vocabularies, such as ICD 10 Codes, NHS Data Dictionary National Codes or SNOMED CT International, that are being used by the dataset. If the controlled vocabularies are local standards, please make that explicit. If you are using a standard that has not been included in the list, please use “other” and contact the support desk to ask for an addition. Note: More than one vocabulary may be provided.

https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/dcam/VocabularyEncodingScheme

vocabularyEncodingScheme

vocabularyEncodingScheme Type

merged type (Controlled Vocabulary)

any of

vocabularyEncodingScheme Default Value

The default value is:

"LOCAL"

conformsTo

List standardised data models that the dataset has been stored in or transformed to, such as OMOP or FHIR. If the data is only available in a local format, please make that explicit. If you are using a standard that has not been included in the list, please use “other” and contact the support desk to ask for an addition.

dct:conformsTo

conformsTo

conformsTo Type

merged type (Conforms To)

any of

conformsTo Default Value

The default value is:

"LOCAL"

language

This should list all the languages in which the dataset metadata and underlying data are made available.

dct:language. FIXME: Conforms to spec, but may be a list of strings given cardinality 1:*. Validate against external list of languages. Resources defined by the Library of Congress (ISO 639-1, ISO 639-2) SHOULD be used.

language

language Type

merged type (Language)

any of

language Default Value

The default value is:

"en"

format

If multiple formats are available, please specify them. See the IANA media type categories (application, audio, image, message, model, multipart, text, video) at https://www.iana.org/assignments/media-types/media-types.xhtml. Note: If your file format is not included in the current list of formats, please indicate “other”. If you are using the HOP, you will be directed to a service desk page where you can request your additional format. If not, please go to https://metadata.atlassian.net/servicedesk/customer/portal/4 to request your format.

http://purl.org/dc/terms/format

format

format Type

merged type (Format)

any of
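As an illustration, the four required properties of this group could be assembled as follows. This is a minimal sketch: the defaults `"LOCAL"`, `"LOCAL"` and `"en"` come from the text above, while the helper name `make_format_and_standards`, the `text/csv` media type, and the use of a single plain string for each value are assumptions, because the "any of" alternatives of each merged type are not listed in this section.

```python
import json

# Required properties of the formatAndStandards group, as listed above.
REQUIRED = ("vocabularyEncodingScheme", "conformsTo", "language", "format")

# Defaults documented above; "format" has no documented default.
DEFAULTS = {
    "vocabularyEncodingScheme": "LOCAL",
    "conformsTo": "LOCAL",
    "language": "en",
}


def make_format_and_standards(**overrides):
    """Hypothetical helper: merge caller values over the documented defaults."""
    record = {**DEFAULTS, **overrides}
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return record


# "text/csv" is an assumed example value (a registered IANA media type).
example = make_format_and_standards(format="text/csv")
print(json.dumps(example, indent=2))
```

Because `format` has no default, calling the helper without it raises `ValueError`, which mirrors the fact that all four properties are marked Required.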

Definitions group enrichmentAndLinkage

Reference this group by using

{"$ref":"#/definitions/enrichmentAndLinkage"}

| Property | Type | Required | Nullable | Defined by |
| --- | --- | --- | --- | --- |
| qualifiedRelation | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| derivation | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| tools | Merged | Optional | cannot be null | HDR UK Dataset Schema |

qualifiedRelation

If applicable, please provide the DOI of other datasets that have previously been linked to this dataset and their availability. If no DOI is available, please provide the title of the datasets that can be linked, where possible using the same title as a dataset previously onboarded to the HOP. Note: If all the datasets from a Gateway organisation can be linked, please indicate “ALL” and the onboarding portal will automate linkage across the datasets submitted.

dcat:qualifiedRelation

qualifiedRelation

qualifiedRelation Type

merged type (Linked Datasets)

any of

derivation

Indicate if derived datasets or predefined extracts are available and the type of derivation available. Note: Single or multiple dimensions can be provided as a derived extract alongside the dataset.

prov:Derivation

derivation

derivation Type

merged type (Derivations)

any of

tools

Please provide the URL of any analysis tools or models that have been created for this dataset and are available for further use. Multiple tools may be provided. Note: We encourage users to adopt a model along the lines of https://www.ga4gh.org/news/tool-registry-service-api-enabling-an-interoperable-library-of-genomics-analysis-tools/

No standard identified. We encourage users to adopt a model along the lines of https://www.ga4gh.org/news/tool-registry-service-api-enabling-an-interoperable-library-of-genomics-analysis-tools/

tools

tools Type

merged type (Tools)

any of

Definitions group observation

Reference this group by using

{"$ref":"#/definitions/observation"}

| Property | Type | Required | Nullable | Defined by |
| --- | --- | --- | --- | --- |
| observedNode | Merged | Required | cannot be null | HDR UK Dataset Schema |
| measuredValue | integer | Required | cannot be null | HDR UK Dataset Schema |
| disambiguatingDescription | Merged | Optional | cannot be null | HDR UK Dataset Schema |
| observationDate | Merged | Required | cannot be null | HDR UK Dataset Schema |
| measuredProperty | Merged | Required | cannot be null | HDR UK Dataset Schema |

observedNode

Please select one of the following statistical populations for your observation.

https://schema.org/observedNode

observedNode

observedNode Type

merged type (Statistical Population)

all of

observedNode Examples

"PERSONS"

measuredValue

Please provide the population size associated with the population type of the dataset, e.g. 1000 people in a study, or 87 images (MRI) of the knee. Usage note: Used with Statistical Population, which specifies the type of the population in the dataset.

https://schema.org/measuredValue

measuredValue

measuredValue Type

integer (Measured Value)
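The pairing of `measuredValue` with `observedNode` described above might be checked along these lines. This is a sketch under stated assumptions: `check_observation` is a hypothetical helper, the ISO 8601 date string and the free-text `disambiguatingDescription` are assumed value shapes, and only the required-property list, the integer type, and the `"PERSONS"` and `"COUNT"` values are taken from this section.

```python
from datetime import date

# Required properties of the observation group, as listed for this group.
REQUIRED = ("observedNode", "measuredValue", "observationDate", "measuredProperty")


def check_observation(obs):
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing required property: {key}" for key in REQUIRED if key not in obs]
    if "measuredValue" in obs:
        value = obs["measuredValue"]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append("measuredValue must be an integer")
    return problems


observation = {
    "observedNode": "PERSONS",       # example value shown for observedNode
    "measuredValue": 1000,           # e.g. 1000 people in a study
    "disambiguatingDescription": "Adults enrolled in the study",  # optional, assumed text
    "observationDate": date(2021, 1, 1).isoformat(),  # assumed ISO 8601 format
    "measuredProperty": "COUNT",
}
print(check_observation(observation))
```

A value such as `"1000"` (a string) would be reported as a problem, since the property is typed as an integer.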

disambiguatingDescription

If the SNOMED CT term does not provide sufficient detail, please provide a description that disambiguates the population type.

https://schema.org/disambiguatingDescription

disambiguatingDescription

disambiguatingDescription Type

merged type (Disambiguating Description)

all of

observationDate

Please provide the date that the observation was made. Some datasets may be continuously updated and the number of records will change regularly, so the observation date provides users with the date that the analysis or query was run to generate the particular observation. Users can add multiple observations, e.g. an observation of cumulative COVID positive cases by specimen on 1/1/2021 could be 2M, and a new observation on 8/1/2021 could be 2.1M.

https://schema.org/observationDate

observationDate

observationDate Type

merged type (Observation Date)

any of
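The two-observation example in the description above can be sketched as a list of observation records from which the most recent one is selected. Several things here are assumptions rather than schema facts: the dates are read as day/month/year and written in ISO 8601 form, `"PERSONS"` is reused from the observedNode example purely for illustration, and `latest_observation` is a hypothetical helper.

```python
from datetime import date

# Two observations of the same dataset taken a week apart (values from the text above).
observations = [
    {
        "observedNode": "PERSONS",
        "measuredValue": 2_000_000,
        "observationDate": "2021-01-01",  # assumed ISO 8601 rendering of 1/1/2021
        "measuredProperty": "COUNT",
    },
    {
        "observedNode": "PERSONS",
        "measuredValue": 2_100_000,
        "observationDate": "2021-01-08",  # assumed ISO 8601 rendering of 8/1/2021
        "measuredProperty": "COUNT",
    },
]


def latest_observation(items):
    """Hypothetical helper: pick the observation with the most recent date."""
    return max(items, key=lambda obs: date.fromisoformat(obs["observationDate"]))


latest = latest_observation(observations)
print(latest["observationDate"], latest["measuredValue"])
```

Keeping each observation as its own record, rather than overwriting a single count, preserves the history of how the population size changed between query runs.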

observationDate Default Value

The default value is:

"release date"

measuredProperty

Initially this will default to "COUNT".

https://schema.org/measuredProperty

measuredProperty

measuredProperty Type

merged type (Measured Property)

all of

measuredProperty Default Value

The default value is:

"COUNT"

Definitions group uuidv4

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/uuidv4"}
Property Type Required Nullable Defined by
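Each of these references is the schema URL followed by a JSON Pointer fragment (`#/definitions/<name>`). A minimal sketch of how such a reference could be resolved against an already-loaded schema is shown here. The helper `resolve_ref` is hypothetical, and the `schema` dictionary is a placeholder standing in for the real document, whose definition bodies are not reproduced in this section.

```python
from urllib.parse import urldefrag

SCHEMA_URL = "https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json"


def resolve_ref(ref, schema, base_url=SCHEMA_URL):
    """Hypothetical helper: follow a $ref into an already-loaded schema dict."""
    url, fragment = urldefrag(ref)
    if url and url != base_url:
        raise ValueError(f"reference points outside this schema: {url}")
    node = schema
    # A JSON Pointer is a sequence of "/"-prefixed tokens; "~1" and "~0" are escapes.
    for token in fragment.split("/")[1:]:
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


# Placeholder content only: this is NOT the real uuidv4 definition.
schema = {"definitions": {"uuidv4": {"description": "placeholder"}}}

print(resolve_ref(SCHEMA_URL + "#/definitions/uuidv4", schema))
```

The same helper accepts the shorter local form `#/definitions/uuidv4`, because an empty URL part is treated as "this document".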

Definitions group semver

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/semver"}
Property Type Required Nullable Defined by

Definitions group url

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/url"}
Property Type Required Nullable Defined by

Definitions group eightyCharacters

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/eightyCharacters"}
Property Type Required Nullable Defined by

Definitions group abstractText

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/abstractText"}
Property Type Required Nullable Defined by

Definitions group emailAddress

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/emailAddress"}
Property Type Required Nullable Defined by

Definitions group shortDescription

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/shortDescription"}
Property Type Required Nullable Defined by

Definitions group description

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/description"}
Property Type Required Nullable Defined by

Definitions group longDescription

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/longDescription"}
Property Type Required Nullable Defined by

Definitions group commaSeparatedValues

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/commaSeparatedValues"}
Property Type Required Nullable Defined by

Definitions group doi

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/doi"}
Property Type Required Nullable Defined by

Definitions group ageRange

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/ageRange"}
Property Type Required Nullable Defined by

Definitions group format

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/format"}
Property Type Required Nullable Defined by

Definitions group isocountrycode

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/isocountrycode"}
Property Type Required Nullable Defined by

Definitions group memberOf

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/memberOf"}
Property Type Required Nullable Defined by

Definitions group physicalSampleAvailability

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/physicalSampleAvailability"}
Property Type Required Nullable Defined by

Definitions group followup

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/followup"}
Property Type Required Nullable Defined by

Definitions group periodicity

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/periodicity"}
Property Type Required Nullable Defined by

Definitions group purpose

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/purpose"}
Property Type Required Nullable Defined by

Definitions group source

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/source"}
Property Type Required Nullable Defined by

Definitions group setting

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/setting"}
Property Type Required Nullable Defined by

Definitions group timeLag

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/timeLag"}
Property Type Required Nullable Defined by

Definitions group dataUseLimitation

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/dataUseLimitation"}
Property Type Required Nullable Defined by

Definitions group dataUseRequirements

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/dataUseRequirements"}
Property Type Required Nullable Defined by

Definitions group deliveryLeadTime

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/deliveryLeadTime"}
Property Type Required Nullable Defined by

Definitions group standardisedDataModels

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/standardisedDataModels"}
Property Type Required Nullable Defined by

Definitions group controlledVocabulary

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/controlledVocabulary"}
Property Type Required Nullable Defined by

Definitions group language

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/language"}
Property Type Required Nullable Defined by

Definitions group statisticalPopulationConstrained

Reference this group by using

{"$ref":"https://hdruk.github.io/schemata/schema/dataset/latest/dataset.schema.json#/definitions/statisticalPopulationConstrained"}
Property Type Required Nullable Defined by

Definitions group dataClass

Reference this group by using

{"$ref":"#/definitions/dataClass"}

| Property | Type | Required | Nullable | Defined by |
| --- | --- | --- | --- | --- |
| name | Merged | Required | cannot be null | HDR UK Dataset Schema |
| description | string | Optional | cannot be null | HDR UK Dataset Schema |
| elements | array | Required | cannot be null | HDR UK Dataset Schema |

name

The name of a table in a dataset.

Should be limited to 255 characters; abstract text requires rewrite.

name

name Type

merged type (Table Name)

all of

description

A description of a table in a dataset.

description

description Type

string (Table Description)

description Constraints

maximum length: the maximum number of characters for this string is: 20000

minimum length: the minimum number of characters for this string is: 1

elements

A list of data elements contained within a table in a dataset.

elements

elements Type

an array of merged types (Details)
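To make the shape of a dataClass concrete, the sketch below assembles one table entry and enforces the description length limits stated above (1 to 20000 characters). The helper `make_data_class`, the table name `admissions`, and the column values are hypothetical illustrations, and treating `name` as a plain string is an assumption because its merged type is not expanded in this section.

```python
MIN_DESCRIPTION = 1      # minimum length stated above
MAX_DESCRIPTION = 20000  # maximum length stated above


def make_data_class(name, elements, description=None):
    """Hypothetical helper: build a dataClass-shaped dict for one table."""
    if description is not None:
        if not MIN_DESCRIPTION <= len(description) <= MAX_DESCRIPTION:
            raise ValueError("description must be 1 to 20000 characters long")
    table = {"name": name, "elements": list(elements)}  # both are required
    if description is not None:
        table["description"] = description  # optional property
    return table


# All names and values below are made up for illustration.
table = make_data_class(
    "admissions",
    [{"name": "admission_date", "dataType": "date", "sensitive": False}],
    description="One row per hospital admission.",
)
print(table)
```

Omitting `description` is allowed because the property is Optional, but an empty string is rejected because the minimum length is 1.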

Definitions group dataElement

Reference this group by using

{"$ref":"#/definitions/dataElement"}

| Property | Type | Required | Nullable | Defined by |
| --- | --- | --- | --- | --- |
| name | Merged | Required | cannot be null | HDR UK Dataset Schema |
| dataType | string | Required | cannot be null | HDR UK Dataset Schema |
| description | string | Optional | cannot be null | HDR UK Dataset Schema |
| sensitive | boolean | Required | cannot be null | HDR UK Dataset Schema |
| Additional Properties | Any | Optional | can be null | |

name

The name of a column in a table.

255 Chars

name

name Type

merged type (Column Name)

all of

dataType

The data type of the values in the column.

In future we could enumerate options for this, rather than just a string. 255 Chars

dataType

dataType Type

string (Data Type)

description

A description of a column in a table.

description

description Type

string (Column Description)

description Constraints

maximum length: the maximum number of characters for this string is: 20000

minimum length: the minimum number of characters for this string is: 1

sensitive

A true or false value indicating whether the field is sensitive.

We could clarify a definition of what is sensitive in the future.

sensitive

sensitive Type

boolean (Sensitive)

Additional Properties

Additional properties are allowed and do not have to follow a specific schema
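A rough check of a dataElement against the rules above could look like the following. It is only a sketch: `check_data_element` is a hypothetical helper, `name` is assumed to be a plain string, and the `units` key is an invented extra property included to show that additional properties are not rejected.

```python
# Required properties and the Python types assumed to correspond to them.
REQUIRED = {"name": str, "dataType": str, "sensitive": bool}


def check_data_element(element):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in element:
            problems.append(f"missing required property: {key}")
        elif not isinstance(element[key], expected):
            problems.append(f"{key} must be of type {expected.__name__}")
    description = element.get("description")
    if description is not None:
        # Length limits for description as stated above: 1 to 20000 characters.
        if not isinstance(description, str) or not 1 <= len(description) <= 20000:
            problems.append("description must be a string of 1 to 20000 characters")
    # Any other keys are ignored: additional properties are allowed.
    return problems


element = {
    "name": "systolic_bp",           # made-up column name
    "dataType": "integer",           # free-text string, not an enumeration
    "description": "Systolic blood pressure at admission.",
    "sensitive": False,
    "units": "mmHg",                 # invented additional property
}
print(check_data_element(element))
```

Note that `sensitive` must be a real boolean here, so a string such as `"False"` would be flagged.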