Requirements

System Requirements

Relational database storage of metadata related to datasets, parameters and scripts used in running models.
Persistent and unique identifiers on all data objects.
Storage of issues against data objects.
Web and REST APIs to update and query storage metadata.
Ability to generate reports on data objects that model codes can use to pull in data and to surface known issues with those objects.
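
The storage, identifier, issue and report requirements above can be sketched with Python's standard `sqlite3` and `uuid` modules. This is a minimal illustration under stated assumptions, not a proposed design: the table names `data_object` and `issue`, their columns, and the helpers `open_store`, `add_object`, `add_issue` and `report` are all hypothetical, and UUIDs merely stand in for whatever persistent identifier scheme is eventually chosen.

```python
import sqlite3
import uuid

# Hypothetical schema: one table of data objects, one table of issues raised against them.
SCHEMA = """
CREATE TABLE data_object (
    id TEXT PRIMARY KEY,
    kind TEXT NOT NULL,
    description TEXT NOT NULL
);
CREATE TABLE issue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    object_id TEXT NOT NULL REFERENCES data_object(id),
    description TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind, description):
    """Insert a data object and return its newly generated identifier."""
    object_id = str(uuid.uuid4())  # assumption: a UUID serves as the unique identifier
    conn.execute(
        "INSERT INTO data_object (id, kind, description) VALUES (?, ?, ?)",
        (object_id, kind, description),
    )
    return object_id


def add_issue(conn, object_id, description):
    """Record an issue against an existing data object."""
    conn.execute(
        "INSERT INTO issue (object_id, description) VALUES (?, ?)",
        (object_id, description),
    )


def report(conn, object_id):
    """Return the object's metadata together with its known issues."""
    row = conn.execute(
        "SELECT kind, description FROM data_object WHERE id = ?", (object_id,)
    ).fetchone()
    if row is None:
        raise KeyError(object_id)
    issues = [
        r[0]
        for r in conn.execute(
            "SELECT description FROM issue WHERE object_id = ? ORDER BY id",
            (object_id,),
        )
    ]
    return {"id": object_id, "kind": row[0], "description": row[1], "issues": issues}
```

A model code could call `report` before loading a data object and decide whether to proceed based on the returned `issues` list.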

The data being stored falls into the following categories:

Unique identifier of dataset
Unique identifier of primary dataset
Unique identifier of curation script
Short description of dataset
Longer text description of dataset, including specification of population characteristics: spatial location, time, any other important characterisation features (survey, census; observational study, case control study, convenience sample)
Unique ID of curator
Quality Metric
Generalizability Metric
Fitness for Purpose Assessment (Red/Amber/Green)
Longer textual assessment of fitness for purpose, summarising the pros and cons of the dataset
Status tag (Active/Superseded/Invalid)
Unique identifier of successor dataset
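
The dataset fields listed above map naturally onto a typed record. The sketch below is one possible shape, assuming Python dataclasses and enums. The class and attribute names are hypothetical, the metrics are assumed to be numeric because the source does not define their scale, and the rule that a superseded dataset must name a successor is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fitness(Enum):
    RED = "Red"
    AMBER = "Amber"
    GREEN = "Green"


class Status(Enum):
    ACTIVE = "Active"
    SUPERSEDED = "Superseded"
    INVALID = "Invalid"


@dataclass(frozen=True)
class DatasetRecord:
    dataset_id: str
    primary_dataset_id: str
    curation_script_id: str
    short_description: str
    long_description: str
    curator_id: str
    quality: float           # assumption: numeric metric, scale not defined in the source
    generalizability: float  # assumption: numeric metric, scale not defined in the source
    fitness: Fitness
    fitness_assessment: str
    status: Status = Status.ACTIVE
    successor_id: Optional[str] = None

    def __post_init__(self):
        # Assumed rule: a superseded record should point at its replacement.
        if self.status is Status.SUPERSEDED and self.successor_id is None:
            raise ValueError("a superseded dataset must name its successor")
```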

Unique identifier of data product
Unique identifier (DOI?) of source document
Short description of data product
Longer description of data product, including specification of population characteristics: spatial location, time, any other important characterisation features
Unique ID of reviewer
GRADE or Newcastle-Ottawa assessment?
Point estimate
Uncertainty/variability metrics (in increasing order of desirability)
- None
- Interval
- Distribution specification: distribution and parameters
- Empirical distribution
Quality Metric
Generalizability Metric
Fitness for Purpose Assessment (Red/Amber/Green)
Longer textual assessment of fitness for purpose, summarising the pros and cons of the material
Status tag (Active/Superseded/Invalid)
Unique identifier of successor data product
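
The four uncertainty/variability options above, ranked by desirability, could be represented as a small tagged union. This is a sketch only: the type names `Interval`, `DistributionSpec` and `EmpiricalDistribution`, their fields, and the `desirability` helper are hypothetical, and `None` is assumed to stand for the "None" option.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float


@dataclass(frozen=True)
class DistributionSpec:
    name: str                     # e.g. a distribution family name
    parameters: Dict[str, float]  # parameter name -> value


@dataclass(frozen=True)
class EmpiricalDistribution:
    samples: Tuple[float, ...]


# None represents the "None" option: no uncertainty information recorded.
Uncertainty = Optional[Union[Interval, DistributionSpec, EmpiricalDistribution]]

# Ranks follow the order given in the list: higher means more desirable.
_RANK = {
    type(None): 0,
    Interval: 1,
    DistributionSpec: 2,
    EmpiricalDistribution: 3,
}


def desirability(uncertainty: Uncertainty) -> int:
    """Return the desirability rank of an uncertainty description."""
    return _RANK[type(uncertainty)]
```

With this ranking, two candidate data products for the same quantity can be compared by calling `desirability` on each one's uncertainty description.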

Unique identifier of data product
Unique identifier of dataset
Unique identifier of analysis script
Short description of data product
Longer description of dataset (inherited from parent), including specification of population characteristics: spatial location, time, any other important characterisation features
Longer description of script used (inherited from parent), including specification of statistical analysis and any critical assumptions
Qualitative assessment of fit to the model
Unique ID of data analyst
Point estimate
Uncertainty/variability metrics (in increasing order of desirability)
- None
- Interval
- Distribution specification: distribution and parameters
- Empirical distribution
Quality Metric (can be no better than metric for dataset)
Generalizability Metric (can be no better than metric for dataset)
Fitness for Purpose Assessment (Red/Amber/Green) (can be no better than assessment for dataset)
Longer textual assessment of fitness for purpose, summarising the pros and cons of the material
Status tag (Active/Superseded/Invalid)
Unique identifier of successor data product
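
The "can be no better than" constraints above amount to capping each derived value at the parent dataset's value. A minimal sketch follows, assuming that Red < Amber < Green and that numeric metrics are "higher is better"; neither ordering is stated in the source, and the names `Fitness`, `cap_to_parent` and `check_not_better` are hypothetical.

```python
from enum import IntEnum


class Fitness(IntEnum):
    # Assumed ordering: Red is worst, Green is best.
    RED = 0
    AMBER = 1
    GREEN = 2


def cap_to_parent(proposed, parent):
    """Return the proposed value, lowered to the parent's value if it exceeds it."""
    return min(proposed, parent)


def check_not_better(proposed, parent, field):
    """Raise if a derived value claims to be better than its parent dataset's."""
    if proposed > parent:
        raise ValueError(f"{field} cannot be better than the parent dataset's value")
```

Whether the system should silently cap (`cap_to_parent`) or reject (`check_not_better`) an over-optimistic assessment is a design choice the requirements leave open.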

Unique identifier of script
Short description of script
Longer description of script used, including specification of statistical analysis and any critical assumptions
Unique ID of author
Superseded tag
Unique identifier of successor script
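
Because datasets, data products and scripts each carry a successor identifier, a client holding an old identifier may need to follow the chain to the current object. The sketch below assumes the successor links have already been loaded into a plain dictionary; the function name `resolve_current` and the mapping shape are hypothetical.

```python
def resolve_current(identifier, successors):
    """Follow successor links from `identifier` to the newest object.

    `successors` maps an identifier to its successor's identifier; an
    identifier that is absent, or mapped to None, has no successor.
    """
    seen = set()
    current = identifier
    while successors.get(current) is not None:
        if current in seen:
            # Guard against malformed data in which the links form a loop.
            raise ValueError(f"successor cycle detected at {current!r}")
        seen.add(current)
        current = successors[current]
    return current
```

An object with no recorded successor resolves to itself, so the function is safe to call on any identifier.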