Requirements

Jonathan Hollocombe edited this page May 15, 2020 · 6 revisions

System Requirements

  1. Relational database storage of metadata related to datasets, parameters and scripts used in running models.
  2. Persistent and unique identifiers on all data objects.
  3. Storage of issues against data objects.
  4. Web and REST APIs to update and query storage metadata.
  5. Ability to run reports on data objects that codes can use to pull in data and that list known issues with those data objects.
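
Requirements 1, 3 and 5 can be illustrated together with a minimal sketch, assuming an in-memory SQLite database and UUIDs as identifiers. The table names (data_object, issue), column names and function names below are hypothetical and chosen only for illustration; they are not part of this specification.

```python
import sqlite3
import uuid


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_object (
            id          TEXT PRIMARY KEY,   -- persistent unique identifier
            kind        TEXT NOT NULL,      -- e.g. 'dataset' or 'script'
            description TEXT NOT NULL
        );
        CREATE TABLE issue (
            id          INTEGER PRIMARY KEY,
            object_id   TEXT NOT NULL REFERENCES data_object(id),
            description TEXT NOT NULL
        );
        """
    )
    return conn


def add_object(conn: sqlite3.Connection, kind: str, description: str) -> str:
    """Register a data object and return its newly generated identifier."""
    object_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO data_object (id, kind, description) VALUES (?, ?, ?)",
        (object_id, kind, description),
    )
    return object_id


def add_issue(conn: sqlite3.Connection, object_id: str, description: str) -> None:
    """Record an issue against an existing data object."""
    conn.execute(
        "INSERT INTO issue (object_id, description) VALUES (?, ?)",
        (object_id, description),
    )


def issue_report(conn: sqlite3.Connection) -> dict:
    """Return every data object together with its known issues."""
    rows = conn.execute(
        """
        SELECT o.id, o.kind, o.description, i.description
        FROM data_object AS o
        LEFT JOIN issue AS i ON i.object_id = o.id
        ORDER BY o.id, i.id
        """
    ).fetchall()
    report = {}
    for object_id, kind, description, issue in rows:
        entry = report.setdefault(
            object_id, {"kind": kind, "description": description, "issues": []}
        )
        if issue is not None:  # objects without issues still appear in the report
            entry["issues"].append(issue)
    return report
```

A code consuming such a report could look up an object by its identifier and inspect the "issues" list before pulling in the data.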

Data Specifications

The data being stored falls into the following categories:

Datasets

  • Unique identifier of dataset
  • Unique identifier of primary dataset
  • Unique identifier of curation script
  • Short description of dataset
  • Longer text description of dataset, including specification of population characteristics: spatial location, time, any other important characterisation features (survey, census; observational study, case control study, convenience sample)
  • Unique ID of curator
  • Quality Metric
  • Generalizability Metric
  • Fitness for Purpose Assessment (Red/Amber/Green)
  • Longer textual assessment of fitness for purpose, summarising the pros and cons of the dataset
  • Status tag (Active/Superseded/Invalid)
  • Unique identifier of successor dataset
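
The dataset fields above could be captured in a record type along the following lines. This is a sketch under stated assumptions: the class and field names are hypothetical, the two metrics are assumed to be optional floats (the specification does not define their type), and the rule that a superseded dataset must name a successor is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "Active"
    SUPERSEDED = "Superseded"
    INVALID = "Invalid"


class Rag(Enum):
    RED = "Red"
    AMBER = "Amber"
    GREEN = "Green"


@dataclass
class Dataset:
    dataset_id: str
    short_description: str
    long_description: str            # population, location, time, study type
    curator_id: str
    fitness_for_purpose: Rag
    fitness_assessment: str          # longer text: pros and cons
    primary_dataset_id: Optional[str] = None
    curation_script_id: Optional[str] = None
    quality_metric: Optional[float] = None           # type is an assumption
    generalizability_metric: Optional[float] = None  # type is an assumption
    status: Status = Status.ACTIVE
    successor_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule (not stated in the specification): a superseded
        # dataset should point at the dataset that replaces it.
        if self.status is Status.SUPERSEDED and self.successor_id is None:
            raise ValueError("a superseded dataset needs a successor_id")
```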

Literature-based data products

  • Unique identifier of data product
  • Unique identifier (DOI?) of source document
  • Short description of data product
  • Longer description of data product, including specification of population characteristics: spatial location, time, any other important characterisation features
  • Unique ID of reviewer
  • GRADE or Newcastle-Ottawa assessment?
  • Point estimate
  • Uncertainty/variability metrics (in increasing order of desirability)
    • None
    • Interval
    • Distribution specification: distribution and parameters
    • Empirical distribution
  • Quality Metric
  • Generalizability Metric
  • Fitness for Purpose Assessment (Red/Amber/Green)
  • Longer textual assessment of fitness for purpose, summarising the pros and cons of the material
  • Status tag (Active/Superseded/Invalid)
  • Unique identifier of successor data product
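
One way to represent a point estimate together with the four uncertainty levels listed above is sketched below. The names (UncertaintyKind, Estimate and its fields) are hypothetical; the integer values only encode the "increasing order of desirability" from the list.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class UncertaintyKind(IntEnum):
    # Larger value = more desirable, following the order in the list above.
    NONE = 0
    INTERVAL = 1
    DISTRIBUTION = 2
    EMPIRICAL = 3


@dataclass
class Estimate:
    point: float
    interval: Optional[Tuple[float, float]] = None            # (lower, upper)
    distribution: Optional[str] = None                        # e.g. "gamma"
    parameters: Dict[str, float] = field(default_factory=dict)
    samples: List[float] = field(default_factory=list)        # empirical distribution

    def kind(self) -> UncertaintyKind:
        """Return the most desirable uncertainty description available."""
        if self.samples:
            return UncertaintyKind.EMPIRICAL
        if self.distribution is not None:
            return UncertaintyKind.DISTRIBUTION
        if self.interval is not None:
            return UncertaintyKind.INTERVAL
        return UncertaintyKind.NONE
```

Because the enumeration is ordered, two data products can be compared directly, for example `a.kind() > b.kind()`.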

Dataset-based data products

  • Unique identifier of data product
  • Unique identifier of dataset
  • Unique identifier of analysis script
  • Short description of data product
  • Longer description of dataset (inherited from parent), including specification of population characteristics: spatial location, time, any other important characterisation features
  • Longer description of script used (inherited from parent), including specification of statistical analysis and any critical assumptions
  • Qualitative assessment of fit to the model
  • Unique ID of data analyst
  • Point estimate
  • Uncertainty/variability metrics (in increasing order of desirability)
    • None
    • Interval
    • Distribution specification: distribution and parameters
    • Empirical distribution
  • Quality Metric (can be no better than metric for dataset)
  • Generalizability Metric (can be no better than metric for dataset)
  • Fitness for Purpose Assessment (Red/Amber/Green) (can be no better than assessment for dataset)
  • Longer textual assessment of fitness for purpose, summarising the pros and cons of the material
  • Status tag (Active/Superseded/Invalid)
  • Unique identifier of successor data product
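
The "can be no better than" constraints above amount to capping each value for a data product at the corresponding value for its parent dataset. A minimal sketch follows, assuming Red < Amber < Green and assuming that a larger numeric metric means better quality (the specification does not define the metric scale). The function names are hypothetical.

```python
from enum import IntEnum


class Rag(IntEnum):
    RED = 0
    AMBER = 1
    GREEN = 2


def cap_rag(product: Rag, dataset: Rag) -> Rag:
    """Fitness for purpose of a product cannot exceed that of its dataset."""
    return min(product, dataset)


def cap_metric(product: float, dataset: float) -> float:
    """Cap a numeric metric at the dataset's value (assumes higher is better)."""
    return min(product, dataset)
```

Whether the system should silently cap the value, as here, or reject an inconsistent entry is a design choice the specification leaves open.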

Scripts

  • Unique identifier of script
  • Short description of script
  • Longer description of script used; specification of statistical analysis, and any critical assumptions
  • Unique ID of author
  • Superseded tag
  • Unique identifier of successor script
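
Successor identifiers form a chain from an old script (or dataset, or data product) to its current replacement. The sketch below follows that chain to the latest version; the function name and the dictionary used to hold the successor links are hypothetical stand-ins for a database lookup.

```python
from typing import Dict, Optional


def latest_version(object_id: str, successors: Dict[str, Optional[str]]) -> str:
    """Follow successor links until an object with no successor is reached.

    `successors` maps each identifier to its successor identifier, or to
    None when the object has not been superseded.
    """
    seen = set()
    current = object_id
    while successors.get(current) is not None:
        if current in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"successor cycle detected at {current!r}")
        seen.add(current)
        current = successors[current]
    return current
```

For example, with links `{"s1": "s2", "s2": "s3", "s3": None}`, starting from "s1" leads to "s3".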