matthewcornell edited this page May 11, 2020 · 10 revisions

Welcome to the dengue-data wiki!

This wiki documents the import process that creates the Impetus Project's central dengue_cases Postgres database, along with its main use cases.


Overview

The Impetus Project (Improving Methods for Prediction of Epidemic Transmission Using Spatial Surveillance):

... aims to develop and extend statistical and modeling methodologies to correct for biases in surveillance data, impute missing data, predict the course of epidemics, and appropriately characterize the uncertainty in estimates and predictions at relevant spatial scales.

The codebase that makes those predictions and generates the reports relies on an underlying Postgres database. This wiki describes that database, along with the detailed steps used to recreate it whenever new data arrives. It also gives an overview of that codebase and of the other applications that use the database, especially the analytics database (in the form of CSV files) that is stored for researchers' use.

Database source and overview

Through our collaboration with the Thai Ministry of Public Health (MOPH), we have curated a dataset containing over 2.5 million records of unique cases of dengue fever infections in Thailand since 1968. All of these records come from their national surveillance system. Thailand has 76 provinces, or "changwat" (plus one municipality, Bangkok) and close to 900 districts, or "amphoe". Based on census data from 2002, districts have on average about 70,000 residents, although they range in size from about 2,000 to over 400,000.

Records of dengue fever are split into three categories: dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). These represent different clinical manifestations of infection with dengue virus. DHF and DSS are severe forms of dengue that are often clinically attended. In our modeling to date, we have used records of DHF because it was the first of the three to become a notifiable disease, and because reporting of DHF is likely more consistent over time, since it is usually a hospital-attended illness.

Data were collected and reported in different ways over time. Prior to 1999, we have monthly province-level counts of DHF. After 1999, we have line-list data (one case per row) with detailed case-level information, including an address code that specifies province, district, subdistrict, and in some cases village.
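To make the difference between the two reporting formats concrete, here is a minimal sketch of how line-list rows could be rolled up into the monthly province-level DHF counts available for the earlier period. It is an illustration only: the `CaseRecord` type, its field names (`disease`, `province`, `onset_date`), and the province identifiers are hypothetical and are not taken from the actual dengue_cases schema or import code.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class CaseRecord(NamedTuple):
    """One line-list row. Field names are hypothetical, not the real schema."""
    disease: str      # "DF", "DHF", or "DSS"
    province: str     # hypothetical province identifier
    onset_date: date  # hypothetical date field for the case


def monthly_province_counts(cases: Iterable[CaseRecord], disease: str = "DHF") -> Counter:
    """Count cases of one disease per (province, year, month)."""
    counts = Counter()
    for case in cases:
        if case.disease == disease:
            key = (case.province, case.onset_date.year, case.onset_date.month)
            counts[key] += 1
    return counts


# Made-up rows, purely for illustration.
example_cases = [
    CaseRecord("DHF", "10", date(2005, 6, 3)),
    CaseRecord("DHF", "10", date(2005, 6, 20)),
    CaseRecord("DF", "10", date(2005, 6, 21)),
    CaseRecord("DHF", "50", date(2005, 7, 1)),
]
counts = monthly_province_counts(example_cases)
```

Aggregating in this direction is always possible, whereas the reverse is not: monthly counts cannot be expanded back into individual cases, which is why the two periods have to be handled differently.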

Contents

  • Servers-Databases-And-Code-Repos: Describes the machines and databases involved in importing the data, including code locations.
  • Import-Process: Details the major steps that are taken to do an import run.
  • Dengue-Cases-Database: Describes the dengue_cases Postgres database, including its namespaces, tables, and functions.
  • Use-Cases: Lists the applications that use the dengue_cases database, including the main one that makes predictions and generates final reports.
  • Automated-Tests: Documents the current set of tests that run when an import executes.
  • Backups: Specifies the database backup scheme in place.
  • Unique-Ids: Describes the challenge in uniquely identifying cases, and our current and proposed solutions.