Skip to content
sheldonabrown edited this page Jul 26, 2013 · 23 revisions

Overview

This page describes the database schema and all of the tables used by the system to persist Onebusaway-NYC data. The data is stored in three databases given below, and also offline on a file server. This page briefly describes the purpose of each database and its tables. Typically there is one master database for each environment (production, staging, developement). There is optionally a read only database in each environment which replicates in real-time the data in the master database, provided for querying purposes without affecting write performance of the master.

Also note that queue messaging is used to exchange real-time data for this application; the tables below represent ancillary purposes such as authentication, service alerts, and data archiving.

Timezones

Real-time, Inference, and archived data (data that is produced by the archiver) is stored in the UTC time zone. All other data is stored in the locally configured time zone.

Archival Databases

Three distinct reporting periods exist in OneBusAway as represented in the diagram below:
This diagram distinguishes the time frames of the various archival databases:
Archiving Time Frames

Real-Time database stores from now to 30 days.
Reporting database stores yesterday’s data through two years back, dropping a few unnecessary fields along the way. It is updated daily via cron job.
Archive is permanent off-line storage of database dumps on a file server.

App Database

This database is called ‘onebusaway_nyc_app’ and stores data about users, their roles and service alerts in the system. This database is used for:

  • authenticating user to the admin web application;
  • authenticate API key requests to the front-end;
  • store active service alerts and service alerts subscriptions managed by the TDM;

It has following tables:

Table Description
oba_users Stores users of the system
oba_user_indices Stores user type as well as user credentials
oba_user_roles Stores roles created in the system
oba_user_roles_mapping Stores each user’s role in the system
obanyc_servicealerts Stores service alerts received by the system
obanyc_servicealert_subscription Stores service alert subscriptions

The following diagram shows ‘App’ database design
App Database Design

Real-time Database

This database is called ‘onebusaway_nyc’ and stores real time as well as inferred vehicle data. It also stores depot, vehicle and crew assignment data that is uploaded by the transit agency daily. This database is used by archiver webapp to persist vehicle data and its APIs to retrieve it.

The obanyc_cclocationreport and obanyc_inferredlocation_record tables partition by day on the time_received and time_reported columns respectively. In this manner, partitions older than 30 days can be dropped by an external cron job without taking the database off-line.

The real-time database has following tables

Table Description
obanyc_cclocationreport Stores real time vehicle data
obanyc_inferredlocation Stores vehicle inference data
obanyc_last_known_vehicle Stores last known record for each vehicle. It is a combination of real time and inferred data for that vehicle
obanyc_invalidlocationreport Stores errors when persisting real time and inference records
obanyc_crewassignment Stores crew assignment data uploaded by Transit Agency daily
obanyc_vehiclepullout Stores vehicle pullout data uploaded by Transit Agency daily
obanyc_depot Stores depot data fetched from webservice that is persisted daily
obanyc_systemlog Stores messages logged while performing actions such as user login by the system

The following diagram shows ‘real-time’ database design
Real Time

Reporting Database

The reporting database, onebusaway_nyc_reporting, consists of a single table, obanyc_reporting, that is updated daily via a dump of the last day of the real-time database. As part of the dump and load process, the real-time and inference tables are combined into a single table, and some unnecessary columns are dropped. The reporting database contains data from yesterday through to two years back. Like the real-time database, each day is a partition use the time_received column; the partitions of data older than 2 years are simply dropped by an external cron job called daily.

The following diagram shows the ‘reporting’ database design
Reporting Database Design

Archive Database (off-line)

The archive database consists of MySQL dumps of the real-time database used to copy data from the real-time to the reporting database. These dump files are copied to a file server for permanent archival.

The format of the dump is that of the real-time database. The archive database contains data from yesterday through the beginning of the project with the following filename convention:

  
obanyc_reporting.YYYY-MM-DD.sql.gz
  

As part of this process, CSV is generated representing the same dataset. It follows a similar convention as above:


obanyc_reporting.YYYY-MM-DD.csv.gz

The format of the CSV is a simple enumeration of the database columns in a tab delimited manner. Here is an example:
  
data_quality_qualitative_indicator      dest_sign_code  direction_deg   emergency_code  latitude        longitude       manufacturer_data       operator_id_designator  request_id      route_id_designator     run_id_designator       speed   time_received   time_reported   vehicle_agency_designator       vehicle_agency_id       vehicle_id      time_received_index     has_inference   agency_id       archive_time_received   depot_id        distance_along_block    distance_along_trip     inference_is_formal     inferred_block_id       inferred_dest_sign_code inferred_direction_id   inferred_latitude       inferred_longitude      inferred_operator_id    inferred_phase  inferred_route_id       inferred_run_id inferred_status inferred_trip_id        next_scheduled_stop_distance    next_scheduled_stop_id  schedule_deviation      service_date    assigned_run_id
NULL    6       175.77  NULL    40.632810       -74.127648      XYZ155-600-599 0       498     000     000     -15.0   2012-01-01 05:00:00     2012-01-01 04:59:56     MTA NYCT        2008    4000    NULL    0       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    4630    128.77  NULL    40.684688       -73.978993      XYZ123456789           4992772                 -30.0   2012-01-01 05:00:00     2012-01-01 04:59:56     MTA NYCT        2008    7572    NULL    0       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
  

Tables

All the tables and their columns are described in detail in Database Tables page.

Clone this wiki locally