Skip to content

Data Tables

yuenmichelle1 edited this page Aug 25, 2023 · 3 revisions

Data Architecture

ERAS has 3 main tables that it pulls from. ERAS also heavily relies on Timescale's feature of Continuous Aggregates (which are materialized views of our 3 main tables bucketed by time).

ERAS also uses Composite Primary Keys to define our hypertables and in the schema. Composite primary keys are utilized to ensure the chunk table partitions of our hypertables are setup correctly https://docs.timescale.com/timescaledb/latest/quick-start/ruby/#step-3-create-a-table.


Main Tables


comment_events

Column Name Type Description Nullable? Composite Primary Key?
comment_id integer Id of the Talk comment No ✔️
event_time timestamp created_at of Talk comment No ✔️
updated_at timestamp updated_at of Talk comment No
project_id integer project_id of the Talk comment (sometimes null, because some Talk comments are not about a specific project) Yes
user_id integer user_id of talk comment Yes

classification_events

Column Name Type Description Nullable? Composite Primary Key? Default
classification_id integer Id of classification No ✔️
event_time timestamp created_at of classification No ✔️
classification_updated_at timestamp updated_at of classification No
started_at timestamp started_at of classification. Comes from metadata of classification, set by volunteer's machine Yes
finished_at timestamp finished_at of classification. Comes from metadata of classification, set by volunteer's machine Yes
project_id integer project_id of the classification Yes
workflow_id integer workflow_id of the classification Yes
user_id integer user_id of classification Yes
user_group_ids array[int] Array of user group ids that the classifying user belongs to Yes []
session_time float finished_at - started_at time in seconds Yes

classification_user_groups

Currently, continuous aggregates cannot be created on classification_events grouping by user_group_id, since classification_event.user_group_ids is an array. A workaround suggested by Timescale is to flatten our data if we want to group by user_group_id. Because of this, we created a "faux join" table that links a classification to the multiple user groups a classifying user belongs to.

We only create classification_user_groups when a classification's metadata.user_group_ids array is not empty.

More motivations for this "faux join" mapping table can be found in Slack thread here: https://zooniverse.slack.com/archives/C010QAPB67J/p1682390755896589?thread_ts=1682096381.364729&cid=C010QAPB67J

Column Name Type Description Nullable? Composite Primary Key? Default
classification_id integer Id of classification Yes
event_time timestamp created_at of classification No
user_group_id integer id of a user_group that a classification's user belongs to No
session_time float Comes from a classification's metadata's finished_at - started_at time in seconds Yes
project_id integer project_id of the classification Yes
workflow_id integer workflow_id of the classification Yes
user_id integer user_id of classification Yes