Data Tables

Data Architecture

ERAS has 3 main tables that it pulls from. ERAS also heavily relies on Timescale's feature of Continuous Aggregates (which are materialized views of our 3 main tables bucketed by time).

ERAS also uses Composite Primary Keys to define our hypertables and in the schema. Composite primary keys are utilized to ensure the chunk table partitions of our hypertables are setup correctly https://docs.timescale.com/timescaledb/latest/quick-start/ruby/#step-3-create-a-table.

Main Tables

comment_events

Column Name	Type	Description	Nullable?	Composite Primary Key?
comment_id	integer	Id of the Talk comment	No	✔️
event_time	timestamp	`created_at` of Talk comment	No	✔️
updated_at	timestamp	`updated_at` of Talk comment	No
project_id	integer	`project_id` of the Talk comment (sometimes null, because some Talk comments are not about a specific project)	Yes
user_id	integer	`user_id` of talk comment	Yes

classification_events

Column Name	Type	Description	Nullable?	Composite Primary Key?	Default
classification_id	integer	Id of classification	No	✔️
event_time	timestamp	`created_at` of classification	No	✔️
classification_updated_at	timestamp	`updated_at` of classification	No
started_at	timestamp	`started_at` of classification. Comes from metadata of classification, set by volunteer's machine	Yes
finished_at	timestamp	`finished_at` of classification. Comes from metadata of classification, set by volunteer's machine	Yes
project_id	integer	`project_id` of the classification	Yes
workflow_id	integer	`workflow_id` of the classification	Yes
user_id	integer	`user_id` of classification	Yes
user_group_ids	array[int]	Array of user group ids that the classifying user belongs to	Yes		[]
session_time	float	`finished_at - started_at` time in seconds	Yes

classification_user_groups

Currently, continuous aggregates cannot be created on classification_events grouping by user_group_id, since classification_event.user_group_ids is an array. A workaround suggested by Timescale is to flatten our data if we want to group by user_group_id. Because of this, we created a "faux join" table that links a classification to the multiple user groups a classifying user belongs to.

We only create classification_user_groups when a classification's metadata.user_group_ids array is not empty.

More motivations for this "faux join" mapping table can be found in Slack thread here: https://zooniverse.slack.com/archives/C010QAPB67J/p1682390755896589?thread_ts=1682096381.364729&cid=C010QAPB67J

Column Name	Type	Description	Nullable?
classification_id	integer	Id of classification	Yes
event_time	timestamp	`created_at` of classification	No
user_group_id	integer	id of a user_group that a classification's user belongs to	No
session_time	float	Comes from a classification's metadata's `finished_at - started_at` time in seconds	Yes
project_id	integer	`project_id` of the classification	Yes
workflow_id	integer	`workflow_id` of the classification	Yes
user_id	integer	`user_id` of classification	Yes

Are you... Ready for it?

Overview

Home
User Groups Stats Visibility Levels
Data Tables
Mappings from Source Tables to ERAS
Continuous Aggregates
Decisions Around Session Time Caps

API Documentation

Routes and Params
Examples
Troubleshooting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly