
Data Model

Spencer Childress edited this page Jul 24, 2024 · 7 revisions


Example {gsm} data flow

[Figure: gsm system diagram]

Detailed {gsm} data model

[Figure: GSM data model]

Analytics data model

The KRI analytics pipeline is a standardized process for analyzing data issues: it takes participant-level input data and produces a standardized site-level summary of model results. The data sets used in each step of the pipeline are described in detail below.

  1. input data - Cross-domain participant-level input data with all needed data for KRI derivation.
  2. transformed data - Site-level transformed data including KRI calculation. Created by Transform functions.
  3. analyzed data - Site-level analysis result data. Created by Analyze functions.
  4. flagged data - Site-level analysis results with flags added. Created by passing numeric thresholds to a Flag function.
  5. summary data - Standardized subset of the flagged data. This summary data has the same structure for all assessments and always includes both KRI and Flag values, so trends for any given site can easily be compared across multiple assessments. Created using a Summarize function.

During this process, we also create bounds data (dfBounds) containing upper and lower bounds across the full range of exposure values. It is created in the analytics pipeline but used primarily in the reporting pipeline.

Analytics Data Tables


dfInput

  • Function(s) used to create table:
    • Input_Rate()
  • Inputs:
    • dfSubjects
    • dfNumerator
    • dfDenominator
  • Usage: The base data.frame for all Analytics workflows. Feeds into the Transform_XX functions.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfInput | SubjectID | The subject ID | Character | |
| dfInput | GroupID | The group ID for the metric | Character | |
| dfInput | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
| dfInput | Numerator | The calculated numerator value | Numeric | |
| dfInput | Denominator | The calculated denominator value | Numeric | |
| dfInput | Metric | The calculated rate/metric value | Numeric | |
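
To make the shape of dfInput concrete, here is a minimal, language-neutral sketch in Python of how participant-level rows could be assembled from a subject list plus numerator and denominator events. It is not the {gsm} implementation of Input_Rate(): the function name `input_rate`, the list-of-dicts representation, and the example data are all assumptions made for illustration.

```python
from collections import Counter

def input_rate(subjects, numerator_events, denominator_events, group_level="Site"):
    """Build one dfInput-style row per subject (hypothetical helper).

    subjects: list of dicts with SubjectID and GroupID.
    numerator_events / denominator_events: one SubjectID per recorded event.
    """
    num = Counter(numerator_events)
    den = Counter(denominator_events)
    rows = []
    for s in subjects:
        n = num[s["SubjectID"]]  # Counter returns 0 for subjects with no events
        d = den[s["SubjectID"]]
        rows.append({
            "SubjectID": s["SubjectID"],
            "GroupID": s["GroupID"],
            "GroupLevel": group_level,
            "Numerator": n,
            "Denominator": d,
            "Metric": n / d if d else None,  # rate is undefined without exposure
        })
    return rows

# Made-up example data: adverse events as numerator, visits as denominator.
subjects = [
    {"SubjectID": "S1", "GroupID": "Site01"},
    {"SubjectID": "S2", "GroupID": "Site01"},
    {"SubjectID": "S3", "GroupID": "Site02"},
]
adverse_events = ["S1", "S1", "S3"]
visits = ["S1"] * 4 + ["S2"] * 2 + ["S3"] * 5

df_input = input_rate(subjects, adverse_events, visits)
```

Each resulting row carries the six dfInput columns listed above, with one row per subject.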


dfTransformed

  • Function(s) used to create table:
    • Transform_Rate()
    • Transform_Count()
  • Inputs: dfInput
  • Usage: Converts the input data to the format needed to derive the KRI for an assessment via the Analyze_XX functions.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfTransformed | GroupID | The group ID for the metric | Character | |
| dfTransformed | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
| dfTransformed | Numerator | The calculated numerator value | Numeric | |
| dfTransformed | Denominator | The calculated denominator value | Numeric | |
| dfTransformed | Metric | The calculated rate/metric value | Numeric | |
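
The step from dfInput to dfTransformed is essentially a group-level aggregation. The sketch below shows one plausible way to do it in Python, assuming that numerators and denominators are summed per group and the metric is recomputed as their ratio. The helper name `transform_rate` and the sample rows are hypothetical, and the actual Transform_Rate() in {gsm} may differ in detail.

```python
def transform_rate(df_input):
    """Aggregate subject-level rows to one row per group (hypothetical helper)."""
    totals = {}
    for row in df_input:
        key = (row["GroupID"], row["GroupLevel"])
        t = totals.setdefault(key, {"Numerator": 0, "Denominator": 0})
        t["Numerator"] += row["Numerator"]
        t["Denominator"] += row["Denominator"]

    transformed = []
    for (group_id, group_level), t in totals.items():
        transformed.append({
            "GroupID": group_id,
            "GroupLevel": group_level,
            "Numerator": t["Numerator"],
            "Denominator": t["Denominator"],
            # Group-level rate; left undefined when there is no exposure.
            "Metric": t["Numerator"] / t["Denominator"] if t["Denominator"] else None,
        })
    return transformed

# Made-up subject-level rows in the dfInput layout.
rows = [
    {"SubjectID": "S1", "GroupID": "Site01", "GroupLevel": "Site", "Numerator": 2, "Denominator": 4},
    {"SubjectID": "S2", "GroupID": "Site01", "GroupLevel": "Site", "Numerator": 0, "Denominator": 2},
    {"SubjectID": "S3", "GroupID": "Site02", "GroupLevel": "Site", "Numerator": 1, "Denominator": 5},
]
df_transformed = transform_rate(rows)
```

Note that SubjectID disappears at this stage: every later table in the pipeline has one row per group.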


dfAnalyzed

  • Function(s) used to create table:
    • Analyze_Fisher()
    • Analyze_Identity()
    • Analyze_NormalApprox()
    • Analyze_Poisson()
    • Analyze_QTL()
  • Inputs: dfTransformed
  • Usage: Prepares the data for Flag_XX by performing the specified statistical test on the metric provided.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfAnalyzed | GroupID | The group ID for the metric | Character | |
| dfAnalyzed | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
| dfAnalyzed | Numerator | The calculated numerator value | Numeric | |
| dfAnalyzed | Denominator | The calculated denominator value | Numeric | |
| dfAnalyzed | Metric | The calculated rate/metric value | Numeric | |
| dfAnalyzed | Score | The statistical score | Numeric | |
| dfAnalyzed | Overall Metric | | Numeric | * |
| dfAnalyzed | Factor | | Numeric | * |
| dfAnalyzed | Predicted Count | | Numeric | * |
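
As an illustration of how a Score column might be derived, the sketch below computes a plain normal-approximation z-score for a binary proportion: each group's metric is compared with the pooled overall proportion and scaled by its standard error. This is a textbook simplification and an assumption on our part; it is not claimed to match Analyze_NormalApprox() in {gsm}, which may apply further adjustments. The name `analyze_normal_approx` and the sample data are hypothetical.

```python
import math

def analyze_normal_approx(df_transformed):
    """Add an unadjusted z-score to each group row (hypothetical helper).

    Assumes every Denominator is positive and the pooled proportion
    lies strictly between 0 and 1.
    """
    total_num = sum(r["Numerator"] for r in df_transformed)
    total_den = sum(r["Denominator"] for r in df_transformed)
    overall = total_num / total_den  # pooled proportion across all groups

    analyzed = []
    for r in df_transformed:
        # Standard error of a proportion under the pooled estimate.
        se = math.sqrt(overall * (1 - overall) / r["Denominator"])
        analyzed.append({**r, "Score": (r["Metric"] - overall) / se})
    return analyzed

# Made-up group-level rows in the dfTransformed layout.
sites = [
    {"GroupID": "Site01", "GroupLevel": "Site", "Numerator": 5, "Denominator": 50, "Metric": 0.10},
    {"GroupID": "Site02", "GroupLevel": "Site", "Numerator": 20, "Denominator": 50, "Metric": 0.40},
    {"GroupID": "Site03", "GroupLevel": "Site", "Numerator": 5, "Denominator": 100, "Metric": 0.05},
]
df_analyzed = analyze_normal_approx(sites)
```

With these numbers the pooled proportion is 0.15, so Site02 ends up with a large positive score while Site01 and Site03 score below zero.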


dfBounds

  • Function(s) used to create table:
    • Analyze_NormalApprox_PredictBounds()
    • Analyze_Poisson_PredictBounds()
  • Inputs: dfTransformed
  • Usage: Calculates predicted percentages/rates and upper- and lower-bounds across the full range of sample sizes/total exposure values for reporting.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfBounds | Threshold | The number of standard deviations that the upper and lower bounds are based on | Numeric | |
| dfBounds | Denominator | The calculated denominator value | Numeric | |
| dfBounds | LogDenominator | The calculated log denominator value | Numeric | |
| dfBounds | Numerator | The calculated numerator value | Numeric | |
| dfBounds | Metric | The calculated rate/metric value | Numeric | |
| dfBounds | MetricID | The Metric ID | Character | * |
| dfBounds | StudyID | The Study ID | Character | * |
| dfBounds | SnapshotDate | The date of the snapshot | Date | * |
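
The idea behind dfBounds can be sketched as follows: for each threshold (a number of standard deviations) and each denominator value in a range, compute the metric value that would sit exactly on that threshold. The Python below does this for a binary proportion under a simple normal approximation. It is an illustrative assumption, not the {gsm} implementation of the PredictBounds functions; the function name `predict_bounds`, the default thresholds, and the clipping to [0, 1] are our own choices.

```python
import math

def predict_bounds(overall, denominators, thresholds=(-3, -2, 2, 3)):
    """Return one dfBounds-style row per (threshold, denominator) pair.

    overall: pooled proportion, assumed to lie strictly between 0 and 1.
    denominators: iterable of positive sample sizes / exposure values.
    """
    rows = []
    for k in thresholds:
        for n in denominators:
            se = math.sqrt(overall * (1 - overall) / n)
            # Metric value lying k standard errors from the pooled proportion,
            # clipped so a proportion never leaves the [0, 1] interval.
            metric = min(max(overall + k * se, 0.0), 1.0)
            rows.append({
                "Threshold": k,
                "Denominator": n,
                "LogDenominator": math.log(n),
                "Numerator": metric * n,
                "Metric": metric,
            })
    return rows

df_bounds = predict_bounds(0.15, denominators=range(10, 101, 10))
```

Plotting Metric against Denominator for each Threshold gives the funnel-shaped boundary curves typically overlaid on a site-level scatter plot.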


dfFlagged

  • Function(s) used to create table:
    • Flag_Fisher()
    • Flag_Identity()
    • Flag_NormalApprox()
    • Flag_Poisson()
    • Flag_QTL()
  • Inputs: dfAnalyzed
  • Usage: Flag a group-level metric to be summarized via Summarize() and used for reporting.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfFlagged | GroupID | The group ID for the metric | Character | |
| dfFlagged | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
| dfFlagged | Numerator | The calculated numerator value | Numeric | |
| dfFlagged | Denominator | The calculated denominator value | Numeric | |
| dfFlagged | Metric | The calculated rate/metric value | Numeric | |
| dfFlagged | Score | The statistical score | Numeric | |
| dfFlagged | Flag | The ordinal flag to be applied | Numeric | |
| dfFlagged | Overall Metric | | Numeric | * |
| dfFlagged | Factor | | Numeric | * |
| dfFlagged | Predicted Count | | Numeric | * |
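
Flagging turns a continuous Score into an ordinal Flag by comparing it with numeric thresholds. The sketch below assumes four thresholds and a five-level flag from -2 to 2, with strict inequalities at the boundaries. Those conventions, the default threshold values, and the names `flag_score` and `flag_rows` are assumptions for illustration and may not match the Flag_XX functions in {gsm}.

```python
def flag_score(score, thresholds=(-3, -2, 2, 3)):
    """Map a score to an ordinal flag in {-2, -1, 0, 1, 2} (hypothetical helper)."""
    low_outer, low_inner, high_inner, high_outer = sorted(thresholds)
    if score < low_outer:
        return -2
    if score < low_inner:
        return -1
    if score > high_outer:
        return 2
    if score > high_inner:
        return 1
    return 0  # within the inner thresholds: not flagged

def flag_rows(df_analyzed, thresholds=(-3, -2, 2, 3)):
    """Return dfFlagged-style rows: the analyzed rows plus a Flag column."""
    return [{**r, "Flag": flag_score(r["Score"], thresholds)} for r in df_analyzed]

# Made-up analyzed rows (only the columns needed here).
analyzed = [
    {"GroupID": "Site01", "Score": -0.99},
    {"GroupID": "Site02", "Score": 4.95},
    {"GroupID": "Site03", "Score": -2.80},
]
df_flagged = flag_rows(analyzed)
```

The sign of the flag records the direction of the deviation and its magnitude records the severity, which is what lets a single numeric column drive reporting.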


dfSummary

  • Function(s) used to create table:
    • Summarize()
  • Inputs: dfFlagged
  • Usage: Summarize KRI at the group level for reporting.
  • Structure:
| Table | Column Name | Description | Type | Optional |
|---|---|---|---|---|
| dfSummary | GroupID | The group ID for the metric | Character | |
| dfSummary | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
| dfSummary | Numerator | The calculated numerator value | Numeric | |
| dfSummary | Denominator | The calculated denominator value | Numeric | |
| dfSummary | Metric | The calculated rate/metric value | Numeric | |
| dfSummary | MetricID | The Metric ID | Character | * |
| dfSummary | StudyID | The Study ID | Character | * |
| dfSummary | SnapshotDate | The date of the snapshot | Date | * |

Overview of Reporting data model

Reporting Data Tables


dfSummary

  • Function(s) used to create table:
    • Summarize()
  • Inputs: dfFlagged
  • Usage: Summarize KRI at the group level for reporting.
  • Structure:
| Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
|---|---|---|---|---|---|
| dfSummary | GroupID | The group ID for the metric | Character | | groupid |
| dfSummary | GroupLevel | The group type for the metric (e.g. "Site") | Character | | - |
| dfSummary | Numerator | The calculated numerator value | Numeric | | numerator |
| dfSummary | Denominator | The calculated denominator value | Numeric | | denominator |
| dfSummary | Metric | The calculated rate/metric value | Numeric | | metric |
| dfSummary | Score | The calculated metric score | Numeric | | score |
| dfSummary | Flag | The calculated flag | Numeric | | flag |
| dfSummary | MetricID | The Metric ID | Character | * | workflowid |
| dfSummary | StudyID | The Study ID | Character | * | studyid |
| dfSummary | SnapshotDate | The date of the snapshot | Date | * | snapshot_date |
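
The rbm-viz column names in the table above amount to a simple rename map. As a sketch, the Python below applies that map to a single dfSummary-style row and drops any column without an rbm-viz counterpart (GroupLevel, marked "-" above). The mapping itself is taken from the table; the dict-based row, the helper name `to_rbm_viz`, and the sample values are assumptions for illustration.

```python
# gsm column name -> rbm-viz column name, as listed in the dfSummary table.
RBM_VIZ_NAMES = {
    "GroupID": "groupid",
    "Numerator": "numerator",
    "Denominator": "denominator",
    "Metric": "metric",
    "Score": "score",
    "Flag": "flag",
    "MetricID": "workflowid",
    "StudyID": "studyid",
    "SnapshotDate": "snapshot_date",
}

def to_rbm_viz(row):
    """Rename mapped columns and drop unmapped ones (hypothetical helper)."""
    return {RBM_VIZ_NAMES[k]: v for k, v in row.items() if k in RBM_VIZ_NAMES}

# Made-up summary row; the identifiers are placeholders.
summary_row = {
    "GroupID": "Site02",
    "GroupLevel": "Site",
    "Numerator": 20,
    "Denominator": 50,
    "Metric": 0.4,
    "Score": 4.95,
    "Flag": 2,
    "MetricID": "kri0001",
    "StudyID": "AA-AA-000-0000",
    "SnapshotDate": "2024-07-24",
}
viz_row = to_rbm_viz(summary_row)
```

Optional columns that are absent from a row are simply skipped, so the same helper works with or without MetricID, StudyID, and SnapshotDate.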


dfBounds

  • Function(s) used to create table:
    • Analyze_NormalApprox_PredictBounds()
    • Analyze_Poisson_PredictBounds()
  • Inputs: dfTransformed
  • Usage: Calculates predicted percentages/rates and upper- and lower-bounds across the full range of sample sizes/total exposure values for reporting.
  • Structure:
| Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
|---|---|---|---|---|---|
| dfBounds | Threshold | The number of standard deviations that the upper and lower bounds are based on | Numeric | | threshold |
| dfBounds | Denominator | The calculated denominator value | Numeric | | denominator |
| dfBounds | LogDenominator | The calculated log denominator value | Numeric | | log_denominator |
| dfBounds | Numerator | The calculated numerator value | Numeric | | numerator |
| dfBounds | Metric | The calculated rate/metric value | Numeric | | - |
| dfBounds | MetricID | The Metric ID | Character | * | workflowid |
| dfBounds | StudyID | The Study ID | Character | * | studyid |
| dfBounds | SnapshotDate | The date of the snapshot | Date | * | snapshot_date |


dfGroups

  • Workflow used to create table: GroupMeta
  • Inputs: CTMS site, study and country data
  • Usage: Group metadata
  • Structure: dfGroups is a long data frame with one record per parameter per group; certain Param values are expected for a given GroupLevel.
| Table | Column | Description | Type | Optional |
|---|---|---|---|---|
| dfGroups | ProtocolID | Protocol ID | Character | |
| dfGroups | SnapshotDate | Snapshot Date | Character | |
| dfGroups | GroupID | Unique Group ID | Character | |
| dfGroups | GroupLevel | Group Level (e.g. Site, Country) | Character | |
| dfGroups | Param | Parameter Name (e.g. "Status") | Character | |
| dfGroups | Value | Parameter Value (e.g. "Active") | Character | |

The following Param values are expected for each GroupLevel in gsm reporting. Other Param values can be added as needed.

| GroupLevel | Param | Description | Value Type |
|---|---|---|---|
| Study | Status | Study Status | Character |
| Study | Title | Protocol Title | Character |
| Study | ParticipantCount | # of Enrolled Participants | Numeric |
| Study | SiteCount | # of Activated Sites | Numeric |
| Study | ParticipantsPlanned | # of Planned Participants | Numeric |
| Study | SitesPlanned | # of Planned Sites | Numeric |
| Site | ParticipantCount | # of Enrolled Participants | Numeric |
| Site | Status | Site Status | Character |
| Site | InvestigatorFirstName | Investigator First Name | Character |
| Site | InvestigatorLastName | Investigator Last Name | Character |
| Site | City | City | Character |
| Site | State | State | Character |
| Site | Country | Country | Character |
| Country | EnrolledParticipants | # of Enrolled Participants | Numeric |
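
Because dfGroups is stored in long format, consumers usually need to collect the Param/Value pairs for one group back into a single record. The sketch below shows that pivot in Python, keyed by GroupLevel and GroupID. The helper name `groups_to_wide`, the sample identifiers, and the decision to convert numeric values at the point of use are assumptions for illustration; Value stays a character string, as in the dfGroups structure.

```python
def groups_to_wide(df_groups):
    """Collect long Param/Value rows into one dict per group (hypothetical helper)."""
    wide = {}
    for row in df_groups:
        key = (row["GroupLevel"], row["GroupID"])
        wide.setdefault(key, {})[row["Param"]] = row["Value"]
    return wide

# Made-up long-format rows; all values are strings, as in dfGroups.
long_rows = [
    {"GroupLevel": "Site", "GroupID": "0X001", "Param": "Status", "Value": "Active"},
    {"GroupLevel": "Site", "GroupID": "0X001", "Param": "ParticipantCount", "Value": "12"},
    {"GroupLevel": "Site", "GroupID": "0X001", "Param": "City", "Value": "Durham"},
    {"GroupLevel": "Study", "GroupID": "AA-AA-000-0000", "Param": "Status", "Value": "Active"},
    {"GroupLevel": "Study", "GroupID": "AA-AA-000-0000", "Param": "SiteCount", "Value": "3"},
]
groups = groups_to_wide(long_rows)

# Numeric parameters are converted where they are used.
site_participants = int(groups[("Site", "0X001")]["ParticipantCount"])
```

Keeping the table long means new Param values can be added without changing its structure, at the cost of this extra reshaping step on the reporting side.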


dfMetrics

  • Function used to create table:
  • Inputs:
  • Usage:
  • Structure:
| Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
|---|---|---|---|---|---|
| dfMetrics | File | The yaml file for workflow | Character | | file |
| dfMetrics | MetricID | ID for the Metric | Character | | workflowid |
| dfMetrics | Group | The group type for the metric (e.g. "Site") | Character | | group |
| dfMetrics | Abbreviation | Abbreviation for the metric | Character | | abbreviations |
| dfMetrics | Metric | Name of the metric | Character | | metric |
| dfMetrics | Numerator | Data source for the Numerator | Character | | numerator |
| dfMetrics | Denominator | Data source for the Denominator | Character | | denominator |
| dfMetrics | Model | Model used to calculate metric | Character | | model |
| dfMetrics | Score | Type of Score reported | Character | | score |
| dfMetrics | strThreshold | Thresholds to be used for bounds and flags | Character | | vthreshold |

Previous Versions


dfSite

  • Function used to create table:
  • Inputs:
  • Usage:
  • Structure:
| Table | Column | Description | Type | Optional | rbm-viz Column Name |
|---|---|---|---|---|---|
| dfSite | protocol_row_id | Protocol row ID | Character | | protocol_row_id |
| dfSite | SiteID | Unique Site ID | Character | | site_id |
| dfSite | site_row_id | Site row ID | Character | | site_row_id |
| dfSite | protocol | Protocol ID | Character | | protocol |
| dfSite | pi_number | Principal Investigator Number | Character | | pi_number |
| dfSite | pi_last_name | Principal Investigator Last Name | Character | | pi_last_name |
| dfSite | pi_first_name | Principal Investigator First Name | Character | | pi_first_name |
| dfSite | site_status | Site Status | Character | | site_status |
| dfSite | is_satellite | Is site a satellite location | Character | | is_satellite |
| dfSite | account | | Character | | account |
| dfSite | site_active_dt | Date that Site became active | Character | | site_active_dt |
| dfSite | city | Site City | Character | | city |
| dfSite | state | Site State | Character | | state |
| dfSite | country | Site Country | Character | | country |


dfStudy

  • Function used to create table:
  • Inputs:
  • Usage:
  • Structure:
| Table | Column | Description | Type | Optional |
|---|---|---|---|---|
| dfStudy | protocol_row_id | Protocol row ID | Character | |
| dfStudy | StudyID | Unique Study ID | Character | |
| dfStudy | protocol_title | Protocol Title | Character | |
| dfStudy | nickname | Protocol Nickname | Character | |
| dfStudy | protocol_type | Protocol Type | Character | |
| dfStudy | phase | Study phase | Character | |
| dfStudy | num_plan_site | Number of planned sites in the study | Character | |
| dfStudy | num_site_actl | Number of active sites in the study | Character | |
| dfStudy | est_fpfv | Estimated first patient first visit | Date | |
| dfStudy | act_fpfv | Actual first patient first visit | Date | |
| dfStudy | est_lplv | Estimated last patient last visit | Date | |
| dfStudy | act_lplv | Actual last patient last visit | Date | |
| dfStudy | est_lpfv | Estimated last patient first visit | Date | |
| dfStudy | act_lpfv | Actual last patient first visit | Date | |
| dfStudy | status | Study Status | Character | |
| dfStudy | num_plan_subj | Number of planned subjects in study | Numeric | |
| dfStudy | num_enrolled_subj_m | Number of enrolled subjects in study | Numeric | |
| dfStudy | protocol_indication | Protocol Indication | Character | |
| dfStudy | product | Product | Character | |
| dfStudy | therapeutic_area | Therapeutic Area | Character | |
| dfStudy | protocol_product_number | Protocol Product Number | Numeric | |
| dfStudy | x_rbm_flg | Is RBM Flagged | Character | |