Data Model

Overview

Example {gsm} data flow

[Figure: gsm system diagram]

Detailed {gsm} data model

[Figure: GSM data model]

Analytics data model

The KRI analytics pipeline is a standardized process for analyzing data issues: it goes from participant-level input data to a standardized site-level summary of model results. The data sets used in each step of the pipeline are described in detail below.

input data - Cross-domain participant-level input data with all needed data for KRI derivation.
transformed data - Site-level transformed data including KRI calculation. Created by Transform functions.
analyzed data - Site-level analysis result data. Created by Analyze functions.
flagged data - Site-level analysis results with flags added. Created by passing numeric thresholds to a Flag function.
summary data - Standardized subset of the flagged data. It has the same structure for all assessments and always includes both KRI and Flag values, so trends for any given site can be compared across multiple assessments. Created by the Summarize function.

During this process, we also create bounds data, which provides upper and lower bounds across the full range of exposure values. It is created in the analytics pipeline but used primarily in the reporting pipeline.

Analytics Data Tables

`dfInput`

Function(s) used to create table:
- Input_Rate()
Inputs:
- dfSubjects
- dfNumerator
- dfDenominator
Usage: The base data.frame for all Analytics workflows. Feeds into the Transform_XX functions.
Structure:

Table	Column Name	Description	Type
dfInput	SubjectID	The subject ID	Character
dfInput	GroupID	The group ID for the metric	Character
dfInput	GroupLevel	The group type for the metric (e.g. "Site")	Character
dfInput	Numerator	The calculated numerator value	Numeric
dfInput	Denominator	The calculated denominator value	Numeric
dfInput	Metric	The calculated rate/metric value	Numeric
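
As a rough illustration of the dfInput structure, the sketch below builds one row per subject from three small inputs. It is written in Python for illustration only and is not the gsm implementation of Input_Rate(): the function name `input_rate`, the sample records, and the assumption that Numerator and Denominator are simple record counts per subject are all hypothetical.

```python
from collections import Counter

def input_rate(df_subjects, df_numerator, df_denominator, group_level="Site"):
    """Build dfInput-style rows: one per subject, with counts and a rate.

    Assumption: Numerator and Denominator are record counts per subject.
    """
    num_counts = Counter(row["SubjectID"] for row in df_numerator)
    den_counts = Counter(row["SubjectID"] for row in df_denominator)
    rows = []
    for subj in df_subjects:
        numerator = num_counts[subj["SubjectID"]]
        denominator = den_counts[subj["SubjectID"]]
        rows.append({
            "SubjectID": subj["SubjectID"],
            "GroupID": subj["GroupID"],
            "GroupLevel": group_level,
            "Numerator": numerator,
            "Denominator": denominator,
            # Leave the rate undefined when the denominator is zero.
            "Metric": numerator / denominator if denominator else None,
        })
    return rows

# Hypothetical sample data: two subjects at one site.
df_subjects = [
    {"SubjectID": "S1", "GroupID": "Site01"},
    {"SubjectID": "S2", "GroupID": "Site01"},
]
df_numerator = [{"SubjectID": "S1"}, {"SubjectID": "S1"}, {"SubjectID": "S2"}]
df_denominator = [{"SubjectID": "S1"}] * 4 + [{"SubjectID": "S2"}] * 2

df_input = input_rate(df_subjects, df_numerator, df_denominator)
```

Each resulting row carries the six columns listed in the table above.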

`dfTransformed`

Function(s) used to create table:
- Transform_Rate()
- Transform_Count()
Inputs: dfInput
Usage: Converts the input data into the format needed to derive the KRI for an assessment via the Analyze_XX functions.
Structure:

Table	Column Name	Description	Type
dfTransformed	GroupID	The group ID for the metric	Character
dfTransformed	GroupLevel	The group type for the metric (e.g. "Site")	Character
dfTransformed	Numerator	The calculated numerator value	Numeric
dfTransformed	Denominator	The calculated denominator value	Numeric
dfTransformed	Metric	The calculated rate/metric value	Numeric
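
The step from dfInput to dfTransformed can be pictured as a group-level aggregation. The following Python sketch is illustrative only and is not the gsm code for Transform_Rate(): the name `transform_rate`, the sample rows, and the choice to sum Numerator and Denominator per group and then divide are assumptions.

```python
def transform_rate(df_input):
    """Aggregate subject-level rows into one row per (GroupID, GroupLevel).

    Assumption: the group rate is sum(Numerator) / sum(Denominator).
    """
    totals = {}
    for row in df_input:
        key = (row["GroupID"], row["GroupLevel"])
        agg = totals.setdefault(key, {"Numerator": 0, "Denominator": 0})
        agg["Numerator"] += row["Numerator"]
        agg["Denominator"] += row["Denominator"]
    return [
        {
            "GroupID": group_id,
            "GroupLevel": group_level,
            "Numerator": agg["Numerator"],
            "Denominator": agg["Denominator"],
            # Leave the rate undefined when the group has no denominator.
            "Metric": agg["Numerator"] / agg["Denominator"] if agg["Denominator"] else None,
        }
        for (group_id, group_level), agg in totals.items()
    ]

# Hypothetical subject-level input: three subjects across two sites.
df_input = [
    {"SubjectID": "S1", "GroupID": "Site01", "GroupLevel": "Site", "Numerator": 2, "Denominator": 10},
    {"SubjectID": "S2", "GroupID": "Site01", "GroupLevel": "Site", "Numerator": 1, "Denominator": 20},
    {"SubjectID": "S3", "GroupID": "Site02", "GroupLevel": "Site", "Numerator": 0, "Denominator": 5},
]

df_transformed = transform_rate(df_input)
```

Note that SubjectID is dropped: the output has one row per group rather than one per subject.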

`dfAnalyzed`

Function(s) used to create table:
- Analyze_Fisher()
- Analyze_Identity()
- Analyze_NormalApprox()
- Analyze_Poisson()
- Analyze_QTL()
Inputs: dfTransformed
Usage: Prepare the data for Flag_XX by performing the specified test on the metric provided.
Structure:

Table	Column Name	Description	Type	Optional
dfAnalyzed	GroupID	The group ID for the metric	Character
dfAnalyzed	GroupLevel	The group type for the metric (e.g. "Site")	Character
dfAnalyzed	Numerator	The calculated numerator value	Numeric
dfAnalyzed	Denominator	The calculated denominator value	Numeric
dfAnalyzed	Metric	The calculated rate/metric value	Numeric
dfAnalyzed	Score	The Statistical Score	Numeric
dfAnalyzed	Overall Metric		Numeric	*
dfAnalyzed	Factor		Numeric	*
dfAnalyzed	Predicted Count		Numeric	*
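
To show how dfAnalyzed extends dfTransformed, here is a minimal Python sketch of the simplest case, where the Score is taken to be the Metric itself. This is an assumption suggested by the name Analyze_Identity(), not a reproduction of the gsm function; `analyze_identity` and the sample rows are hypothetical.

```python
def analyze_identity(df_transformed):
    """Return dfAnalyzed-style rows: each input row plus a Score column.

    Assumption: for an identity analysis, Score is simply a copy of Metric.
    """
    return [{**row, "Score": row["Metric"]} for row in df_transformed]

# Hypothetical group-level input.
df_transformed = [
    {"GroupID": "Site01", "GroupLevel": "Site", "Numerator": 3, "Denominator": 30, "Metric": 0.1},
    {"GroupID": "Site02", "GroupLevel": "Site", "Numerator": 5, "Denominator": 20, "Metric": 0.25},
]

df_analyzed = analyze_identity(df_transformed)
```

The other Analyze_XX functions would replace the Score calculation with a statistical test and may add the optional columns marked with * in the table.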

`dfBounds`

Function(s) used to create table:
- Analyze_NormalApprox_PredictBounds()
- Analyze_Poisson_PredictBounds()
Inputs: dfTransformed
Usage: Calculates predicted percentages/rates and upper and lower bounds across the full range of sample sizes or total exposure values, for use in reporting.
Structure:

Table	Column Name	Description	Type	Optional
dfBounds	Threshold	The number of standard deviations that the upper and lower bounds are based on	Numeric
dfBounds	Denominator	The calculated denominator value	Numeric
dfBounds	LogDenominator	The calculated log denominator value	Numeric
dfBounds	Numerator	The calculated numerator value	Numeric
dfBounds	Metric	The calculated rate/metric value	Numeric
dfBounds	MetricID	The Metric ID	Character	*
dfBounds	StudyID	The Study ID	Character	*
dfBounds	SnapshotDate	The Date of the snapshot	Date	*
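
The shape of dfBounds can be illustrated with a generic normal-approximation bound for a proportion. The Python sketch below is one plausible construction under stated assumptions, not the formula used by Analyze_NormalApprox_PredictBounds(): the function `predict_bounds`, the default thresholds, the evenly spaced denominator grid, and the natural-log LogDenominator are all hypothetical choices.

```python
import math

def predict_bounds(df_transformed, thresholds=(-3, -2, 2, 3), steps=5):
    """Return dfBounds-style rows over a grid of denominator values.

    Assumption: bound = p + Threshold * sqrt(p * (1 - p) / Denominator),
    where p is the overall rate pooled across all groups.
    """
    total_num = sum(row["Numerator"] for row in df_transformed)
    total_den = sum(row["Denominator"] for row in df_transformed)
    p = total_num / total_den
    max_den = max(row["Denominator"] for row in df_transformed)

    rows = []
    for threshold in thresholds:
        for i in range(1, steps + 1):
            denominator = max_den * i / steps
            metric = p + threshold * math.sqrt(p * (1 - p) / denominator)
            rows.append({
                "Threshold": threshold,
                "Denominator": denominator,
                "LogDenominator": math.log(denominator),  # natural log (assumed)
                "Numerator": metric * denominator,
                # Not clipped here, so lower bounds can fall below zero.
                "Metric": metric,
            })
    return rows

# Hypothetical group-level input.
df_transformed = [
    {"GroupID": "Site01", "GroupLevel": "Site", "Numerator": 3, "Denominator": 30, "Metric": 0.1},
    {"GroupID": "Site02", "GroupLevel": "Site", "Numerator": 5, "Denominator": 20, "Metric": 0.25},
]

df_bounds = predict_bounds(df_transformed)
```

In this sketch each Threshold value produces one curve, and the curves move toward the pooled rate as the Denominator grows.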

`dfFlagged`

Function(s) used to create table:
- Flag_Fisher()
- Flag_Identity()
- Flag_NormalApprox()
- Flag_Poisson()
- Flag_QTL()
Inputs: dfAnalyzed
Usage: Flag a group-level metric to be summarized via Summarize() and used for reporting.
Structure:

Table	Column Name	Description	Type	Optional
dfFlagged	GroupID	The group ID for the metric	Character
dfFlagged	GroupLevel	The group type for the metric (e.g. "Site")	Character
dfFlagged	Numerator	The calculated numerator value	Numeric
dfFlagged	Denominator	The calculated denominator value	Numeric
dfFlagged	Metric	The calculated rate/metric value	Numeric
dfFlagged	Score	The Statistical Score	Numeric
dfFlagged	Flag	The ordinal Flag to be applied	Numeric
dfFlagged	Overall Metric		Numeric	*
dfFlagged	Factor		Numeric	*
dfFlagged	Predicted Count		Numeric	*
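
Turning a Score into an ordinal Flag with numeric thresholds can be sketched as follows. This Python example is illustrative and does not reproduce any gsm Flag_XX function: the function `flag_scores`, the four symmetric thresholds, the resulting flag values from -2 to 2, and the handling of scores that fall exactly on a threshold are all assumptions.

```python
from bisect import bisect_right

def flag_scores(df_analyzed, thresholds=(-3, -2, 2, 3)):
    """Return dfFlagged-style rows: each input row plus an ordinal Flag.

    Assumption: four sorted thresholds split the score axis into five bands,
    mapped to flags -2, -1, 0, 1, 2. A score equal to a threshold is placed
    in the band above it, which is an arbitrary choice made for this sketch.
    """
    offset = len(thresholds) // 2
    return [
        {**row, "Flag": bisect_right(thresholds, row["Score"]) - offset}
        for row in df_analyzed
    ]

# Hypothetical analyzed rows; only the columns needed here are shown.
df_analyzed = [
    {"GroupID": "Site01", "Score": -4.0},
    {"GroupID": "Site02", "Score": -2.5},
    {"GroupID": "Site03", "Score": 0.3},
    {"GroupID": "Site04", "Score": 2.5},
    {"GroupID": "Site05", "Score": 3.5},
]

df_flagged = flag_scores(df_analyzed)
flags = [row["Flag"] for row in df_flagged]
```

With these sample scores, the five sites land in five different bands, one per flag value.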

`dfSummary`

Function(s) used to create table:
- Summarize()
Inputs: dfFlagged
Usage: Summarize KRI at the group level for reporting.
Structure:

Table	Column Name	Description	Type	Optional
dfSummary	GroupID	The group ID for the metric	Character
dfSummary	GroupLevel	The group type for the metric (e.g. "Site")	Character
dfSummary	Numerator	The calculated numerator value	Numeric
dfSummary	Denominator	The calculated denominator value	Numeric
dfSummary	Metric	The calculated rate/metric value	Numeric
dfSummary	Score	The calculated metric score	Numeric
dfSummary	Flag	The calculated flag	Numeric
dfSummary	MetricID	The Metric ID	Character	*
dfSummary	StudyID	The Study ID	Character	*
dfSummary	SnapshotDate	The Date of the snapshot	Date	*

Overview of Reporting data model

Reporting Data Tables

`dfSummary`

Function(s) used to create table:
- Summarize()
Inputs: dfFlagged
Usage: Summarize KRI at the group level for reporting.
Structure:

Table	Column Name	Description	Type	Optional	rbm-viz Column Name
dfSummary	GroupID	The group ID for the metric	Character		groupid
dfSummary	GroupLevel	The group type for the metric (e.g. "Site")	Character		-
dfSummary	Numerator	The calculated numerator value	Numeric		numerator
dfSummary	Denominator	The calculated denominator value	Numeric		denominator
dfSummary	Metric	The calculated rate/metric value	Numeric		metric
dfSummary	Score	The calculated metric score	Numeric		score
dfSummary	Flag	The calculated flag	Numeric		flag
dfSummary	MetricID	The Metric ID	Character	*	workflowid
dfSummary	StudyID	The Study ID	Character	*	studyid
dfSummary	SnapshotDate	The Date of the snapshot	Date	*	snapshot_date
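
The rbm-viz column names in the table above amount to a rename map. The Python sketch below applies that map to a single dfSummary-style row; the function `to_rbm_viz` and the sample values are hypothetical, and treating "-" as "no rbm-viz counterpart, so the column is dropped" is an assumption.

```python
# Mapping copied from the dfSummary table; GroupLevel is listed as "-"
# and is therefore omitted (assumed to have no rbm-viz counterpart).
RBM_VIZ_COLUMNS = {
    "GroupID": "groupid",
    "Numerator": "numerator",
    "Denominator": "denominator",
    "Metric": "metric",
    "Score": "score",
    "Flag": "flag",
    "MetricID": "workflowid",
    "StudyID": "studyid",
    "SnapshotDate": "snapshot_date",
}

def to_rbm_viz(row):
    """Rename the mapped columns of one row and drop unmapped ones."""
    return {RBM_VIZ_COLUMNS[key]: value for key, value in row.items() if key in RBM_VIZ_COLUMNS}

# Hypothetical summary row.
summary_row = {
    "GroupID": "Site01",
    "GroupLevel": "Site",
    "Numerator": 3,
    "Denominator": 30,
    "Metric": 0.1,
    "Score": 1.2,
    "Flag": 0,
    "MetricID": "kri0001",
    "StudyID": "STUDY-A",
    "SnapshotDate": "2024-01-31",
}

viz_row = to_rbm_viz(summary_row)
```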

`dfBounds`

Function(s) used to create table:
- Analyze_NormalApprox_PredictBounds()
- Analyze_Poisson_PredictBounds()
Inputs: dfTransformed
Usage: Calculates predicted percentages/rates and upper and lower bounds across the full range of sample sizes or total exposure values, for use in reporting.
Structure:

Table	Column Name	Description	Type	Optional	rbm-viz Column Name
dfBounds	Threshold	The number of standard deviations that the upper and lower bounds are based on	Numeric		threshold
dfBounds	Denominator	The calculated denominator value	Numeric		denominator
dfBounds	LogDenominator	The calculated log denominator value	Numeric		log_denominator
dfBounds	Numerator	The calculated numerator value	Numeric		numerator
dfBounds	Metric	The calculated rate/metric value	Numeric		-
dfBounds	MetricID	The Metric ID	Character	*	workflowid
dfBounds	StudyID	The Study ID	Character	*	studyid
dfBounds	SnapshotDate	The Date of the snapshot	Date	*	snapshot_date

`dfGroups`

Workflow used to create table: GroupMeta
Inputs: CTMS site, study and country data
Usage: Group metadata
Structure: dfGroups is a long data frame, with one record per parameter per group; certain Param values are expected for a given GroupLevel.

Table	Column	Description	Type
dfGroups	ProtocolID	Protocol ID	Character
dfGroups	SnapshotDate	Snapshot Date	Character
dfGroups	GroupID	Unique Group ID	Character
dfGroups	GroupLevel	Group Level (e.g. Site, Country)	Character
dfGroups	Param	Parameter Name (e.g. "Status")	Character
dfGroups	Value	Parameter Value (e.g. "Active")	Character

Expected Param values by GroupLevel for use in gsm reporting. Other Param values may be added as needed.

GroupLevel	Param	Description	Value Type
Study	Status	Study Status	Character
Study	Title	Protocol Title	Character
Study	ParticipantCount	# of Enrolled Participants	Numeric
Study	SiteCount	# of Activated Sites	Numeric
Study	ParticipantsPlanned	# of Planned Participants	Numeric
Study	SitesPlanned	# of Planned Sites	Numeric
Site	ParticipantCount	# of Enrolled Participants	Numeric
Site	Status	Site Status	Character
Site	InvestigatorFirstName	Investigator First name	Character
Site	InvestigatorLastName	Investigator Last name	Character
Site	City	City	Character
Site	State	State	Character
Site	Country	Country	Character
Country	EnrolledParticipants	# of Enrolled Participants	Numeric
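
Because dfGroups is long, with one record per parameter per group, reporting code typically needs to gather the Param/Value pairs for each group. The Python sketch below shows one way to do that; the function `pivot_groups`, the sample records, and the conversion of numeric parameters from character values are assumptions, not part of gsm.

```python
from collections import defaultdict

def pivot_groups(df_groups, group_level):
    """Collect Param/Value pairs into one dict per GroupID for a GroupLevel."""
    wide = defaultdict(dict)
    for row in df_groups:
        if row["GroupLevel"] == group_level:
            wide[row["GroupID"]][row["Param"]] = row["Value"]
    return dict(wide)

# Hypothetical long-format records; Value is stored as character.
df_groups = [
    {"ProtocolID": "P1", "SnapshotDate": "2024-01-31", "GroupID": "Site01", "GroupLevel": "Site", "Param": "Status", "Value": "Active"},
    {"ProtocolID": "P1", "SnapshotDate": "2024-01-31", "GroupID": "Site01", "GroupLevel": "Site", "Param": "ParticipantCount", "Value": "12"},
    {"ProtocolID": "P1", "SnapshotDate": "2024-01-31", "GroupID": "Site01", "GroupLevel": "Site", "Param": "Country", "Value": "US"},
    {"ProtocolID": "P1", "SnapshotDate": "2024-01-31", "GroupID": "P1", "GroupLevel": "Study", "Param": "Status", "Value": "Active"},
]

sites = pivot_groups(df_groups, "Site")
# Numeric parameters arrive as character values and are converted on use.
participant_count = int(sites["Site01"]["ParticipantCount"])
```

Keeping the table long means new Param values can be added without changing its columns; the pivot is applied only where a wide view is needed.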

`dfMetrics`

Function used to create table:
Inputs:
Usage:
Structure:

Table	Column Name	Description	Type	rbm-viz Column Name
dfMetrics	File	The yaml file for workflow	Character	file
dfMetrics	MetricID	ID for the Metric	Character	workflowid
dfMetrics	Group	The group type for the metric (e.g. "Site")	Character	group
dfMetrics	Abbreviation	Abbreviation for the metric	Character	abbreviations
dfMetrics	Metric	Name of the metric	Character	metric
dfMetrics	Numerator	Data source for the Numerator	Character	numerator
dfMetrics	Denominator	Data source for the Denominator	Character	denominator
dfMetrics	Model	Model used to calculate metric	Character	model
dfMetrics	Score	Type of Score reported	Character	score
dfMetrics	strThreshold	Thresholds to be used for bounds and flags	Character	vthreshold

Previous Versions

`dfSite`

Function used to create table:
Inputs:
Usage:
Structure:

Table	Column	Description	Type	rbm-viz Column Name
dfSite	protocol_row_id	Protocol row ID	Character	protocol_row_id
dfSite	SiteID	Unique Site ID	Character	site_id
dfSite	site_row_id	Site row ID	Character	site_row_id
dfSite	protocol	Protocol ID	Character	protocol
dfSite	pi_number	Principal Investigator Number	Character	pi_number
dfSite	pi_last_name	Principal Investigator Last Name	Character	pi_last_name
dfSite	pi_first_name	Principal Investigator First Name	Character	pi_first_name
dfSite	site_status	Site Status	Character	site_status
dfSite	is_satellite	Is site a satellite location	Character	is_satellite
dfSite	account		Character	account
dfSite	site_active_dt	Date that Site became active	Character	site_active_dt
dfSite	city	Site City	Character	city
dfSite	state	Site State	Character	state
dfSite	country	Site Country	Character	country

`dfStudy`

Function used to create table:
Inputs:
Usage:
Structure:

Table	Column	Description	Type
dfStudy	protocol_row_id	Protocol row ID	Character
dfStudy	StudyID	Unique Study ID	Character
dfStudy	protocol_title	Protocol Title	Character
dfStudy	nickname	Protocol Nickname	Character
dfStudy	protocol_type	Protocol Type	Character
dfStudy	phase	Study phase	Character
dfStudy	num_plan_site	Number of planned sites in the study	Character
dfStudy	num_site_actl	Number of active sites in the study	Character
dfStudy	est_fpfv	Estimated first patient first visit	Date
dfStudy	act_fpfv	Actual first patient first visit	Date
dfStudy	est_lplv	Estimated last patient last visit	Date
dfStudy	act_lplv	Actual last patient last visit	Date
dfStudy	est_lpfv	Estimated last patient first visit	Date
dfStudy	act_lpfv	Actual last patient first visit	Date
dfStudy	status	Study Status	Character
dfStudy	num_plan_subj	Number of planned subjects in study	Numeric
dfStudy	num_enrolled_subj_m	Number of enrolled subjects in study	Numeric
dfStudy	protocol_indication	Protocol Indication	Character
dfStudy	product	Product	Character
dfStudy	therapeutic_area	Therapeutic Area	Character
dfStudy	protocol_product_number	Protocol Product Number	Numeric
dfStudy	x_rbm_flg	Is RBM Flagged	Character