Data Model
Spencer Childress edited this page Jul 24, 2024 · 7 revisions
The KRI analytics pipeline is a standardized process for analyzing data issues: it goes from participant-level input data to a standardized site-level summary of model results. The data sets used in each step of the pipeline are described in detail below.
- input data - Cross-domain participant-level input data with all the data needed for KRI derivation.
- transformed data - Site-level transformed data including the KRI calculation. Created by Transform functions.
- analyzed data - Site-level analysis result data. Created by Analyze functions.
- flagged data - Site-level analysis results with flags added. Created by passing numeric thresholds to a Flag function.
- summary data - Standardized subset of the flagged data. This summary data has the same structure for all assessments and always includes both KRI and Flag values, so that we can easily look at trends for any given site across multiple assessments. Created using a Summarize function.
During this process, we also create bounded data that provides upper and lower bounds across the full range of exposure values. This data is created in the analytics pipeline but is used primarily in the reporting pipeline.
- Function(s) used to create table: Input_Rate()
- Inputs: dfSubjects, dfNumerator, dfDenominator
- Usage: The base data.frame for all Analytics workflows. Feeds into the Transform_XX functions.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfInput | SubjectID | The subject ID | Character | |
dfInput | GroupID | The group ID for the metric | Character | |
dfInput | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
dfInput | Numerator | The calculated numerator value | Numeric | |
dfInput | Denominator | The calculated denominator value | Numeric | |
dfInput | Metric | The calculated rate/metric value | Numeric |
- Function(s) used to create table: Transform_Rate(), Transform_Count()
- Inputs: dfInput
- Usage: Converts the input data into the format needed to derive the KRI for an assessment via the Analyze_XX functions.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfTransformed | GroupID | The group ID for the metric | Character | |
dfTransformed | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
dfTransformed | Numerator | The calculated numerator value | Numeric | |
dfTransformed | Denominator | The calculated denominator value | Numeric | |
dfTransformed | Metric | The calculated rate/metric value | Numeric |
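To illustrate the step from dfInput to dfTransformed, here is a minimal sketch in plain Python rather than the package's own R code. It assumes the transform sums Numerator and Denominator within each group and takes their ratio as Metric; the function name `transform_rate` and the sample rows are hypothetical and are not part of the package.

```python
from collections import defaultdict

# Hypothetical participant-level rows shaped like dfInput (values are made up).
df_input = [
    {"SubjectID": "S001", "GroupID": "Site01", "GroupLevel": "Site",
     "Numerator": 2, "Denominator": 100},
    {"SubjectID": "S002", "GroupID": "Site01", "GroupLevel": "Site",
     "Numerator": 1, "Denominator": 50},
    {"SubjectID": "S003", "GroupID": "Site02", "GroupLevel": "Site",
     "Numerator": 0, "Denominator": 80},
]


def transform_rate(rows):
    """Aggregate participant-level rows to one row per group.

    Assumption: Numerator and Denominator are summed within each group
    and Metric is their ratio. SubjectID is dropped in the output.
    """
    totals = defaultdict(lambda: {"Numerator": 0, "Denominator": 0})
    for row in rows:
        key = (row["GroupID"], row["GroupLevel"])
        totals[key]["Numerator"] += row["Numerator"]
        totals[key]["Denominator"] += row["Denominator"]

    result = []
    for (group_id, group_level), total in sorted(totals.items()):
        denominator = total["Denominator"]
        # Leave Metric undefined when there is no exposure to divide by.
        metric = total["Numerator"] / denominator if denominator else None
        result.append({
            "GroupID": group_id,
            "GroupLevel": group_level,
            "Numerator": total["Numerator"],
            "Denominator": denominator,
            "Metric": metric,
        })
    return result


df_transformed = transform_rate(df_input)
```

The output has one row per GroupID with the same columns as the dfTransformed table above.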
- Function(s) used to create table: Analyze_Fisher(), Analyze_Identity(), Analyze_NormalApprox(), Analyze_Poisson(), Analyze_QTL()
- Inputs: dfTransformed
- Usage: Prepare the data for Flag_XX by performing the specified test on the metric provided.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfAnalyzed | GroupID | The group ID for the metric | Character | |
dfAnalyzed | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
dfAnalyzed | Numerator | The calculated numerator value | Numeric | |
dfAnalyzed | Denominator | The calculated denominator value | Numeric | |
dfAnalyzed | Metric | The calculated rate/metric value | Numeric | |
dfAnalyzed | Score | The Statistical Score | Numeric | |
dfAnalyzed | Overall Metric | | Numeric | * |
dfAnalyzed | Factor | | Numeric | * |
dfAnalyzed | Predicted Count | | Numeric | * |
- Function(s) used to create table: Analyze_NormalApprox_PredictBounds(), Analyze_Poisson_PredictBounds()
- Inputs: dfTransformed
- Usage: Calculates predicted percentages/rates and upper and lower bounds across the full range of sample sizes/total exposure values for reporting.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfBounds | Threshold | The number of standard deviations that the upper and lower bounds are based on | Numeric | |
dfBounds | Denominator | The calculated denominator value | Numeric | |
dfBounds | LogDenominator | The calculated log denominator value | Numeric | |
dfBounds | Numerator | The calculated numerator value | Numeric | |
dfBounds | Metric | The calculated rate/metric value | Numeric | |
dfBounds | MetricID | The Metric ID | Character | * |
dfBounds | StudyID | The Study ID | Character | * |
dfBounds | SnapshotDate | The Date of the snapshot | Date | * |
- Function(s) used to create table: Flag_Fisher(), Flag_Identity(), Flag_NormalApprox(), Flag_Poisson(), Flag_QTL()
- Inputs: dfAnalyzed
- Usage: Flag a group-level metric to be summarized via Summarize() and used for reporting.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfFlagged | GroupID | The group ID for the metric | Character | |
dfFlagged | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
dfFlagged | Numerator | The calculated numerator value | Numeric | |
dfFlagged | Denominator | The calculated denominator value | Numeric | |
dfFlagged | Metric | The calculated rate/metric value | Numeric | |
dfFlagged | Score | The Statistical Score | Numeric | |
dfFlagged | Flag | The ordinal Flag to be applied | Numeric | |
dfFlagged | Overall Metric | | Numeric | * |
dfFlagged | Factor | | Numeric | * |
dfFlagged | Predicted Count | | Numeric | * |
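Adding the ordinal Flag column from numeric thresholds can be sketched as follows. This is plain Python rather than the package's R code, and it is only an assumption about how thresholds might map to flags: it takes four ascending thresholds and returns a flag from -2 to 2. The function name `flag_score`, the default threshold values, and the sample rows are hypothetical.

```python
def flag_score(score, thresholds=(-3.0, -2.0, 2.0, 3.0)):
    """Map a score to an ordinal flag in {-2, -1, 0, 1, 2}.

    Assumption: thresholds are (low2, low1, high1, high2) in ascending
    order; scores beyond the outer pair get the stronger flag.
    """
    if score is None:
        return None  # no score available, so no flag is assigned
    low2, low1, high1, high2 = thresholds
    if score < low2:
        return -2
    if score < low1:
        return -1
    if score > high2:
        return 2
    if score > high1:
        return 1
    return 0


# Hypothetical rows shaped like dfAnalyzed (only the relevant columns shown).
df_analyzed = [
    {"GroupID": "Site01", "Score": -3.4},
    {"GroupID": "Site02", "Score": 0.7},
    {"GroupID": "Site03", "Score": 2.5},
]

# dfFlagged keeps every dfAnalyzed column and adds Flag.
df_flagged = [{**row, "Flag": flag_score(row["Score"])} for row in df_analyzed]
```

Keeping the flag numeric and ordinal means its sign shows the direction of the deviation and its magnitude shows the severity.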
- Function(s) used to create table: Summarize()
- Inputs: dfFlagged
- Usage: Summarize KRI at the group level for reporting.
- Structure:
Table | Column Name | Description | Type | Optional |
---|---|---|---|---|
dfSummary | GroupID | The group ID for the metric | Character | |
dfSummary | GroupLevel | The group type for the metric (e.g. "Site") | Character | |
dfSummary | Numerator | The calculated numerator value | Numeric | |
dfSummary | Denominator | The calculated denominator value | Numeric | |
dfSummary | Metric | The calculated rate/metric value | Numeric | |
dfSummary | MetricID | The Metric ID | Character | * |
dfSummary | StudyID | The Study ID | Character | * |
dfSummary | SnapshotDate | The Date of the snapshot | Date | * |
- Function(s) used to create table: Summarize()
- Inputs: dfFlagged
- Usage: Summarize KRI at the group level for reporting.
- Structure:
Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
---|---|---|---|---|---|
dfSummary | GroupID | The group ID for the metric | Character | | groupid |
dfSummary | GroupLevel | The group type for the metric (e.g. "Site") | Character | | - |
dfSummary | Numerator | The calculated numerator value | Numeric | | numerator |
dfSummary | Denominator | The calculated denominator value | Numeric | | denominator |
dfSummary | Metric | The calculated rate/metric value | Numeric | | metric |
dfSummary | Score | The calculated metric score | Numeric | | score |
dfSummary | Flag | The calculated flag | Numeric | | flag |
dfSummary | MetricID | The Metric ID | Character | * | workflowid |
dfSummary | StudyID | The Study ID | Character | * | studyid |
dfSummary | SnapshotDate | The Date of the snapshot | Date | * | snapshot_date |
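The rbm-viz column names listed for dfSummary amount to a rename map. The sketch below applies that mapping to one summary row in plain Python. The mapping values are taken from the table; the helper name `to_rbm_viz`, the choice to drop columns without an rbm-viz name (GroupLevel is listed as "-"), and the sample row are assumptions for illustration.

```python
# dfSummary column -> rbm-viz column, as listed in the table above.
RBM_VIZ_NAMES = {
    "GroupID": "groupid",
    "Numerator": "numerator",
    "Denominator": "denominator",
    "Metric": "metric",
    "Score": "score",
    "Flag": "flag",
    "MetricID": "workflowid",
    "StudyID": "studyid",
    "SnapshotDate": "snapshot_date",
}


def to_rbm_viz(row):
    """Rename the keys of one dfSummary-like row to rbm-viz names.

    Assumption: columns with no rbm-viz name (e.g. GroupLevel) are dropped.
    """
    return {RBM_VIZ_NAMES[key]: value
            for key, value in row.items()
            if key in RBM_VIZ_NAMES}


# Hypothetical summary row (values are made up).
summary_row = {
    "GroupID": "Site01",
    "GroupLevel": "Site",
    "Numerator": 3,
    "Denominator": 150,
    "Metric": 0.02,
    "Score": 0.7,
    "Flag": 0,
    "MetricID": "kri0001",
}

viz_row = to_rbm_viz(summary_row)
```

Optional columns such as StudyID and SnapshotDate are simply absent from the result when they are absent from the input row.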
- Function(s) used to create table: Analyze_NormalApprox_PredictBounds(), Analyze_Poisson_PredictBounds()
- Inputs: dfTransformed
- Usage: Calculates predicted percentages/rates and upper and lower bounds across the full range of sample sizes/total exposure values for reporting.
- Structure:
Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
---|---|---|---|---|---|
dfBounds | Threshold | The number of standard deviations that the upper and lower bounds are based on | Numeric | | threshold |
dfBounds | Denominator | The calculated denominator value | Numeric | | denominator |
dfBounds | LogDenominator | The calculated log denominator value | Numeric | | log_denominator |
dfBounds | Numerator | The calculated numerator value | Numeric | | numerator |
dfBounds | Metric | The calculated rate/metric value | Numeric | | - |
dfBounds | MetricID | The Metric ID | Character | * | workflowid |
dfBounds | StudyID | The Study ID | Character | * | studyid |
dfBounds | SnapshotDate | The Date of the snapshot | Date | * | snapshot_date |
- Workflow used to create table: GroupMeta
- Inputs: CTMS site, study and country data
- Usage: Group metadata
- Structure: dfGroups is a long data frame, with one record per parameter per group; certain Param values are expected for given GroupLevels.
Table | Column | Description | Type | Optional |
---|---|---|---|---|
dfGroups | ProtocolID | Protocol ID | Character | |
dfGroups | SnapshotDate | Snapshot Date | Character | |
dfGroups | GroupID | Unique Group ID | Character | |
dfGroups | GroupLevel | Group Level (e.g. Site, Country) | Character | |
dfGroups | Param | Parameter Name (e.g. "Status") | Character | |
dfGroups | Value | Parameter Value (e.g. "Active") | Character |
Expected Param values by GroupLevel for use in gsm reporting. Other Param values can be added as needed.
GroupLevel | Param | Description | Value Type |
---|---|---|---|
Study | Status | Study Status | Character |
Study | Title | Protocol Title | Character |
Study | ParticipantCount | # of Enrolled Participants | Numeric |
Study | SiteCount | # of Activated Sites | Numeric |
Study | ParticipantsPlanned | # of Planned Participants | Numeric |
Study | SitesPlanned | # of Planned Sites | Numeric |
Site | ParticipantCount | # of Enrolled Participants | Numeric |
Site | Status | Site Status | Character |
Site | InvestigatorFirstName | Investigator First name | Character |
Site | InvestigatorLastName | Investigator Last name | Character |
Site | City | City | Character |
Site | State | State | Character |
Site | Country | Country | Character |
Country | EnrolledParticipants | # of Enrolled Participants | Numeric |
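Because dfGroups is long (one record per parameter per group), reporting code typically needs to gather the Param/Value pairs back into one record per group. The following plain-Python sketch shows one way to do that; the helper name `pivot_groups` and the sample records are hypothetical, and Value is kept as a string because the dfGroups table lists it as Character.

```python
# Hypothetical long-format records shaped like dfGroups (values are made up).
df_groups = [
    {"ProtocolID": "P-001", "SnapshotDate": "2024-07-24", "GroupID": "Site01",
     "GroupLevel": "Site", "Param": "Status", "Value": "Active"},
    {"ProtocolID": "P-001", "SnapshotDate": "2024-07-24", "GroupID": "Site01",
     "GroupLevel": "Site", "Param": "ParticipantCount", "Value": "12"},
    {"ProtocolID": "P-001", "SnapshotDate": "2024-07-24", "GroupID": "P-001",
     "GroupLevel": "Study", "Param": "SiteCount", "Value": "4"},
]


def pivot_groups(rows):
    """Collect Param/Value pairs into one dict per (GroupLevel, GroupID)."""
    wide = {}
    for row in rows:
        key = (row["GroupLevel"], row["GroupID"])
        wide.setdefault(key, {})[row["Param"]] = row["Value"]
    return wide


groups_wide = pivot_groups(df_groups)

# Values arrive as text; convert parameters documented as Numeric when needed.
site01_participants = int(groups_wide[("Site", "Site01")]["ParticipantCount"])
```

Keying on both GroupLevel and GroupID keeps a study and a site apart even if they happen to share an ID.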
- Function used to create table:
- Inputs:
- Usage:
- Structure:
Table | Column Name | Description | Type | Optional | rbm-viz Column Name |
---|---|---|---|---|---|
dfMetrics | File | The yaml file for workflow | Character | | file |
dfMetrics | MetricID | ID for the Metric | Character | | workflowid |
dfMetrics | Group | The group type for the metric (e.g. "Site") | Character | | group |
dfMetrics | Abbreviation | Abbreviation for the metric | Character | | abbreviations |
dfMetrics | Metric | Name of the metric | Character | | metric |
dfMetrics | Numerator | Data source for the Numerator | Character | | numerator |
dfMetrics | Denominator | Data source for the Denominator | Character | | denominator |
dfMetrics | Model | Model used to calculate metric | Character | | model |
dfMetrics | Score | Type of Score reported | Character | | score |
dfMetrics | strThreshold | Thresholds to be used for bounds and flags | Character | | vthreshold |
- Function used to create table:
- Inputs:
- Usage:
- Structure:
Table | Column | Description | Type | Optional | rbm-viz Column Name |
---|---|---|---|---|---|
dfSite | protocol_row_id | Protocol row ID | Character | | protocol_row_id |
dfSite | SiteID | Unique Site ID | Character | | site_id |
dfSite | site_row_id | Site row ID | Character | | site_row_id |
dfSite | protocol | Protocol ID | Character | | protocol |
dfSite | pi_number | Principal Investigator Number | Character | | pi_number |
dfSite | pi_last_name | Principal Investigator Last Name | Character | | pi_last_name |
dfSite | pi_first_name | Principal Investigator First Name | Character | | pi_first_name |
dfSite | site_status | Site Status | Character | | site_status |
dfSite | is_satellite | Is site a satellite location | Character | | is_satellite |
dfSite | account | | Character | | account |
dfSite | site_active_dt | Date that Site became active | Character | | site_active_dt |
dfSite | city | Site City | Character | | city |
dfSite | state | Site State | Character | | state |
dfSite | country | Site Country | Character | | country |
- Function used to create table:
- Inputs:
- Usage:
- Structure:
Table | Column | Description | Type | Optional |
---|---|---|---|---|
dfStudy | protocol_row_id | Protocol row ID | Character | |
dfStudy | StudyID | Unique Study ID | Character | |
dfStudy | protocol_title | Protocol Title | Character | |
dfStudy | nickname | Protocol Nickname | Character | |
dfStudy | protocol_type | Protocol Type | Character | |
dfStudy | phase | Study phase | Character | |
dfStudy | num_plan_site | Number of planned sites in the study | Character | |
dfStudy | num_site_actl | Number of active sites in the study | Character | |
dfStudy | est_fpfv | Estimated first patient first visit | Date | |
dfStudy | act_fpfv | Actual first patient first visit | Date | |
dfStudy | est_lplv | Estimated last patient last visit | Date | |
dfStudy | act_lplv | Actual last patient last visit | Date | |
dfStudy | est_lpfv | Estimated last patient first visit | Date | |
dfStudy | act_lpfv | Actual last patient first visit | Date | |
dfStudy | status | Study Status | Character | |
dfStudy | num_plan_subj | Number of planned subjects in study | Numeric | |
dfStudy | num_enrolled_subj_m | Number of enrolled subjects in study | Numeric | |
dfStudy | protocol_indication | Protocol Indication | Character | |
dfStudy | product | Product | Character | |
dfStudy | therapeutic_area | Therapeutic Area | Character | |
dfStudy | protocol_product_number | Protocol Product Number | Numeric | |
dfStudy | x_rbm_flg | Is RBM Flagged | Character |