-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add funnel plot methods #812
Conversation
Fix unit tests for new stats methods
@@ -69,7 +69,8 @@ AE_Map_Adam <- function( | |||
dfInput <- dfs$dfADSL %>% | |||
mutate( | |||
SubjectID = .data[[lMapping$dfADSL$strIDCol]], | |||
Exposure = as.numeric(.data[[lMapping$dfADSL$strEndCol]] - .data[[lMapping$dfADSL$strStartCol]]) + 1) %>% | |||
Exposure = as.numeric(.data[[lMapping$dfADSL$strEndCol]] - .data[[lMapping$dfADSL$strStartCol]]) + 1 | |||
) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems accident enter press (probably not a change needed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an automated change @gwu05 - I ran the code formatter prior to my review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see it makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so cool seeing theory applied to real data - love it @Zhongkai-Wang! I reviewed the formulas in the paper and I kind of get what's going on but @gwu05 should probably have the final sign off on the stats.
R/Analyze_Binary.R
Outdated
|
||
dfAnalyzed <- dfTransformed %>% | ||
mutate( | ||
z_0 = (.data$Metric - mean(.data$Metric)) / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sum(.data$Numerator) / sum(.data$Denominator)
Good catch! Thank you, George! They are updated.
R/Analyze_Rate.R
Outdated
|
||
dfAnalyzed <- dfTransformed %>% | ||
mutate( | ||
z_0 = (.data$Metric - mean(.data$Metric)) / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
#' The input data (`dfTransformed`) for Analyze_Poisson is typically created using | ||
#' \code{\link{Transform_Rate}} and should be one record per site with columns for: | ||
#' - `GroupID` - Unique subject ID | ||
#' - `Numerator` - Number of Events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the predictbounds, I'm seeing the output Numerator is actually the Metric, and we need to calculate out the actual numerator. I think may be ideal to have both generated from the dfBounds, so have both Numerator, Denominator, and Metric. This applies to Rate Predict Bounds as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the error and inconsistency. They are updated.
Thanks! Looks good - for dfBounds, let's add in the issue we'll want to either output the 'center' line (i.e. the overall metric - 1 number when numerator is metric, and a line when y-axis is numerator) or have the viz code to plot it |
Thanks! Added to #751. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great! Nice job! Marking approved since functionality is working great for me.
Some (slightly annoying, but needed) naming changes below. These can be done here or in a new PR:
- Combine
Analyze_Binary
andAnalyze_Rate
intoAnalyze_NormalApprox
with astrMethod
param with options forbinary
andrate
- Change strMethod="funnel" to "normalApprox" in all
Assess
functions
@@ -55,14 +55,14 @@ | |||
|
|||
AE_Assess <- function(dfInput, | |||
vThreshold = NULL, | |||
strMethod = "poisson", | |||
strMethod = "funnel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Zhongkai-Wang @gwu05 - Can we call this "normalApprox" or just "normal" or something like that? A Funnel plot is really just a type of visualization, not a method for analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not going to comment on this repeatedly, but it will require a bunch of minor changes)
#' | ||
#' @export | ||
|
||
Analyze_Binary <- function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would Analyze_Normal_Binary
or something similar be more descriptive?
vMu = sum(.data$Numerator) / sum(.data$Denominator), | ||
z_0 = (.data$Metric - .data$vMu) / | ||
sqrt(.data$vMu * (1 - .data$vMu) / .data$Denominator), | ||
phi = mean(.data$z_0^2), | ||
z_i = (.data$Metric - .data$vMu) / | ||
sqrt(.data$phi * .data$vMu * (1 - .data$vMu) / .data$Denominator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clean and easy to follow. Will be good to explain this (and link to references) in the proposed stat vignette.
Numerator = .data$Metric * .data$Denominator | ||
) %>% | ||
# Only positive percentages are meaningful bounds | ||
filter(.data$Numerator >= 0) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dropping negative values is fine (and creates better visualizations) per our previous discussion, but might be worth considering whether there are edge cases where this will create unexpected results (like entire boundary lines being missing).
#' | ||
#' @export | ||
|
||
Analyze_Rate <- function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Analyze_Normal_Rate
?
@gwu05 @Zhongkai-Wang We're seeing lots of flag values of 1 and 2, but very few (only 1) negative flag. Is there a chance these new methods are missing situations where under-reporting is an issue? (Charts are from |
@@ -116,7 +118,10 @@ AE_Assess <- function( | |||
if (!bQuiet) cli::cli_alert_success("{.fn Transform_Rate} returned output with {nrow(lData$dfTransformed)} rows.") | |||
|
|||
# dfAnalyzed -------------------------------------------------------------- | |||
if (strMethod == "poisson") { | |||
if (strMethod == "funnel") { | |||
lData$dfAnalyzed <- gsm::Analyze_Rate(lData$dfTransformed, bQuiet = bQuiet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lData$dfAnalyzed <- gsm::Analyze_Rate(lData$dfTransformed, bQuiet = bQuiet) | |
lData$dfAnalyzed <- gsm::Analyze_NormalApprox(lData$dfTransformed, strType="rate", bQuiet = bQuiet) |
@jwildfire @Zhongkai-Wang I think it's worthwhile to do comparison plots for each just to see what's going on in more detail. Based on the results so far, one guess is that there is more room for variability on the up-side, so the variance isn't symmetric but the normal approximation assumes so, which makes it more difficult to call under-reporting etc. (we can probably see this if we drew the boundaries [rather than making them 0] for the lower boundary, and we'll find that a large section of isn't a possible outcome for the lower boundary side) |
Thanks, George. We quickly looked at a few of these scatter plots at scrum yesterday and this is what we suspected as well. I will create a summary and some comparisons. |
Overview
Fix #750
Fix #751
Fix #752
New functions created:
Analyze_Binary()
Analyze_Rate()
Flag_Funnel()
Analyze_Binary_PredictBounds()
Analyze_Rate_PredictBounds()
Assessment and workflow are updated to use new methods as default:
Need more work to clean and update across the board (for example, in
*_Assess()
,Visualize_Scatter()
, tests).Test Notes/Sample Code