Add funnel plot methods #812

Zhongkai-Wang · 2022-10-17T19:09:27Z

Overview

New functions created:
Analyze_Binary()
Analyze_Rate()
Flag_Funnel()
Analyze_Binary_PredictBounds()
Analyze_Rate_PredictBounds()
Assessments and workflows are updated to use the new methods by default.
More work is needed to clean up and update across the board (for example, in *_Assess(), Visualize_Scatter(), tests).

Test Notes/Sample Code

library(gsm)
library(dplyr) # provides %>%

# LB ----------------------------------------------------------------------
dfInput <- LB_Map_Raw()

dfTransformed <- Transform_Rate(
  dfInput,
  strGroupCol = "SiteID",
  strNumeratorCol = "Count",
  strDenominatorCol = "Total"
)

dfAnalyzed <- Analyze_Binary(dfTransformed)

dfFlagged <- Flag_Funnel(dfAnalyzed, vThreshold = c(-3, -2, 2, 3))

dfBounds <- Analyze_Binary_PredictBounds(dfTransformed, vThreshold = c(-3, -2, 2, 3))

# Disp --------------------------------------------------------------------
dfInput <- Disp_Map_Raw()

dfTransformed <- Transform_Rate(
  dfInput,
  strGroupCol = "SiteID",
  strNumeratorCol = "Count",
  strDenominatorCol = "Total"
)

dfAnalyzed <- Analyze_Binary(dfTransformed)

dfFlagged <- Flag_Funnel(dfAnalyzed, vThreshold = c(-3, -2, 2, 3))

dfBounds <- Analyze_Binary_PredictBounds(dfTransformed, vThreshold = c(-3, -2, 2, 3))

# AE ----------------------------------------------------------------------
dfInput <- AE_Map_Raw() %>% na.omit()

dfTransformed <- Transform_Rate(
  dfInput,
  strGroupCol = "SiteID",
  strNumeratorCol = "Count",
  strDenominatorCol = "Exposure"
)
dfAnalyzed <- Analyze_Rate(dfTransformed)

dfFlagged <- Flag_Funnel(dfAnalyzed, vThreshold = c(-3, -2, 2, 3))

dfBounds <- Analyze_Rate_PredictBounds(dfTransformed, vThreshold = c(-3, -2, 2, 3))

# PD ----------------------------------------------------------------------
dfInput <- PD_Map_Raw() 

dfTransformed <- Transform_Rate(
  dfInput,
  strGroupCol = "SiteID",
  strNumeratorCol = "Count",
  strDenominatorCol = "Exposure"
)

dfAnalyzed <- Analyze_Rate(dfTransformed)

dfFlagged <- Flag_Funnel(dfAnalyzed, vThreshold = c(-3, -2, 2, 3))

dfBounds <- Analyze_Rate_PredictBounds(dfTransformed, vThreshold = c(-3, -2, 2, 3))

gwu05 · 2022-10-21T18:25:05Z

R/AE_Map_Adam.R

@@ -69,7 +69,8 @@ AE_Map_Adam <- function(
    dfInput <- dfs$dfADSL %>%
      mutate(
        SubjectID = .data[[lMapping$dfADSL$strIDCol]],
-        Exposure = as.numeric(.data[[lMapping$dfADSL$strEndCol]] - .data[[lMapping$dfADSL$strStartCol]]) + 1) %>%
+        Exposure = as.numeric(.data[[lMapping$dfADSL$strEndCol]] - .data[[lMapping$dfADSL$strStartCol]]) + 1
+      ) %>%


This looks like an accidental Enter keypress (probably not a needed change?)

This is just an automated change @gwu05 - I ran the code formatter prior to my review.

Oh, I see. That makes sense.

samussiah

It's so cool seeing theory applied to real data - love it @Zhongkai-Wang! I reviewed the formulas in the paper and I kind of get what's going on but @gwu05 should probably have the final sign off on the stats.

gwu05 · 2022-10-21T22:22:01Z

R/Analyze_Binary.R

+
+  dfAnalyzed <- dfTransformed %>%
+    mutate(
+      z_0 = (.data$Metric - mean(.data$Metric)) /


I think the estimate of p0 here should be: sum(.data$Numerator) / sum(.data$Denominator) as per Zink?

sum(.data$Numerator) / sum(.data$Denominator)

Good catch! Thank you, George! They are updated.
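
To illustrate why the two estimates differ, here is a minimal base-R sketch with made-up site counts (the data frame below is hypothetical, not taken from the PR). The mean of site-level metrics weights every site equally, while the pooled estimate weights each site by its denominator, so a tiny site with an extreme rate pulls the unweighted mean much further:

```r
# Hypothetical site-level data: one row per site (values are illustrative only).
dfTransformed <- data.frame(
  GroupID = c("A", "B", "C"),
  Numerator = c(1, 10, 30),
  Denominator = c(2, 100, 100)
)
dfTransformed$Metric <- dfTransformed$Numerator / dfTransformed$Denominator

# Unweighted mean of site metrics: the 2-participant site counts as much as the others.
mean_of_metrics <- mean(dfTransformed$Metric)

# Pooled estimate suggested in the review: total events over total denominator.
pooled <- sum(dfTransformed$Numerator) / sum(dfTransformed$Denominator)

print(c(mean_of_metrics = mean_of_metrics, pooled = pooled))
```

With unequal site sizes the two values diverge, which shifts every z-score computed from them.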

gwu05 · 2022-10-21T22:24:46Z

R/Analyze_Rate.R

+
+  dfAnalyzed <- dfTransformed %>%
+    mutate(
+      z_0 = (.data$Metric - mean(.data$Metric)) /


The same applies to the rate:

gwu05 · 2022-10-24T17:09:33Z

R/Analyze_Binary_PredictBounds.R

+#' The input data (`dfTransformed`) for Analyze_Poisson is typically created using
+#' \code{\link{Transform_Rate}} and should be one record per site with columns for:
+#' - `GroupID` - Unique subject ID
+#' - `Numerator` - Number of Events


For the PredictBounds functions, I'm seeing that the output Numerator is actually the Metric, so we need to calculate the actual numerator. I think it may be ideal to have dfBounds include all three: Numerator, Denominator, and Metric. This applies to Analyze_Rate_PredictBounds as well.

Sorry for the error and inconsistency. They are updated.

gwu05 · 2022-10-25T16:01:49Z

Thanks! Looks good. For dfBounds, let's note in the issue that we'll want to either output the 'center' line (i.e., the overall metric: a single number when the y-axis is the metric, and a line when the y-axis is the numerator) or have the viz code plot it.

Zhongkai-Wang · 2022-10-25T16:37:36Z

For dfBounds, let's note in the issue that we'll want to either output the 'center' line (i.e., the overall metric: a single number when the y-axis is the metric, and a line when the y-axis is the numerator) or have the viz code plot it.

Thanks! Added to #751.

jwildfire

This all looks great! Nice job! Marking approved since functionality is working great for me.

Some (slightly annoying, but needed) naming changes below. These can be done here or in a new PR:

Combine Analyze_Binary and Analyze_Rate into Analyze_NormalApprox with a strMethod param with options for binary and rate
Change strMethod="funnel" to "normalApprox" in all Assess functions

jwildfire · 2022-10-27T14:35:34Z

R/AE_Assess.R

@@ -55,14 +55,14 @@

 AE_Assess <- function(dfInput,
  vThreshold = NULL,
-  strMethod = "poisson",
+  strMethod = "funnel",


@Zhongkai-Wang @gwu05 - Can we call this "normalApprox" or just "normal" or something like that? A Funnel plot is really just a type of visualization, not a method for analysis.

(not going to comment on this repeatedly, but it will require a bunch of minor changes)

jwildfire · 2022-10-27T14:43:00Z

R/Analyze_Binary.R

+#'
+#' @export
+
+Analyze_Binary <- function(


Would Analyze_Normal_Binary or something similar be more descriptive?

jwildfire · 2022-10-27T14:49:18Z

R/Analyze_Binary.R

+      vMu = sum(.data$Numerator) / sum(.data$Denominator),
+      z_0 = (.data$Metric - .data$vMu) /
+        sqrt(.data$vMu * (1 - .data$vMu) / .data$Denominator),
+      phi = mean(.data$z_0^2),
+      z_i = (.data$Metric -  .data$vMu) /
+        sqrt(.data$phi * .data$vMu * (1 - .data$vMu) / .data$Denominator)


Very clean and easy to follow. Will be good to explain this (and link to references) in the proposed stat vignette.
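
For reference, the calculation in the hunk above can be written out as follows. The notation is an assumption introduced here for illustration: $x_i$ is `Numerator`, $n_i$ is `Denominator`, $y_i$ is `Metric` for site $i$, and $k$ is the number of sites.

```latex
\hat{\mu} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i},
\qquad
z_{0,i} = \frac{y_i - \hat{\mu}}{\sqrt{\hat{\mu}\,(1 - \hat{\mu}) / n_i}},
\qquad
\hat{\phi} = \frac{1}{k} \sum_{i=1}^{k} z_{0,i}^{2},
\qquad
z_i = \frac{y_i - \hat{\mu}}{\sqrt{\hat{\phi}\,\hat{\mu}\,(1 - \hat{\mu}) / n_i}}
    = \frac{z_{0,i}}{\sqrt{\hat{\phi}}}
```

The last equality shows that the adjusted score is the unadjusted score rescaled by the estimated overdispersion factor, so the mean of $z_i^2$ across sites is 1 by construction.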

jwildfire · 2022-10-27T14:56:32Z

R/Analyze_Binary_PredictBounds.R

+      Numerator = .data$Metric * .data$Denominator
+    ) %>%
+    # Only positive percentages are meaningful bounds
+    filter(.data$Numerator >= 0) %>%


I think dropping negative values is fine (and creates better visualizations) per our previous discussion, but might be worth considering whether there are edge cases where this will create unexpected results (like entire boundary lines being missing).
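
One such edge case can be sketched in base R. The bound formula below is an assumption (obtained by inverting the z_i expression from Analyze_Binary with phi fixed at 1), and the column name `Threshold`, the overall rate of 0.01, and the denominators are all hypothetical. With a low overall rate and small sites, every point on both lower boundary lines is negative, so the filter removes those lines entirely:

```r
# Assumed inputs (illustrative only): low overall rate, no overdispersion.
vMu <- 0.01
phi <- 1

# One row per (denominator, threshold) pair.
dfBounds <- expand.grid(
  Denominator = seq(10, 100, by = 10),
  Threshold = c(-3, -2, 2, 3)
)

# Assumed bound: metric value at which the z-score equals the threshold.
dfBounds$Metric <- vMu +
  dfBounds$Threshold * sqrt(phi * vMu * (1 - vMu) / dfBounds$Denominator)
dfBounds$Numerator <- dfBounds$Metric * dfBounds$Denominator

# Same filter as in the hunk above: keep only non-negative bounds.
dfKept <- dfBounds[dfBounds$Numerator >= 0, ]

# Which boundary lines still have at least one point?
vRemaining <- sort(unique(dfKept$Threshold))
print(vRemaining)
```

In this setup only the upper thresholds survive, so any downstream plotting code would need to tolerate missing lower lines.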

jwildfire · 2022-10-27T14:58:51Z

R/Analyze_Rate.R

+#'
+#' @export
+
+Analyze_Rate <- function(


Analyze_Normal_Rate?

jwildfire · 2022-10-27T16:00:40Z

@gwu05 @Zhongkai-Wang We're seeing lots of flag values of 1 and 2, but very few negative flags (only 1). Is there a chance these new methods are missing situations where under-reporting is an issue? (Charts are from MakeSnapshot()$results_summary on the PR branch)

jwildfire · 2022-10-27T16:21:41Z

R/AE_Assess.R

@@ -116,7 +118,10 @@ AE_Assess <- function(
    if (!bQuiet) cli::cli_alert_success("{.fn Transform_Rate} returned output with {nrow(lData$dfTransformed)} rows.")

    # dfAnalyzed --------------------------------------------------------------
-    if (strMethod == "poisson") {
+    if (strMethod == "funnel") {
+      lData$dfAnalyzed <- gsm::Analyze_Rate(lData$dfTransformed, bQuiet = bQuiet)


Suggested change

lData$dfAnalyzed <- gsm::Analyze_Rate(lData$dfTransformed, bQuiet = bQuiet)

lData$dfAnalyzed <- gsm::Analyze_NormalApprox(lData$dfTransformed, strType="rate", bQuiet = bQuiet)
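
A rough base-R sketch of what such a combined function might look like is below. It is not the package's implementation: the function name `Analyze_NormalApprox_Sketch` and the output column `Score` are hypothetical, the binary branch follows the formula shown in the Analyze_Binary hunk of this PR, and the rate branch assumes a Poisson-style variance of `vMu / Denominator`, which is not confirmed anywhere in this thread.

```r
# Hypothetical combined analysis function (illustration only).
Analyze_NormalApprox_Sketch <- function(dfTransformed, strType = c("binary", "rate")) {
  strType <- match.arg(strType) # errors on anything other than "binary" or "rate"

  # Pooled overall estimate, as agreed earlier in the review.
  vMu <- sum(dfTransformed$Numerator) / sum(dfTransformed$Denominator)

  # Variance of the site-level metric under the null.
  vVar <- switch(strType,
    binary = vMu * (1 - vMu) / dfTransformed$Denominator,
    rate = vMu / dfTransformed$Denominator # assumption: Poisson-style variance
  )

  # Unadjusted z-score, overdispersion factor, adjusted z-score.
  z_0 <- (dfTransformed$Metric - vMu) / sqrt(vVar)
  phi <- mean(z_0^2)
  dfTransformed$Score <- (dfTransformed$Metric - vMu) / sqrt(phi * vVar)
  dfTransformed
}

# Usage with made-up data.
dfExample <- data.frame(Numerator = c(1, 10, 30), Denominator = c(2, 100, 100))
dfExample$Metric <- dfExample$Numerator / dfExample$Denominator
dfBinary <- Analyze_NormalApprox_Sketch(dfExample, strType = "binary")
dfRate <- Analyze_NormalApprox_Sketch(dfExample, strType = "rate")
```

Sharing one code path for the overdispersion adjustment means the two types differ only in the variance term.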

gwu05 · 2022-10-28T05:15:25Z

@gwu05 @Zhongkai-Wang We're seeing lots of flag values of 1 and 2, but very few negative flags (only 1). Is there a chance these new methods are missing situations where under-reporting is an issue? (Charts are from MakeSnapshot()$results_summary on the PR branch)

@jwildfire @Zhongkai-Wang
This one is an interesting observation - I tried running using our prior setup and got:

I think it's worthwhile to do comparison plots for each just to see what's going on in more detail. Based on the results so far, one guess is that there is more room for variability on the upside, so the variance isn't symmetric but the normal approximation assumes it is, which makes it harder to call under-reporting. (We could probably see this if we drew the actual lower boundaries rather than setting them to 0: a large section of that region isn't a possible outcome on the lower-boundary side.)

Here, I drew an example:

non-jittered version:

Zhongkai-Wang · 2022-10-28T16:40:16Z

I think it's worthwhile to do comparison plots for each just to see what's going on in more detail. Based on the results so far, one guess is that there is more room for variability on the upside, so the variance isn't symmetric but the normal approximation assumes it is, which makes it harder to call under-reporting. (We could probably see this if we drew the actual lower boundaries rather than setting them to 0: a large section of that region isn't a possible outcome on the lower-boundary side.)

Thanks, George. We quickly looked at a few of these scatter plots at scrum yesterday, and this is what we suspected as well. I will create a summary and some comparisons.
Exactly as you point out here, our initial thought is that the funnel plot always assumes symmetric limits around the overall mean, regardless of the variance at each sample size. In reality, with the actual data, the distribution at each sample size isn't symmetric, and it is also truncated at zero.
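
A small numeric sketch of this asymmetry, using base R only. The expected count of 2 is a made-up value for a small site, and overdispersion is ignored (phi assumed to be 1). Under a Poisson model, the lowest possible outcome (zero events) cannot reach a z-score of -2, while the exact probability of exceeding the upper 3-SD limit is larger than the normal approximation suggests:

```r
lambda <- 2 # hypothetical expected event count at a small site

# Symmetric normal-approximation limits on the count scale.
lower_3sd <- lambda - 3 * sqrt(lambda)
upper_3sd <- lambda + 3 * sqrt(lambda)

# z-score of the most extreme low outcome (zero events).
z_at_zero <- (0 - lambda) / sqrt(lambda)

# Exact Poisson probability of landing above the upper limit,
# compared with the nominal one-sided normal tail at z = 3.
p_upper_exact <- ppois(floor(upper_3sd), lambda, lower.tail = FALSE)
p_upper_normal <- pnorm(3, lower.tail = FALSE)

print(c(lower_3sd = lower_3sd, z_at_zero = z_at_zero))
print(c(p_upper_exact = p_upper_exact, p_upper_normal = p_upper_normal))
```

Because the z-score at zero events is `-sqrt(lambda)`, a site needs an expected count above 4 before a -2 flag is even possible in this simplified setting, which is consistent with seeing few negative flags.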

Zhongkai-Wang added 5 commits October 17, 2022 13:56

Create funnel methods

5e7c957

Update funnel functions

596c73c

Update default assessment and workflow

b90a940

Run devtools::document()

9cdf85a

minor updates

38e5c78

Zhongkai-Wang added this to the v1.3.0 milestone Oct 17, 2022

Zhongkai-Wang self-assigned this Oct 17, 2022

gwu05 self-requested a review October 18, 2022 18:02

Zhongkai-Wang requested review from mattroumaya, samussiah and kodesiba October 19, 2022 13:39

Zhongkai-Wang marked this pull request as ready for review October 19, 2022 13:39

mattroumaya and others added 5 commits October 19, 2022 21:33

unit tests

ce6a7ba

Merge pull request #820 from Gilead-BioStats/fix-750-mr

7a2ca90

Fix unit tests for new stats methods

update pkgdown index

d5d707d

merge dev

e0b69ae

run styler

d62d4cb

gwu05 reviewed Oct 21, 2022

samussiah approved these changes Oct 21, 2022

gwu05 reviewed Oct 21, 2022

gwu05 reviewed Oct 21, 2022

gwu05 reviewed Oct 21, 2022

Zhongkai-Wang added 2 commits October 22, 2022 21:03

Update overall mean

a4301f5

Output overdispersion factor

fbdc015

gwu05 reviewed Oct 24, 2022

Zhongkai-Wang added 3 commits October 24, 2022 13:36

Update dfBounds

f38e69b

Update overall mean

f2b0738

dfBounds

b6e9a11

Correct typo

88ddf0a

Zhongkai-Wang requested a review from jwildfire October 25, 2022 16:07

Zhongkai-Wang added 2 commits October 27, 2022 10:54

Update merge conflicts

4c313a4

Merge dev into fix-750 and resolve conflicts

221b959

jwildfire approved these changes Oct 27, 2022

jwildfire reviewed Oct 27, 2022

address snapshot tests

6db137a

mattroumaya mentioned this pull request Oct 27, 2022

QC: Refine new stats methods #833

Closed

Zhongkai-Wang merged commit 41930c2 into dev Oct 27, 2022

samussiah deleted the fix-750 branch December 19, 2022 20:11
