Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with reproducibility using recompute() after gs_pop_add() #374

Open
joliephan opened this issue Mar 8, 2022 · 3 comments
Open

Issue with reproducibility using recompute() after gs_pop_add() #374

joliephan opened this issue Mar 8, 2022 · 3 comments

Comments

@joliephan
Copy link

I am getting different cell counts every time I run the same script where I use recompute() after adding new gates using gs_pop_add() that are boolean combinations of preexisting gates. This has affected downstream analyses where I look for significant differences between subpopulation frequencies. How do I ensure that the gating is reproducible and consistent?

@mikejiang
Copy link
Member

That behavior you described has not been observed in the past, can you provide an example code that can reproduce the issue?

@joliephan
Copy link
Author

joliephan commented Mar 11, 2022

I think I figured out the source of this issue. The script is reproducible if I recompute with default settings using recompute(gs). However, it isn't reproducible if I specify another node name/path other than the default "root" (e.g., recompute(gs, "/Time/CD3+/CD14-CD19-/Lymphocytes/Singlet/Live/CD4+")).

Could you explain the difference between recomputing beginning at the "root" versus another node, assuming the newly added gates are descendants of the specified node? Thank you!

This is an example code that gives irreproducible counts under the new gates:

# Boolean combinations of existing gates
bool_gates <- c("CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&CD4+/IL2+&CD4+/IL4_5_13+&CD4+/TNFa+",
               "CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&!CD4+/TNFa+",
               "!CD4+/CD107a+&!CD4+/CD154+&!CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&!CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&!CD4+/TNFa+")

names(bool_gates) <- paste0("CD4_", gsub("CD4\\+\\/", "", gsub("\\&", "_AND_", gsub("\\!", "NOT_", bool_gates))))

# Add new gates
for(booleanSubsetName in names(bool_gates)) {
  call <- substitute(flowWorkspace::booleanFilter(v), list(v = as.symbol(bool_gates[[booleanSubsetName]])))
  g <- eval(call)
  suppressWarnings(flowWorkspace::gs_pop_add(gs, g, parent = "CD4+", name=booleanSubsetName))
}

# Add a union gate of all the new gates
gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                               list(v = as.symbol(paste(names(bool_gates), collapse="|"))))),
           parent = "CD4+", name = "Boolean_Union")

# Recompute the gating set with the new gates
flowWorkspace::recompute(gs, "/Time/CD3+/CD14-CD19-/Lymphocytes/Singlet/Live/CD4+")

@mikejiang
Copy link
Member

I tried to mimic your use case but cannot reproduce the issue you reported here is my sample

library(flowCore)
library(flowWorkspace)
dataDir <- system.file("extdata",package="flowWorkspaceData")
suppressMessages(gs <- load_gs(list.files(dataDir, pattern = "gs_manual",full = TRUE)))
getNodes(gs)
# Boolean combinations of existing gates
bool_gates <- c("CD4/38+ DR+&CD4/CCR7+ 45RA+",
                "!CD4/38+ DR+&CD4/CCR7+ 45RA+",
                "CD4/38+ DR+&!CD4/CCR7+ 45RA+")

names(bool_gates) <- paste0("CD4_", gsub("CD4\\/", "", gsub("\\&", "_AND_", gsub("\\!", "NOT_", bool_gates))))

# Add new gates
for(booleanSubsetName in names(bool_gates)) {
  call <- substitute(flowWorkspace::booleanFilter(v), list(v = as.symbol(bool_gates[[booleanSubsetName]])))
  g <- eval(call)
  suppressWarnings(flowWorkspace::gs_pop_add(gs, g, parent = "CD4", name=booleanSubsetName))
}
getNodes(gs)
# Add a union gate of all the new gates
gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                               list(v = as.symbol(paste(names(bool_gates), collapse="|"))))),
           parent = "CD4", name = "Boolean_Union")

# Recompute the gating set with the new gates
flowWorkspace::recompute(gs, "CD4")
gs_pop_get_stats(gs)[25:28,]
               sample                                                           pop count
1: CytoTrol_CytoTrol_1.fcs     /not debris/singlets/CD3+/CD4/CD4_38+ DR+_AND_CCR7+ 45RA+   445
2: CytoTrol_CytoTrol_1.fcs /not debris/singlets/CD3+/CD4/CD4_NOT_38+ DR+_AND_CCR7+ 45RA+ 11671
3: CytoTrol_CytoTrol_1.fcs /not debris/singlets/CD3+/CD4/CD4_38+ DR+_AND_NOT_CCR7+ 45RA+   599
4: CytoTrol_CytoTrol_1.fcs                   /not debris/singlets/CD3+/CD4/Boolean_Union 12715

I get the same counts when I run the script multiple times, can you give it a try?
If it's possible, provide a similar minimal reproducible example for troubleshooting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants