Comparison to MAGeCK #26

Open
keenhl opened this issue Feb 28, 2023 · 1 comment

keenhl commented Feb 28, 2023

Thanks for making this tool available. It was recommended to me by a colleague. I'm new to this kind of analysis and was curious about the advantages and disadvantages of this tool compared to the MAGeCK software.

Thanks for the help.

mhorlbeck (Owner) commented

Good question. At least compared to the original version of MAGeCK (there have been a few updates and I haven't kept up to date), I'd say there are three differences:

  1. sgRNA counts -> phenotype scores: MAGeCK's approach is more sophisticated but may be harder to interpret directly. It models dispersion to correct for noise at lowly represented sgRNAs, similar to DESeq if you're familiar with RNA-seq analysis, whereas this pipeline just measures log2 fold-enrichment without correction and applies a counts threshold to exclude very lowly represented sgRNAs.

  2. sgRNA-level -> gene-level phenotypes: This pipeline uses two partially orthogonal metrics to score genes based on the sgRNAs targeting that gene:

  • it computes a Mann-Whitney p-value, which reports the chance that a particular set of sgRNAs could be randomly sampled from the negative controls. The test is non-parametric (it relies only on guide ranking), so one strong outlier sgRNA will not have a large effect on the p-value.
  • it takes the average of the top 3 sgRNAs by absolute value (the default, which can be adjusted in the settings). This provides an estimate of the gene's actual effect size, implicitly assuming that the sgRNAs below the top 3 are less effective at repression/activation/cutting.

  Together these give a volcano plot that separates three groups of hits: strong effect and significant, weak effect but significant, and strong effect but marginally significant (you'll want to look at that last group manually to see whether those hits are driven by just 1-2 active sgRNAs or by low counts and noise).

MAGeCK uses just a rank-based p-value, which in my opinion is less interpretable. But there is no reason you couldn't take the MAGeCK results from sgRNA counts->phenotypes and apply whatever statistical tests you prefer to get gene scores.
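
As a rough illustration of the two gene-level metrics described above, here is a minimal standard-library sketch. It is not this pipeline's actual code: the function names and the toy scores are made up for the example, and the p-value uses a simplified normal approximation (in practice something like `scipy.stats.mannwhitneyu` would be the usual choice).

```python
import math


def mann_whitney_p(sample, controls):
    """Two-sided Mann-Whitney p-value via the normal approximation.

    Simplified for illustration: no continuity or tie correction
    is applied to the variance.
    """
    labeled = sorted([(v, 0) for v in sample] + [(v, 1) for v in controls])
    ranks = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        j = i
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(sample), len(controls)
    rank_sum = sum(r for r, (_, group) in zip(ranks, labeled) if group == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def top_n_mean(scores, n=3):
    """Average of the n sgRNA scores with the largest absolute value."""
    top = sorted(scores, key=abs, reverse=True)[:n]
    return sum(top) / len(top)


# Hypothetical log2 enrichment scores (made-up numbers).
controls = [(i - 25) / 100 for i in range(50)]   # negative-control sgRNAs
gene_scores = [-3.0, -2.5, -2.0, -0.1, 0.05]     # sgRNAs targeting one gene

p_value = mann_whitney_p(gene_scores, controls)
effect = top_n_mean(gene_scores)
print(f"p = {p_value:.3g}, top-3 mean = {effect:.2f}")
```

Plotting `effect` against `-log10(p_value)` for every gene would give the kind of volcano plot mentioned above.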

  3. Related to 2, gene-level scoring in MAGeCK is done agnostic of negative controls. A lot has been debated and written about controls in CRISPR nuclease/i/a screening libraries, but I think it is very important not to assume that the median gene has no phenotype. Some screens (like essential gene screens) can have a very skewed distribution, and that assumption would cause negative controls and genes with no phenotype to appear enriched relative to essential genes. The Kampmann lab has a version of MAGeCK that fixes this: https://kampmannlab.ucsf.edu/mageck-inc
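
To make the point about skewed distributions concrete, here is a toy sketch. It is neither MAGeCK's nor this pipeline's implementation; the `center` helper and all scores are hypothetical. It compares centering scores on the median of everything with centering on the negative controls.

```python
from statistics import median


def center(scores, reference_values):
    """Subtract the median of the reference values from every score."""
    offset = median(reference_values)
    return {name: value - offset for name, value in scores.items()}


# Made-up log2 scores from a screen where half of the entries are depleted.
scores = {
    "ctrl_1": 0.0, "ctrl_2": 0.1, "ctrl_3": -0.1,   # negative controls
    "essA": -2.0, "essB": -1.5, "essC": -1.8, "essD": -2.2,
    "neutral": 0.05,                                  # gene with no phenotype
}
control_values = [v for k, v in scores.items() if k.startswith("ctrl_")]

by_all = center(scores, list(scores.values()))   # assumes the median entry has no phenotype
by_ctrl = center(scores, control_values)         # anchors zero on the negative controls

print("centered on all entries:", by_all["ctrl_1"], by_all["neutral"])
print("centered on controls:   ", by_ctrl["ctrl_1"], by_ctrl["neutral"])
```

With the median-of-everything offset, the controls and the neutral gene pick up a positive score purely because so many entries are depleted; anchoring on the controls keeps them near zero.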

You can certainly try both and see what is easiest to implement (my pipeline may not be the most user-friendly) and what gives you results that are interpretable and can be functionally validated.
