Replies: 3 comments
-
Hi @Domoun, thanks for some great questions! For some background, the clustering (and the z labels) is needed in both cases, with and without the background/raw matrix. This clustering is used to get a sense of what a “true” cell should look like for each cell population. The major difference is how the contamination distribution is calculated for each cell population.
When not using a background matrix, a separate contamination distribution is calculated for each cell population as a weighted mixture of all other cell populations present in the data. This works well in many scenarios, but it does assume that the proportion of cell types captured in your data largely reflects the proportion of cell types that were present in the cell suspension. If one particular cell type is underrepresented in the data (maybe due to biases in the dissociation process, or because cell sorting was performed and removed some cell types), then markers for that cell type may be present in the cell data but will not be subtracted out.
When the background matrix is supplied, a single contamination distribution is calculated for all cell types using the empty droplets. This has the advantage of capturing UMIs/reads from contamination by cell types that were not adequately represented in the cell data (i.e. the non-empty droplets).
With regard to less correction happening when a background matrix is supplied, one technical question: what was the class of the matrix that was supplied to decontX? You can find this by typing ‘class(mat)’, or ‘class(counts(mat))’ if using a SingleCellExperiment object (where ‘mat’ is the name of your object). We had a technical problem where matrices of class “dgTMatrix” were not being converted to “dgCMatrix” properly and the contamination was underestimated.
Also, can you tell me how you are reading the matrices (both raw and filtered) into R, what version of celda you are using, and what technology generated your data (e.g. 10X)? For the last question, about a single cluster being more heavily affected when not using a background matrix, I have seen this before when that cluster was small and likely to be a doublet cluster. Do you think that is the case for your cluster that shows dramatic differences? Does that cluster express high levels of multiple cell type markers?
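The difference between the two contamination distributions described in this reply can be illustrated with a toy sketch. This is not decontX's implementation: it is written in Python for illustration only, the function names and counts are hypothetical, and weighting each population by its total counts is an assumption of the sketch.

```python
def normalize(counts):
    """Turn a vector of gene counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def mixture_contamination(pop_counts, target):
    """Without background: pool the counts of every population except `target`.

    Pooling raw counts weights each population by its total counts, which is
    an assumption of this sketch, not necessarily decontX's weighting.
    """
    n_genes = len(pop_counts[target])
    pooled = [0] * n_genes
    for name, counts in pop_counts.items():
        if name != target:
            for g in range(n_genes):
                pooled[g] += counts[g]
    return normalize(pooled)


def background_contamination(empty_counts):
    """With background: one distribution for all populations, from empty droplets."""
    return normalize(empty_counts)


# Toy counts for three genes: [acinar marker, ductal marker, immune marker].
pop_counts = {
    "ductal": [20, 900, 80],
    "immune": [10, 90, 900],
    "acinar": [95, 3, 2],  # few acinar cells survived dissociation
}
empty_counts = [800, 120, 80]  # ambient RNA dominated by acinar transcripts

mix = mixture_contamination(pop_counts, "ductal")
bg = background_contamination(empty_counts)
print("acinar marker share, mixture of other populations:", round(mix[0], 3))
print("acinar marker share, empty droplets:", round(bg[0], 3))
```

In this toy setup the acinar marker makes up a much larger share of the empty-droplet distribution than of the mixture built from the captured cells, which is the underrepresentation scenario described above.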
-
Hi, Thanks for the very prompt reply!
#prep filtered matrix with all low quality cells removed: clean
#prep sce object with raw matrix
#run decontX with background argument
As requested, I checked and both sce objects are “dgCMatrix”. I use celda v1.8.1. Here is my entire session info:
R version 4.1.1 (2021-08-10)
Matrix products: default
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
Actually, I have an extra question. Does subsetting the filtered matrix provided by CellRanger affect decontX performance with background? (In this case, I guess all barcodes not present in the “clean” matrix are considered as empty, although some were not identified as such by EmptyDrops during processing by CellRanger.)
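The concern in the extra question can be made concrete with a small sketch. It assumes, as guessed above, that every raw barcode missing from the “clean” matrix ends up in the background; whether decontX actually behaves this way is not confirmed here. The sketch is in Python for illustration, and the function name and barcodes are hypothetical.

```python
def split_non_clean_barcodes(raw_barcodes, called_barcodes, clean_barcodes):
    """Split raw barcodes that are absent from the clean matrix into two groups.

    Returns (empty, filtered_out_cells):
      empty              - barcodes never called as cells
      filtered_out_cells - barcodes called as cells but removed during QC
    """
    raw = set(raw_barcodes)
    called = set(called_barcodes)
    clean = set(clean_barcodes)
    empty = raw - called
    filtered_out_cells = (called - clean) & raw
    return empty, filtered_out_cells


# Toy barcodes, not real data.
raw_barcodes = ["AAAC", "AAAG", "AACA", "AACC", "AAGT", "ACGT"]
called_barcodes = ["AAAC", "AAAG", "AACA"]  # e.g. the CellRanger filtered matrix
clean_barcodes = ["AAAC", "AAAG"]           # after low-quality cells were removed

empty, filtered_out_cells = split_non_clean_barcodes(
    raw_barcodes, called_barcodes, clean_barcodes
)
print("never called as cells:", sorted(empty))
print("called as cells but filtered out:", sorted(filtered_out_cells))
```

Counting the second group shows how many called cells would be treated as background under that assumption; restricting the background to the first group is one way to avoid it.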
Sorry for the long message, and thank you again for your help!
-
Hi,
First of all, thank you so much for this great tool!
I would like to get your advice about the use of the raw matrix as background.
I am working on scRNAseq data from mouse tissue. My major issue is that there is a high ambient RNA contamination (for some samples, the fraction of reads assigned to cells is as low as 30% on the 10X summary), especially by acinar transcripts (our acinar cells have a huge RNA content and do not appreciate the dissociation process).
I tried DecontX on “clean” data (low quality cells were filtered out based on nFeature) and then chose to remove all cells with a score > 0.7 for downstream analysis. I found that this strategy improved clustering by removing cells I previously identified as “junk”.
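The filtering step described here amounts to keeping only cells whose contamination score does not exceed the cutoff. A minimal sketch in Python, with hypothetical cell names and scores (the 0.7 cutoff is the one mentioned above):

```python
def keep_cells(scores, threshold=0.7):
    """Return the cells whose contamination score is not above `threshold`."""
    return [cell for cell, score in scores.items() if score <= threshold]


# Toy per-cell contamination scores, not real data.
scores = {"cell_1": 0.05, "cell_2": 0.71, "cell_3": 0.70, "cell_4": 0.93}

kept = keep_cells(scores)
removed_fraction = 1 - len(kept) / len(scores)
print("kept:", kept)
print("fraction removed:", removed_fraction)
```

Tracking the fraction removed per sample makes it easier to see how a single cutoff behaves across samples of different quality.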
However, “playing” with the z argument (using my Seurat clusters) can sometimes dramatically change the set of cells identified as “highly contaminated” (which makes sense, though), and I had some concerns about possible changes to gene expression that I might miss.
So, I wanted to use the raw matrix (containing empty droplets) as background to estimate and reduce the contamination in a more objective way. The problem is that the contamination score obtained with the background is very low and seems underestimated in genuinely bad samples (compared to the scores obtained in high quality samples). This prevents me from finding a single suitable threshold to filter out cells based on the DecontX score. It also seems a bit less efficient at removing contamination.
Here is an example +/- background, showing the lower score and lower correction of acinar markers in all clusters when using the background argument:
But, in the meantime, I noticed that some clusters were highly corrected without background (but not with background, see cluster 2 above). Although these clusters had mixed markers, I am worried about over-correction in the case without background.
This happened also with high quality samples (cells were FACS-sorted and the fraction of reads assigned to cells is > 97%), as highlighted with black rectangles on the example below:
Based on your experience, does the use of background argument tend to under-correct the contamination?
Do you think I could use an arbitrary threshold for the decontX score even if it is that low?
If I do not use the background argument, how can I assess a potential over-correction of gene expression in some clusters?
I sincerely thank you for your help!