Thanks for your patience @lepriolc and for suggesting changes (and the PR!). I have pushed a fix here: 66e655a that should solve this. Please feel free to reopen if you still notice issues. You should be able to pull the latest changes from the develop branch:
Thank you very much for the fix.
We also encountered this problem and we used the workaround of passing the right mean.fxn to the FindMarkers function. We don't want to start using the develop branch in all of our projects in the lab and passing the right mean.fxn every time is also error-prone.
Is there an expected release of a new version that includes the fix?
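Until a release includes the fix, one way to make the workaround less error-prone is a small wrapper that chooses the mean function from the test name. The following is only a sketch: `find_markers_sct`, `count_mean_fxn` and the `count_based_tests` argument are hypothetical names, not part of Seurat, and the default set of count-based tests is an assumption restricted to negbinom and poisson (the issue also lists MAST; check the Seurat version you have installed before extending it).

```r
# Hypothetical helper, not part of Seurat.
# Mean function for count matrices: average first, then log-transform.
# Matrix::rowMeans is used so that both dense and sparse matrices work.
count_mean_fxn <- function(x, pseudocount = 1, base = 2) {
  log(Matrix::rowMeans(x) + pseudocount, base = base)
}

# Hypothetical wrapper around Seurat::FindMarkers for an SCT assay.
# Assumption: tests listed in count_based_tests read the counts slot.
find_markers_sct <- function(object, ident.1, ident.2,
                             test.use = "wilcox",
                             count_based_tests = c("negbinom", "poisson"),
                             ...) {
  # Override mean.fxn only for count-based tests; NULL keeps the
  # default that FindMarkers would pick on its own.
  mean.fxn <- if (test.use %in% count_based_tests) count_mean_fxn else NULL
  Seurat::FindMarkers(
    object,
    ident.1 = ident.1,
    ident.2 = ident.2,
    assay = "SCT",
    test.use = test.use,
    mean.fxn = mean.fxn,
    ...
  )
}
```

Calling `find_markers_sct(obj, ident.1 = "A", ident.2 = "B", test.use = "negbinom")` (with placeholder object and identity names) would then pass the count-appropriate function without having to remember it at each call site.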
Hi Seurat dev team,
Thanks for the amazing work you have been doing to enable the community to perform single-cell data analysis.
I have noticed discrepancies in the average log2FC values reported by FindMarkers across the different tests I ran. Using the same input data and the same genes, the average log2FC values differ depending on whether the test used in FindMarkers is "wilcox" or "negbinom". This problem has already been raised in several issues (#6701, #6654, #6976, #4892). Solutions have been proposed, as in @caodudu's comment (#6701 (comment)). As suggested by @nathanhaigh (#6701 (comment)), I provide here a reproducible example that exposes these discrepancies by reproducing the SCTransform v2 vignette: https://satijalab.org/seurat/articles/sctransform_v2_vignette.html

A data.frame: 5 x 5
gene   p_val          avg_log2FC  pct.1  pct.2  p_val_adj
ISG15  1.505693e-159  3.508888    0.998  0.229  2.003475e-155
IFIT3  4.128835e-154  2.568440    0.961  0.052  5.493827e-150
IFI6   2.479476e-153  2.460206    0.965  0.076  3.299190e-149
ISG20  9.385626e-152  2.561808    1.000  0.666  1.248851e-147
IFIT1  2.447118e-139  2.132814    0.904  0.029  3.256136e-135
A data.frame: 5 x 5
gene   p_val          avg_log2FC  pct.1  pct.2  p_val_adj
ISG15  1.011450e-258  3.508888    0.998  0.229  1.345835e-254
ISG20  1.665382e-240  2.561808    1.000  0.666  2.215957e-236
IFIT3  1.078258e-230  2.568440    0.961  0.052  1.434731e-226
IFI6   2.781566e-228  2.460206    0.965  0.076  3.701151e-224
IFIT1  2.844733e-197  2.132814    0.904  0.029  3.785202e-193

A data.frame: 5 x 5
gene   p_val          avg_log2FC  pct.1  pct.2  p_val_adj
ISG15  0.000000e+00   103.311763  0.998  0.229  0.000000e+00
ISG20  0.000000e+00   41.286540   1.000  0.666  0.000000e+00
B2M    7.201407e-161  34.209536   1.000  1.000  9.582192e-157
MX1    3.615401e-132  9.806964    0.900  0.115  4.810653e-128
IFI6   4.913761e-132  13.616252   0.965  0.076  6.538251e-128
Identical average log2FC values are obtained with the wilcox and LR tests, but different and unrealistically high values are obtained with the negbinom test (see the ISG15, ISG20 and IFI6 genes).
After exploring the source code of the FindMarkers function (https://github.com/satijalab/seurat/blob/HEAD/R/differential_expression.R), and if I understood the code correctly, the following steps are executed:
- the FindMarkers.Seurat function is called
- data.use is set to the SCT assay of the Seurat object
- the FindMarkers.SCTAssay function is called
- data.slot is set to "counts" if test.use is set to one of the following tests: negbinom, poisson, MAST; otherwise it is set to "data"
- mean.fxn is set to a function that computes mean expression values
- the FoldChange function is called with the previously set data.slot and mean.fxn and returns the average log2FC values to fc.results
- the FindMarkers function is called with the previously set fc.results, which contains the average log2FC values that appear in the FindMarkers outputs

The function set in mean.fxn to compute mean expression values is the correct function when the test uses the log-counts in the data slot, i.e. when the FindMarkers test.use parameter is set to one of the following tests: wilcox, bimod, roc, t, LR, DESeq2. It is not correct when the test uses the counts in the counts slot, i.e. when test.use is set to one of the following tests: negbinom, poisson, MAST.

Setting the FindMarkers mean.fxn parameter to the correct function yields average log2FC values with the negbinom test that are identical to those obtained with the wilcox or LR tests:

A data.frame: 5 x 5
gene   p_val          avg_log2FC  pct.1  pct.2  p_val_adj
ISG15  0.000000e+00   3.5088882   0.998  0.229  0.000000e+00
ISG20  0.000000e+00   2.5618077   1.000  0.666  0.000000e+00
B2M    7.201407e-161  0.5980104   1.000  1.000  9.582192e-157
MX1    3.615401e-132  1.9311875   0.900  0.115  4.810653e-128
IFI6   4.913761e-132  2.4602058   0.965  0.076  6.538251e-128
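The effect described above can be reproduced in base R on toy numbers. This is a sketch under assumptions: the counts are made up, and the two functions are illustrative stand-ins (a log-count mean that undoes log1p before averaging, and a count mean that averages directly), not code copied from Seurat.

```r
# Toy counts for one gene in two groups (made-up numbers, not from the issue).
counts_1 <- matrix(c(10, 12, 8, 14), nrow = 1, dimnames = list("geneA", NULL))
counts_2 <- matrix(c(1, 0, 2, 1), nrow = 1, dimnames = list("geneA", NULL))

# Assumed mean function for log1p-transformed data:
# undo the log, average, then take log2 with a pseudocount of 1.
logcount_mean_fxn <- function(x) log(rowMeans(expm1(x)) + 1, base = 2)

# Assumed mean function for counts: average, then log2 with a pseudocount of 1.
count_mean_fxn <- function(x) log(rowMeans(x) + 1, base = 2)

# Reference: log-count function on log1p(counts), as for tests on the data slot.
fc_data <- logcount_mean_fxn(log1p(counts_1)) - logcount_mean_fxn(log1p(counts_2))

# Count function on counts: matches the reference.
fc_counts_ok <- count_mean_fxn(counts_1) - count_mean_fxn(counts_2)

# Log-count function on counts: expm1() exponentiates the raw counts,
# which inflates the fold change.
fc_counts_bad <- logcount_mean_fxn(counts_1) - logcount_mean_fxn(counts_2)

print(c(data = unname(fc_data),
        counts_ok = unname(fc_counts_ok),
        counts_bad = unname(fc_counts_bad)))
```

Here the group means are 11 and 1, so the first two values both equal log2(12) - log2(2) = log2(6), while the third is several times larger because the counts themselves are exponentiated before averaging.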
Thus, I suggest this correction in R/differential_expression.R:
Replace the following lines in FindMarkers.SCTAssay (lines 755 to 760):

by:

The FoldChange.Assay function may also need to be corrected, since it may generate incorrect average log2FC values with SCTransform log-counts:

If slot is set to "data" and norm.method is not "LogNormalize", then default.mean.fxn is used, which is the correct function to use for count values, but not for log-count values.

Here is the output of sessionInfo():
I hope my post helps!
Christophe