Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to set n_genes in SCTransform #7998

Closed
SihanLiUVa opened this issue Nov 9, 2023 · 19 comments
Closed

How to set n_genes in SCTransform #7998

SihanLiUVa opened this issue Nov 9, 2023 · 19 comments
Assignees

Comments

@SihanLiUVa
Copy link

Hi! Sorry for double posting the issue in sctransform and here. According to the issue#124 in satijalab/sctransform (satijalab/sctransform#124), I am trying to increase the number of genes in the function SCTransform. Here are my codes:

group1 <- SCTransform(group1_data,vars.to.regress = "nCount_RNA", ncells = 5000, vst.flavor = "v2", return.only.var.genes = FALSE, n_genes = 5000)
I noticed that in the argument of this function, the n_genes is supposed to pass to sctransform::vst, where the n_genes is used. However, i still got this message from my console.
Get Negative Binomial regression parameters per gene
Using 2000 genes

It seems still using default 2000 genes as input. So I wonder how to pass n_genes to vst according to the scTransform document. Thank you so much in advance!

@samuel-marsh
Copy link
Collaborator

Hi,

Not member of dev team but hopefully can be helpful. Can you provide the output of sessionInfo()?

Using Seurat 5.0.0, SeuratObject5.0.0, and sctransform 0.4.1 I cannot replicate your issue.

> test <- SCTransform(pbmc)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells

> test <- SCTransform(pbmc, n_genes = 3000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 3000 genes, 2700 cells

> test <- SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells

Best,
Sam

@SihanLiUVa
Copy link
Author

Hey Sam:
Thank you so much for your input! I checked the version of packages and I have Seurat 5.0.0 and scTransform 0.4.1. I will try to generate a output ASAP and see how it works.
Thank you again!

@SihanLiUVa
Copy link
Author

Hi! I tried the method you mentioned above but for some reason I still cannot change the n_genes. I am also using the same version of packages that you used. Here are my code and output

`> SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells
Found 71 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 12572 genes
Computing corrected count matrix for 12572 genes
Calculating gene attributes
Wall clock passed: Time difference of 10.35649 secs
Determine variable features
Centering data matrix
|===================================================================================================================================================================| 100%
Set default assay to SCT
An object of class Seurat
26286 features across 2700 samples within 2 assays
Active assay: SCT (12572 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: RNA
There were 24 warnings (use warnings() to see them)`

Would you please give me some suggestions about what could be wrong?

@samuel-marsh
Copy link
Collaborator

Can you post the output of sessionInfo()?

@SihanLiUVa
Copy link
Author

Thank you for your quick response! Here is the output:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] patchwork_1.1.3 Matrix_1.6-1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2 readr_2.1.4
[9] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0 Seurat_5.0.0 SeuratObject_5.0.0 sp_2.1-1

loaded via a namespace (and not attached):
[1] Rtsne_0.16 colorspace_2.1-0 deldir_1.0-9 ellipsis_0.3.2 ggridges_0.5.4 XVector_0.34.0
[7] GenomicRanges_1.46.1 RcppHNSW_0.5.0 rstudioapi_0.15.0 spatstat.data_3.0-3 leiden_0.4.3 listenv_0.9.0
[13] ggrepel_0.9.4 RSpectra_0.16-1 fansi_1.0.5 sparseMatrixStats_1.6.0 codetools_0.2-19 splines_4.1.2
[19] polyclip_1.10-6 spam_2.10-0 jsonlite_1.8.7 ica_1.0-3 cluster_2.1.4 png_0.1-8
[25] uwot_0.1.16 shiny_1.7.5.1 sctransform_0.4.1 spatstat.sparse_3.0-3 compiler_4.1.2 httr_1.4.7
[31] fastmap_1.1.1 lazyeval_0.2.2 cli_3.6.1 later_1.3.1 htmltools_0.5.7 tools_4.1.2
[37] igraph_1.5.1 dotCall64_1.1-0 GenomeInfoDbData_1.2.7 gtable_0.3.4 glue_1.6.2 RANN_2.6.1
[43] reshape2_1.4.4 Rcpp_1.0.11 Biobase_2.54.0 scattermore_1.2 vctrs_0.6.4 spatstat.explore_3.2-3
[49] nlme_3.1-163 progressr_0.14.0 DelayedMatrixStats_1.16.0 lmtest_0.9-40 spatstat.random_3.1-6 globals_0.16.2
[55] timechange_0.2.0 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3 irlba_2.3.5.1 goftest_1.2-3
[61] future_1.33.0 zlibbioc_1.40.0 MASS_7.3-55 zoo_1.8-12 scales_1.2.1 MatrixGenerics_1.6.0
[67] hms_1.1.3 promises_1.2.1 spatstat.utils_3.0-4 SummarizedExperiment_1.24.0 parallel_4.1.2 RColorBrewer_1.1-3
[73] reticulate_1.34.0 pbapply_1.7-2 gridExtra_2.3 stringi_1.7.12 S4Vectors_0.32.4 fastDummies_1.7.3
[79] BiocGenerics_0.40.0 GenomeInfoDb_1.30.1 bitops_1.0-7 rlang_1.1.2 pkgconfig_2.0.3 matrixStats_1.0.0
[85] glmGamPoi_1.6.0 lattice_0.22-5 ROCR_1.0-11 tensor_1.5 htmlwidgets_1.6.2 cowplot_1.1.1
[91] tidyselect_1.2.0 parallelly_1.36.0 RcppAnnoy_0.0.21 plyr_1.8.9 magrittr_2.0.3 R6_2.5.1
[97] IRanges_2.28.0 generics_0.1.3 DelayedArray_0.20.0 pillar_1.9.0 withr_2.5.2 fitdistrplus_1.1-11
[103] RCurl_1.98-1.13 survival_3.5-7 abind_1.4-5 future.apply_1.11.0 crayon_1.5.2 KernSmooth_2.23-20
[109] utf8_1.2.4 spatstat.geom_3.2-7 plotly_4.10.3 tzdb_0.4.0 grid_4.1.2 data.table_1.14.8
[115] digest_0.6.33 xtable_1.8-4 httpuv_1.6.12 stats4_4.1.2 munsell_0.5.0 viridisLite_0.4.2

@samuel-marsh
Copy link
Collaborator

hmm ok.

What happens if you run this using reproducible object. If you haven't already install seurat-data package (https://github.com/satijalab/seurat-data) and the pbmc3k dataset. Then in fresh R session, run the following code and post output.

library(Seurat)

pbmc <- pbmc3k.SeuratData::pbmc3k
pbmc <- UpdateSeuratObject(pbmc)

test <- SCTransform(object = pbmc, n_genes = 1000)

Best,
Sam

@SihanLiUVa
Copy link
Author

Thanks! I ran the SCTransform on pbmc object and it worked!

>pbmc <- UpdateSeuratObject(pbmc)
Validating object structure
Updating object slots
Ensuring keys are in the proper structure
Warning: Assay RNA changing from Assay to Assay
Ensuring keys are in the proper structure
Ensuring feature names don't have underscores or pipes
Updating slots in RNA
Validating object structure for Assay ‘RNA’
Object representation is consistent with the most current Seurat version
> SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells
Found 34 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 12572 genes
Computing corrected count matrix for 12572 genes
Calculating gene attributes
Wall clock passed: Time difference of 7.973257 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Place corrected count matrix in counts slot
Set default assay to SCT
An object of class Seurat 
26286 features across 2700 samples within 2 assays 
Active assay: SCT (12572 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA

But it is interesting that it still not working on my own dataset.

> group1.data<- UpdateSeuratObject(group1.data)
Validating object structure
Updating object slots
Ensuring keys are in the proper structure
Ensuring keys are in the proper structure
Ensuring feature names don't have underscores or pipes
Updating slots in RNA
Validating object structure for Assay5 ‘RNA’
Object representation is consistent with the most current Seurat version
> SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
Found 102 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 21110 genes
Computing corrected count matrix for 21110 genes
Calculating gene attributes
Wall clock passed: Time difference of 28.04482 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Centering data matrix
  |===================================================================================================================================================================| 100%
Finished calculating residuals for counts
Set default assay to SCT
An object of class Seurat 
43492 features across 5615 samples within 2 assays 
Active assay: SCT (21110 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA
There were 24 warnings (use warnings() to see them)

So I wonder if there are something wrong with my dataset?

@samuel-marsh
Copy link
Collaborator

What happens if you recreate your object from scratch and then run SCTransform?

@SihanLiUVa
Copy link
Author

Thank you for your help! It still doesn't work if I recreate the object. ;(

@samuel-marsh
Copy link
Collaborator

Hmmmm not sure then. Hopefully someone from Seurat team might be able to figure this one out. I’ll keep thinking but not sure why it’s not working.

@SihanLiUVa
Copy link
Author

Again! Thank you so much for your help! :D

@SihanLiUVa
Copy link
Author

Hi Sam:
I figure out the problem. It seems to be the n_genes can be set to an object with Class=Assay rather than Assay. So after I used this line:

options(Seurat.object.assay.version = "v3")
group1.data <- CreateSeuratObject(Read10X(data.dir = "../Raw_data/1_3_filtered_feature_bc_matrix"))

Then the SCTransform gives me correct result:

> SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 5000 cells
Found 45 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 21110 genes
Computing corrected count matrix for 21110 genes
Calculating gene attributes
Wall clock passed: Time difference of 23.46258 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Place corrected count matrix in counts slot
Set default assay to SCT
An object of class Seurat 
53395 features across 5615 samples within 2 assays 
Active assay: SCT (21110 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA
There were 12 warnings (use warnings() to see them)

@SihanLiUVa
Copy link
Author

Thank you so much, Sam!

@samuel-marsh
Copy link
Collaborator

Actually wait sorry my brain was playing tricks. It looks like from your output it worked in latest version:

SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 5000 cells
…

@SihanLiUVa
Copy link
Author

Yeah you are right. When I create a object without 'options(Seurat.object.assay.version = "v3")', then the class will be 'Assay5' and the n_genes cannot be changed in SCTransform.
But if I create the object with v3, then the class is changed to 'Assay' and now it is compatible with n_genes.

@samuel-marsh
Copy link
Collaborator

Ok well we found issue now :). @saketkc hope you don't mind me tagging you here since I've seen you handle most sctransform related issues.

There is seems to be failure to pass n_genes when using V5 structure.
Reproducible example here:

library(Seurat)

> options(Seurat.object.assay.version = "v3")
> pbmc <- pbmc3k.SeuratData::pbmc3k
> pbmc <- CreateSeuratObject(counts = pbmc@assays$RNA$counts)
> test <- SCTransform(object = pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells


> options(Seurat.object.assay.version = "v5")
> pbmc <- pbmc3k.SeuratData::pbmc3k
> pbmc <- CreateSeuratObject(counts = pbmc@assays$RNA$counts)
> test <- SCTransform(object = pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells

Best,
Sam

@saketkc
Copy link
Collaborator

saketkc commented Nov 10, 2023

Thanks for flagging it both! This is because the extra arguments currently not being passed internally (...) in StdAssay. I will fix it in v5.0.1.

@samuel-marsh
Copy link
Collaborator

Awesome. Thanks as always!!!

@saketkc saketkc self-assigned this Nov 10, 2023
@saketkc
Copy link
Collaborator

saketkc commented Nov 10, 2023

Hi @SihanLiUVa and @samuel-marsh , thanks once again for noticing this. It should be fixed by e2a956a (currently in the develop branch).

Please let me know if you notice any other issues. Thanks!

@saketkc saketkc closed this as completed Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants