Fixes in response to CRAN 2.0.0 feedback

welch-lab · Mar 24, 2024 · 3bf4295
1 parent acaa4ce
Showing 24 changed files with 340 additions and 203 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: rliger
-Version: 2.0.0
-Date: 2024-03-20
+Version: 2.0.1
+Date: 2024-03-24
 Type: Package
 Title: Linked Inference of Genomic Experimental Relationships
 Description: Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

diff --git a/NEWS.md b/NEWS.md
@@ -4,13 +4,19 @@
   - Currently we allow analysis with 10X cellranger output H5 file and H5AD file from anndata>=0.8.0
   - Writing to H5AD file should follow the anndata specification; otherwise the file cannot be read back into a Python session.
   - Writing to 10X H5 file should be carefully investigated.
+  - Consider using object backend to store information instead of active H5 binding, which cannot be serialized to RDS.
+  - Investigate whether to use existing backend implementation like HDF5Array, DelayedArray.
 - Ability to reorganize datasets
   - Allow doing something like `reorganize(ligerObj, variable = "somethingNotDataset")` and resulting in a new liger object with different ligerDataset grouping.
 - Ability to do downstream analysis on H5 data
   - Pseudo-bulk should be easy because we are just aggregating cells.
   - Wilcoxon might be a bit harder because ranks are calculated per gene but the H5 sparse data is column-major. Might need to find a fast on-disk transposition method.
-- Fix runUINMF aborting criteria
-  - UINMF is capable of running with k > number of shared genes. Don't have to abort on it.
+
+## rliger 2.0.1
+
+- Fixed wrong UINMF aborting criteria
+- Fixed example/test skipping criteria for nonexisting dependencies
+- Fixed file access issue when checking package on CRAN
 
 ## rliger 2.0.0
 
@@ -23,6 +29,7 @@
 - Added native Seurat object support for the core integration workflow
 - Added a documentation website built with pkgdown
 - Added new iNMF variant method, consensus iNMF (c-iNMF), in `runCINMF()`. Not stable.
+- Added GO enrichment downstream analysis in `runGOEnrich()`
 - Changed `liger` object class structure
 - Moved iNMF (previously `optimizeALS()`), UINMF (previously `optimizeALS(unshared = TRUE)`) and online iNMF (previously `online_iNMF()`) implementation to new package *RcppPlanc* with vastly improved performance. Now wrapped in `runINMF()`, `runUINMF()` and `runOnlineINMF()` respectively, and all can be invoked with `runIntegration()`.
 - Updated H5AD support to match up with Python anndata package 0.8.0 specs

diff --git a/R/ATAC.R b/R/ATAC.R
@@ -169,10 +169,13 @@ imputeKNN <- function(
 #' @export
 #' @examples
 #' \donttest{
-#' bmmc <- normalize(bmmc)
-#' bmmc <- selectGenes(bmmc)
-#' bmmc <- scaleNotCenter(bmmc)
-#' if (requireNamespace("RcppPlanc", quietly = TRUE)) {
+#' if (requireNamespace("RcppPlanc", quietly = TRUE) &&
+#'     requireNamespace("GenomicRanges", quietly = TRUE) &&
+#'     requireNamespace("IRanges", quietly = TRUE) &&
+#'     requireNamespace("psych", quietly = TRUE)) {
+#'     bmmc <- normalize(bmmc)
+#'     bmmc <- selectGenes(bmmc)
+#'     bmmc <- scaleNotCenter(bmmc)
 #'     bmmc <- runINMF(bmmc, miniBatchSize = 100)
 #'     bmmc <- quantileNorm(bmmc)
 #'     bmmc <- normalizePeak(bmmc)
@@ -362,7 +365,10 @@ linkGenesAndPeaks <- function(
 #' bmmc <- normalize(bmmc)
 #' bmmc <- selectGenes(bmmc)
 #' bmmc <- scaleNotCenter(bmmc)
-#' if (requireNamespace("RcppPlanc", quietly = TRUE)) {
+#' if (requireNamespace("RcppPlanc", quietly = TRUE) &&
+#'     requireNamespace("GenomicRanges", quietly = TRUE) &&
+#'     requireNamespace("IRanges", quietly = TRUE) &&
+#'     requireNamespace("psych", quietly = TRUE)) {
 #'     bmmc <- runINMF(bmmc)
 #'     bmmc <- quantileNorm(bmmc)
 #'     bmmc <- normalizePeak(bmmc)
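
The guards added to these examples chain several `requireNamespace()` calls with `&&`. A minimal sketch of the same check as a reusable helper is shown below; `hasAllPackages` and `optionalDeps` are hypothetical names introduced only for illustration and are not part of rliger.

```r
# Hypothetical helper (not part of rliger): TRUE only if every package
# named in `pkgs` is installed and its namespace can be loaded.
hasAllPackages <- function(pkgs) {
    all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
}

# The optional dependencies guarded in the examples of R/ATAC.R
optionalDeps <- c("RcppPlanc", "GenomicRanges", "IRanges", "psych")

if (hasAllPackages(optionalDeps)) {
    message("All optional dependencies found; the example would run here.")
} else {
    message("Skipping example: some optional dependencies are missing.")
}
```

Either branch finishes without error, which is the point of the guard: a check machine without the suggested packages skips the example instead of failing.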

diff --git a/R/GSEA.R b/R/GSEA.R
@@ -17,7 +17,12 @@
 #' @export
 #' @examples
 #' \donttest{
-#' runGSEA(pbmcPlot)
+#' if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
+#'     requireNamespace("reactome.db", quietly = TRUE) &&
+#'     requireNamespace("fgsea", quietly = TRUE) &&
+#'     requireNamespace("AnnotationDbi", quietly = TRUE)) {
+#'     runGSEA(pbmcPlot)
+#' }
 #' }
 runGSEA <- function(
         object,
@@ -47,7 +52,13 @@ runGSEA <- function(
         cli::cli_abort(
             "Package {.pkg fgsea} is needed for this function to work.
             Please install it by command:
-            {.code BiocManager::install('fgsea')}") # nocov end
+            {.code BiocManager::install('fgsea')}")
+
+    if (!requireNamespace("AnnotationDbi", quietly = TRUE))
+        cli::cli_abort(
+            "Package {.pkg AnnotationDbi} is needed for this function to work.
+            Please install it by command:
+            {.code BiocManager::install('AnnotationDbi')}")  # nocov end
 
     .deprecateArgs(list(gene_sets = "genesets",
                         mat_w = "useW",
@@ -151,7 +162,9 @@ runGSEA <- function(
 #' # Setting `significant = FALSE` because it's hard for a gene list obtained
 #' # from small test dataset to represent real-life biology.
 #' \donttest{
-#' go <- runGOEnrich(res, group = 0, significant = FALSE)
+#' if (requireNamespace("gprofiler2", quietly = TRUE)) {
+#'     go <- runGOEnrich(res, group = 0, significant = FALSE)
+#' }
 #' }
 runGOEnrich <- function(
         result,

diff --git a/R/cINMF.R b/R/cINMF.R
@@ -1,10 +1,14 @@
 #' Perform consensus iNMF on scaled datasets
 #' @description
+#' \bold{NOT STABLE} - This is an experimental function and is subject to change.
+#'
 #' Performs consensus integrative non-negative matrix factorization (c-iNMF)
-#' to return factorized \eqn{H}, \eqn{W}, and \eqn{V} matrices. We run the
-#' regular iNMF multiple times with different random starts, and then take the
-#' consensus of frequently appearing factors from gene loading matrices, \eqn{W}
-#' and \eqn{V}. The cell factor loading \eqn{H} matrices are eventually solved
+#' to return factorized \eqn{H}, \eqn{W}, and \eqn{V} matrices. In order to
+#' address the non-convex nature of NMF, we built on the cNMF method proposed by
+#' D. Kotliar, 2019. We run the regular iNMF multiple times with different
+#' random starts, and cluster the pool of all the factors in \eqn{W} and
+#' \eqn{V}s and take the consensus of the clusters of the largest population.
+#' The cell factor loading \eqn{H} matrices are eventually solved
 #' with the consensus \eqn{W} and \eqn{V} matrices.
 #'
 #' Please see \code{\link{runINMF}} for detailed introduction to the regular
@@ -257,7 +261,7 @@ runCINMF.Seurat <- function(
         cli::cli_abort(
             "Package {.pkg RcppPlanc} is required for c-iNMF integration.
         Please install it by command:
-        {.code devtools::install_github('welch-lab/RcppPlanc')}") # nocov end
+        {.code install.packages('RcppPlanc', repos = 'https://welch-lab.r-universe.dev')}") # nocov end
     if (nRandomStarts <= 1) {
         cli::cli_abort("{.var nRandomStarts} must be greater than 1 for taking the consensus.")
     }
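
The consensus step described in the updated `runCINMF` documentation can be illustrated with a toy sketch in base R. This is not the rliger or RcppPlanc implementation: the simulated factors, the use of `kmeans()`, and the per-gene median are all assumptions chosen for illustration only.

```r
set.seed(1)
nGenes <- 20
k <- 3
nRuns <- 5

# Hypothetical "true" gene loadings. Each simulated run recovers them
# with small noise and in a random column order, mimicking random starts.
truth <- matrix(runif(nGenes * k), nGenes, k)
pool <- do.call(cbind, lapply(seq_len(nRuns), function(i) {
    noisy <- pmax(truth + matrix(rnorm(nGenes * k, sd = 0.01), nGenes, k), 0)
    noisy[, sample(k), drop = FALSE]
}))

# Scale every pooled factor to unit L2 norm so they are comparable
pool <- sweep(pool, 2, sqrt(colSums(pool^2)), "/")

# Cluster the pooled factors (one factor per row after transposing)
cl <- kmeans(t(pool), centers = k, nstart = 10)

# Order clusters by size and take the per-gene median within each one
sizes <- sort(table(cl$cluster), decreasing = TRUE)
keep <- as.integer(names(sizes))[seq_len(k)]
consensus <- vapply(keep, function(j) {
    apply(pool[, cl$cluster == j, drop = FALSE], 1, median)
}, numeric(nGenes))

dim(consensus)  # genes x k consensus loadings
```

In this toy setting every cluster is kept because the number of clusters equals `k`; how the actual method selects and combines clusters may differ.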

diff --git a/R/import.R b/R/import.R
@@ -1,6 +1,10 @@
 #' Create liger object
 #' @description This function allows creating \linkS4class{liger} object from
 #' multiple datasets of various forms (See \code{rawData}).
+#'
+#' \bold{DO} make a copy of any H5AD file before loading it: rliger functions
+#' write to the file, after which it can no longer be read back into Python.
+#' This will be fixed in the future.
 #' @param rawData Named list of datasets. Required. Elements allowed include a
 #' matrix, a \code{Seurat} object, a \code{SingleCellExperiment} object, an
 #' \code{AnnData} object, a \linkS4class{ligerDataset} object or a filename to
@@ -239,10 +243,14 @@ createLigerDataset <- function(
 #' @description
 #' For convenience, the default \code{formatType = "10x"} directly fits the
 #' structure of cellranger output. \code{formatType = "anndata"} works for
-#' current AnnData H5AD file specification (see Details). If there a customized
-#' H5 file  structure is presented, any of the \code{rawData},
+#' current AnnData H5AD file specification (see Details). If a customized H5
+#' file structure is presented, any of the \code{rawData},
 #' \code{indicesName}, \code{indptrName}, \code{genesName}, \code{barcodesName}
 #' should be specified accordingly to override the \code{formatType} preset.
+#'
+#' \bold{DO} make a copy of any H5AD file before loading it: rliger functions
+#' write to the file, after which it can no longer be read back into Python.
+#' This will be fixed in the future.
 #' @details
 #' For H5AD file written from an AnnData object, we allow using
 #' \code{formatType = "anndata"} for the function to infer the proper structure.
@@ -282,7 +290,9 @@ createLigerDataset <- function(
 #' @return H5-based \linkS4class{ligerDataset} object
 #' @examples
 #' h5Path <- system.file("extdata/ctrl.h5", package = "rliger")
-#' ld <- createH5LigerDataset(h5Path)
+#' tempPath <- tempfile(fileext = ".h5")
+#' file.copy(from = h5Path, to = tempPath)
+#' ld <- createH5LigerDataset(tempPath)
 createH5LigerDataset <- function(
         h5file,
         formatType = "10X",
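
The updated example copies the bundled H5 file to a temporary path before opening it, so that writes never touch the installed package directory. The same pattern can be sketched with a plain text file in base R; the file contents and the `srcPath`/`workPath` names below are placeholders, not rliger data.

```r
# Stand-in for a bundled, read-only data file (hypothetical content)
srcPath <- tempfile(fileext = ".txt")
writeLines("original", srcPath)

# Work on a copy inside the session's temporary directory
workPath <- tempfile(fileext = ".txt")
copied <- file.copy(from = srcPath, to = workPath)
stopifnot(copied)

# Any write goes to the copy; the source stays untouched
cat("modified\n", file = workPath, append = TRUE)
readLines(srcPath)
```

Because `tempfile()` paths live under `tempdir()`, they are cleaned up when the R session ends.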

diff --git a/R/integration.R b/R/integration.R
@@ -384,7 +384,7 @@ runINMF.Seurat <- function(
         cli::cli_abort(
         "Package {.pkg RcppPlanc} is required for iNMF integration.
         Please install it by command:
-        {.code devtools::install_github('welch-lab/RcppPlanc')}") # nocov end
+        {.code install.packages('RcppPlanc', repos = 'https://welch-lab.r-universe.dev')}") # nocov end
 
     barcodeList <- lapply(object, colnames)
     allFeatures <- lapply(object, rownames)
@@ -822,7 +822,7 @@ runOnlineINMF.liger <- function(
         cli::cli_abort(
             "Package {.pkg RcppPlanc} is required for online iNMF integration.
         Please install it by command:
-        {.code devtools::install_github('welch-lab/RcppPlanc')}") # nocov end
+        {.code install.packages('RcppPlanc', repos = 'https://welch-lab.r-universe.dev')}") # nocov end
     nDatasets <- length(object) + length(newDatasets)
     barcodeList <- c(lapply(object, colnames), lapply(newDatasets, colnames))
     names(barcodeList) <- c(names(object), names(newDatasets))
@@ -1207,17 +1207,14 @@ runUINMF.liger <- function(
         cli::cli_abort(
         "Package {.pkg RcppPlanc} is required for mosaic iNMF integration with unshared features.
         Please install it by command:
-        {.code devtools::install_github('welch-lab/RcppPlanc')}")# nocov end
+        {.code install.packages('RcppPlanc', repos = 'https://welch-lab.r-universe.dev')}")# nocov end
     barcodeList <- lapply(object, colnames)
     allFeatures <- lapply(object, rownames)
     features <- Reduce(.same, allFeatures)
 
     if (min(lengths(barcodeList)) < k) {
         cli::cli_abort("Number of factors (k={k}) should be less than the number of cells in the smallest dataset ({min(lengths(barcodeList))}).")
     }
-    if (length(features) < k) {
-        cli::cli_abort("Number of factors (k={k}) should be less than the number of shared features ({length(features)}).")
-    }
 
     bestObj <- Inf
     bestRes <- NULL
@@ -1685,12 +1682,6 @@ calcAgreement <- function(
               "i" = "e.g. {.code memCopy <- subsetLiger(object, useSlot = 'scaleData', newH5 = FALSE)}")
         )
     }
-    if (!requireNamespace("RcppPlanc", quietly = TRUE))
-        cli::cli_abort(
-            "Package {.pkg RcppPlanc} is needed for this function to work.
-            Please install it by command:
-            {.code devtools::install_github('RcppPlanc')}")
-
 
     scaled <- getMatrix(object, "scaleData", returnList = TRUE)
     scaleDataIsNull <- sapply(scaled, is.null)
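
After this change, `runUINMF.liger` only compares `k` against the number of cells in the smallest dataset; the check against the number of shared features is gone. A standalone sketch of the remaining check is given below. `checkFactorCount` is a hypothetical name and uses base `stop()` rather than `cli::cli_abort()` so that it runs without extra packages.

```r
# Hypothetical standalone version of the remaining check (not rliger API).
# `cellsPerDataset` holds the number of cells in each dataset.
checkFactorCount <- function(k, cellsPerDataset) {
    minCells <- min(cellsPerDataset)
    if (minCells < k) {
        stop(sprintf(
            "Number of factors (k=%d) should be less than the number of cells in the smallest dataset (%d).",
            as.integer(k), as.integer(minCells)
        ))
    }
    invisible(TRUE)
}

# Passes: the smallest dataset has 250 cells, more than k = 20
checkFactorCount(k = 20, cellsPerDataset = c(300, 250))
```

The number of shared features is deliberately not an argument here, mirroring the removed condition.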

diff --git a/R/visualization.R b/R/visualization.R
@@ -1068,9 +1068,11 @@ plotGeneLoadingRank <- function(
 #'     cellMeta(pbmcPlot, "leiden_cluster", "ctrl")
 #' cellMeta(pbmcPlot, "stim_cluster", "stim") <-
 #'     cellMeta(pbmcPlot, "leiden_cluster", "stim")
-#' plotSankey(pbmcPlot, "ctrl_cluster", "stim_cluster",
-#'            titles = c("control", "LIGER", "stim"),
-#'            prefixes = c("c", NA, "s"))
+#' if (requireNamespace("sankey", quietly = TRUE)) {
+#'     plotSankey(pbmcPlot, "ctrl_cluster", "stim_cluster",
+#'                titles = c("control", "LIGER", "stim"),
+#'                prefixes = c("c", NA, "s"))
+#' }
 plotSankey <- function(
         object,
         cluster1,

diff --git a/README.md b/README.md
@@ -33,6 +33,26 @@ analysis, and visualization. Users can:
 We have also designed LIGER to interface with existing single-cell analysis packages, including
 [Seurat](https://satijalab.org/seurat/).
 
+## Citation
+
+If you use LIGER in your research, please cite the corresponding paper:
+
+* Generally the *Cell* paper should be cited:
+
+>Joshua D. Welch et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, VOLUME 177, ISSUE 7, P1873-1887.E17 (2019), [https://doi.org/10.1016/j.cell.2019.05.006](https://doi.org/10.1016/j.cell.2019.05.006)
+
+* For the *rliger* package:
+
+>Liu, J., Gao, C., Sodicoff, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 15, 3632–3662 (2020), [https://doi.org/10.1038/s41596-020-0391-8](https://doi.org/10.1038/s41596-020-0391-8)
+
+* For online iNMF integration method:
+
+>Gao, C., Liu, J., Kriebel, A.R. et al. Iterative single-cell multi-omic integration using online learning. Nat Biotechnol 39, 1000–1007 (2021), [https://doi.org/10.1038/s41587-021-00867-x](https://doi.org/10.1038/s41587-021-00867-x)
+
+* For UINMF integration method:
+
+>Kriebel, A.R., Welch, J.D. UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization. Nat Commun 13, 780 (2022), [https://doi.org/10.1038/s41467-022-28431-4](https://doi.org/10.1038/s41467-022-28431-4)
+
 ## Feedback
 
 If you have any questions, comments, or suggestions, you are welcome to [open an Issue](https://github.com/welch-lab/liger/issues)!

diff --git a/man/createH5LigerDataset.Rd b/man/createH5LigerDataset.Rd
diff --git a/man/createLiger.Rd b/man/createLiger.Rd
diff --git a/man/exportInteractTrack.Rd b/man/exportInteractTrack.Rd
diff --git a/man/linkGenesAndPeaks.Rd b/man/linkGenesAndPeaks.Rd
diff --git a/man/plotSankey.Rd b/man/plotSankey.Rd
diff --git a/man/runCINMF.Rd b/man/runCINMF.Rd
diff --git a/man/runGOEnrich.Rd b/man/runGOEnrich.Rd
diff --git a/man/runGSEA.Rd b/man/runGSEA.Rd