Commit

fix issue when merging named lists of processed spectra
Names are now dropped and not propagated
cpauvert committed May 6, 2024
1 parent 54cafe4 commit 62ecd77
Showing 5 changed files with 79 additions and 3 deletions.
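
The commit description says that names are dropped so they do not propagate. As a minimal sketch of how outer list names can leak into element names in base R, the snippet below uses `unlist()`. Whether `merge_processed_spectra` relies on `unlist()` internally is an assumption that this diff does not confirm, and `run`, `spec_1`, `spec_2` and `named_runs` are hypothetical names invented for the illustration.

```r
# Hypothetical stand-in for one processing run; the element names are invented.
run <- list(spec_1 = 1, spec_2 = 2)

# A named top-level list, similar in shape to list("A" = processed, "B" = processed).
named_runs <- list(A = run, B = run)

# unlist() prefixes each inner name with the outer list name.
with_names <- names(unlist(named_runs))
print(with_names)
# "A.spec_1" "A.spec_2" "B.spec_1" "B.spec_2"

# Dropping the top-level names first keeps the inner names untouched.
without_names <- names(unlist(unname(named_runs)))
print(without_names)
# "spec_1" "spec_2" "spec_1" "spec_2"
```

Prefixed identifiers such as `A.spec_1` no longer match the identifiers stored in a metadata table, which is consistent with the merge problem the commit describes.
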
21 changes: 20 additions & 1 deletion R/merge_processed_spectra.R
@@ -5,13 +5,18 @@
#' Aggregate multiple processed spectra, their associated peaks and metadata into a feature matrix and a concatenated metadata table.
#'
#' @param processed_spectra A [list] of the processed spectra and associated peaks and metadata in two possible formats:
#' * A list of **in-memory objects** (named `spectra`, `peaks`, `metadata`) produced by [process_spectra].
#' * A list of **in-memory objects** (named `spectra`, `peaks`, `metadata`) produced by [process_spectra]. Named lists will have names dropped, see Note.
#' * `r lifecycle::badge('deprecated')` A list of **paths** to RDS files produced by [process_spectra] when using the `rds_prefix` option.
#' @param remove_peakless_spectra A logical indicating whether to discard the spectra without detected peaks.
#' @param interpolate_missing A logical indicating if intensity values for missing peaks should be interpolated from the processed spectra signal or left NA which would then be converted to 0.
#'
#' @return A *n*×*p* matrix, with *n* spectra as rows and *p* features as columns that are the peaks found in all the processed spectra.
#'
#' @note When aggregating multiple runs of processed spectra, the names of a
#' named list are dropped. Otherwise, these names would be appended to the
#' rownames of the matrix, which prevents the downstream merge with the
#' metadata.
#'
#' @seealso [process_spectra], the "Value" section in [`MALDIquant::intensityMatrix`](https://rdrr.io/cran/MALDIquant/man/intensityMatrix-functions.html)
#' @export
#' @examples
@@ -43,6 +48,14 @@
#' # The feature matrix has 3×6=18 spectra as rows and
#' # 35 peaks as columns
#' dim(fm_all)
#'
#' # If using a named list, the names are dropped and not propagated to the matrix.
#' \dontrun{
#' fm_all <- merge_processed_spectra(
#' list("A" = processed, "B" = processed, "C" = processed))
#' any(grepl("A|B|C", rownames(fm_all))) # FALSE
#' }
#'
merge_processed_spectra <- function(processed_spectra, remove_peakless_spectra = TRUE, interpolate_missing = TRUE) {
if (any(
is.null(processed_spectra),
@@ -68,6 +81,12 @@ merge_processed_spectra <- function(processed_spectra, remove_peakless_spectra =
processed <- processed_spectra
}

# Top-level names cause problems when aggregating multiple runs: they are
# appended to the rownames of the matrix, which prevents the downstream
# metadata merge.
if(!is.null(names(processed))){
processed <- unname(processed)
}
stopifnot(is_a_processed_spectra_list(processed))

peakless <- list()
29 changes: 28 additions & 1 deletion dev/dereplicate-spectra.Rmd
@@ -217,13 +217,18 @@ The current function enables the analyst to decide whether to interpolate the va
#' Aggregate multiple processed spectra, their associated peaks and metadata into a feature matrix and a concatenated metadata table.
#'
#' @param processed_spectra A [list] of the processed spectra and associated peaks and metadata in two possible formats:
#' * A list of **in-memory objects** (named `spectra`, `peaks`, `metadata`) produced by [process_spectra].
#' * A list of **in-memory objects** (named `spectra`, `peaks`, `metadata`) produced by [process_spectra]. Named lists will have names dropped, see Note.
#' * `r lifecycle::badge('deprecated')` A list of **paths** to RDS files produced by [process_spectra] when using the `rds_prefix` option.
#' @param remove_peakless_spectra A logical indicating whether to discard the spectra without detected peaks.
#' @param interpolate_missing A logical indicating if intensity values for missing peaks should be interpolated from the processed spectra signal or left NA which would then be converted to 0.
#'
#' @return A *n*×*p* matrix, with *n* spectra as rows and *p* features as columns that are the peaks found in all the processed spectra.
#'
#' @note When aggregating multiple runs of processed spectra, the names of a
#' named list are dropped. Otherwise, these names would be appended to the
#' rownames of the matrix, which prevents the downstream merge with the
#' metadata.
#'
#' @seealso [process_spectra], the "Value" section in [`MALDIquant::intensityMatrix`](https://rdrr.io/cran/MALDIquant/man/intensityMatrix-functions.html)
#' @export
merge_processed_spectra <- function(processed_spectra, remove_peakless_spectra = TRUE, interpolate_missing = TRUE) {
@@ -251,6 +256,12 @@ merge_processed_spectra <- function(processed_spectra, remove_peakless_spectra =
processed <- processed_spectra
}
# Top-level names cause problems when aggregating multiple runs: they are
# appended to the rownames of the matrix, which prevents the downstream
# metadata merge.
if(!is.null(names(processed))){
processed <- unname(processed)
}
stopifnot(is_a_processed_spectra_list(processed))
peakless <- list()
@@ -335,6 +346,14 @@ fm_all <- merge_processed_spectra(list(processed, processed, processed))
# The feature matrix has 3×6=18 spectra as rows and
# 35 peaks as columns
dim(fm_all)
# If using a named list, the names are dropped and not propagated to the matrix.
#' \dontrun{
#' fm_all <- merge_processed_spectra(
#' list("A" = processed, "B" = processed, "C" = processed))
#' any(grepl("A|B|C", rownames(fm_all))) # FALSE
#' }
#'
```

```{r tests-merge_processed_spectra}
@@ -351,6 +370,14 @@ test_that("merge_processed_spectra works", {
expect_identical(
sum(fm == 0), 0L
)
expect_no_error(
fm_multiple <- merge_processed_spectra(
list("with_name_bar" = processed_test, "with_name_foo" = processed_test)
)
)
expect_equal(
dim(fm_multiple), c(4, 26)
)
})
test_that("merge_processed_spectra works without interpolation", {
expect_no_error(
16 changes: 15 additions & 1 deletion man/merge_processed_spectra.Rd

8 changes: 8 additions & 0 deletions tests/testthat/test-merge_processed_spectra.R
@@ -13,6 +13,14 @@ test_that("merge_processed_spectra works", {
expect_identical(
sum(fm == 0), 0L
)
expect_no_error(
fm_multiple <- merge_processed_spectra(
list("with_name_bar" = processed_test, "with_name_foo" = processed_test)
)
)
expect_equal(
dim(fm_multiple), c(4, 26)
)
})
test_that("merge_processed_spectra works without interpolation", {
expect_no_error(
8 changes: 8 additions & 0 deletions vignettes/dereplicate-bruker-maldi-biotyper-spectra.Rmd
@@ -109,6 +109,14 @@ fm_all <- merge_processed_spectra(list(processed, processed, processed))
# The feature matrix has 3×6=18 spectra as rows and
# 35 peaks as columns
dim(fm_all)
# If using a named list, the names are dropped and not propagated to the matrix.
#' \dontrun{
#' fm_all <- merge_processed_spectra(
#' list("A" = processed, "B" = processed, "C" = processed))
#' any(grepl("A|B|C", rownames(fm_all))) # FALSE
#' }
#'
```

