Error with LoadXenium #8265

Open
SarahE97 opened this issue Jan 4, 2024 · 21 comments

Comments

@SarahE97

SarahE97 commented Jan 4, 2024

I ran 2 different xenium runs and when trying to use the LoadXenium function to create a seurat object, one of them works great, the other responds with this error:

xenium.obj <- LoadXenium(path, fov = "fov")
10X data contains more than one type and is being returned as a list containing matrices of each type.
|--------------------------------------------------|
|==================================================|
Error in if (ncol(x = coords) >= 3) { : argument is of length zero

Any ideas where I can start troubleshooting it? Is it possibly a data issue?
Thanks

@zskylarli
Contributor

Hi! I'm unable to reproduce your error without the object - could you copy/paste your traceback() here? Additionally, have you checked that the required files are nested in the folder that your path points to, especially cells.csv.gz and cell_boundaries.csv.gz?

@SarahE97
Author

SarahE97 commented Jan 8, 2024

When I run traceback this is the result:

> xenium.obj <- LoadXenium(path, fov = "fov")

10X data contains more than one type and is being returned as a list containing matrices of each type.
==================================================
Error in if (ncol(x = coords) >= 3) { : argument is of length zero

traceback()
3: CreateCentroids.default(data$centroids)
2: CreateCentroids(data$centroids)
1: LoadXenium(path, fov = "fov")

Here's the list of present files:

list.files(path=path, pattern=NULL, all.files=FALSE, full.names=FALSE)
[1] "analysis" "analysis.zarr.zip"
[3] "analysis_summary.html" "aux_outputs"
[5] "cell_boundaries.csv.gz" "cell_boundaries.parquet"
[7] "cell_feature_matrix" "cell_feature_matrix.h5"
[9] "cell_feature_matrix.zarr.zip" "cells.csv.gz"
[11] "cells.parquet" "cells.zarr.zip"
[13] "experiment.xenium" "gene_panel.json"
[15] "Men_MSC1_Meninges_cells_stats.csv" "Men_MSC1_Meninges_coordinates.csv"
[17] "metrics_summary.csv" "morphology.ome.tif"
[19] "morphology_focus.ome.tif" "morphology_mip.ome.tif"
[21] "nucleus_boundaries.csv.gz" "nucleus_boundaries.parquet"
[23] "transcripts.csv.gz" "transcripts.parquet"
[25] "transcripts.zarr.zip"

@zskylarli
Contributor

Most likely your cell centroids file is not being converted to a data frame correctly when it is read in. One way to check is to call debug() on the ReadXenium function, step through the part that extracts cell_info from your cells.csv.gz file and creates cell_centroid_df, and then check whether data$centroids exists. If you try this and anything becomes too confusing, please let me know!
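
A minimal sketch of the check described above, assuming a ReadXenium()-style result should be a named list with a data frame under `centroids` that has `x`, `y` and `cell` columns (those column names come from the function body quoted later in this thread). The helper `check_centroids()` is hypothetical and not part of Seurat; the toy lists only stand in for real output.

```r
# Hypothetical helper (not part of Seurat): report whether the `centroids`
# entry of a ReadXenium()-style result looks usable.
check_centroids <- function(data) {
  cents <- data$centroids
  if (is.null(cents)) {
    return("missing: data$centroids is NULL")
  }
  if (!is.data.frame(cents)) {
    return(paste("wrong type:", class(cents)[1]))
  }
  absent <- setdiff(c("x", "y", "cell"), names(cents))
  if (length(absent) > 0) {
    return(paste("missing columns:", paste(absent, collapse = ", ")))
  }
  "ok"
}

# Toy stand-ins for real output
good <- list(centroids = data.frame(x = 1, y = 2, cell = "a"))
bad  <- list(matrix = 1)
check_centroids(good)
check_centroids(bad)
```

With real data the same call would be `check_centroids(ReadXenium(data.dir = path, type = "centroids"))`.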

@SarahE97
Author

SarahE97 commented Jan 8, 2024

When running debug(ReadXenium), it does give me a data file with no errors, but when I try to run data$centroids it doesn't seem to exist.

debug(ReadXenium)
data <- ReadXenium(data.dir = path,type =c('centroids', 'segmentation'))
debugging in: ReadXenium(data.dir = path, type = c("centroids", "segmentation"))
debug: {
    type <- match.arg(arg = type, choices = c("centroids", "segmentations"),
        several.ok = TRUE)
    outs <- match.arg(arg = outs, choices = c("matrix", "microns"),
        several.ok = TRUE)
    outs <- c(outs, type)
    has_dt <- requireNamespace("data.table", quietly = TRUE) &&
        requireNamespace("R.utils", quietly = TRUE)
    data <- sapply(outs, function(otype) {
        switch(EXPR = otype, matrix = {
            pmtx <- progressor()
            pmtx(message = "Reading counts matrix", class = "sticky",
                amount = 0)
            matrix <- suppressWarnings(Read10X(data.dir = file.path(data.dir,
                "cell_feature_matrix/")))
            pmtx(type = "finish")
            matrix
        }, centroids = {
            pcents <- progressor()
            pcents(message = "Loading cell centroids", class = "sticky",
                amount = 0)
            if (has_dt) {
                cell_info <- as.data.frame(data.table::fread(file.path(data.dir,
                    "cells.csv.gz")))
            } else {
                cell_info <- read.csv(file.path(data.dir, "cells.csv.gz"))
            }
            cell_centroid_df <- data.frame(x = cell_info$x_centroid,
                y = cell_info$y_centroid, cell = cell_info$cell_id,
                stringsAsFactors = FALSE)
            pcents(type = "finish")
            cell_centroid_df
        }, segmentations = {
            psegs <- progressor()
            psegs(message = "Loading cell segmentations", class = "sticky",
                amount = 0)
            if (has_dt) {
                cell_boundaries_df <- as.data.frame(data.table::fread(file.path(data.dir,
                    "cell_boundaries.csv.gz")))
            } else {
                cell_boundaries_df <- read.csv(file.path(data.dir,
                    "cell_boundaries.csv.gz"), stringsAsFactors = FALSE)
            }
            names(cell_boundaries_df) <- c("cell", "x", "y")
            psegs(type = "finish")
            cell_boundaries_df
        }, microns = {
            pmicrons <- progressor()
            pmicrons(message = "Loading molecule coordinates",
                class = "sticky", amount = 0)
            if (has_dt) {
                tx_dt <- as.data.frame(data.table::fread(file.path(data.dir,
                    "transcripts.csv.gz")))
                transcripts <- subset(tx_dt, qv >= mols.qv.threshold)
            } else {
                transcripts <- read.csv(file.path(data.dir, "transcripts.csv.gz"))
                transcripts <- subset(transcripts, qv >= mols.qv.threshold)
            }
            df <- data.frame(x = transcripts$x_location, y = transcripts$y_location,
                gene = transcripts$feature_name, stringsAsFactors = FALSE)
            pmicrons(type = "finish")
            df
        }, stop("Unknown Xenium input type: ", otype))
    }, USE.NAMES = TRUE)
    return(data)
}
Browse[2]> c
10X data contains more than one type and is being returned as a list containing matrices of each type.
|--------------------------------------------------|
|==================================================|
exiting from: ReadXenium(data.dir = path, type = c("centroids", "segmentation"))

data$centroids
NULL

When running just the centroids section of code:
Browse[2]> pcents <- progressor()
Browse[2]> pcents(message = "Loading cell centroids", class = "sticky",
+     amount = 0)
Browse[2]> if (has_dt) {
+     cell_info <- as.data.frame(data.table::fread(file.path(data.dir,
+       "cells.csv.gz")))
+   } else {
+     cell_info <- read.csv(file.path(data.dir, "cells.csv.gz"))
+   }
debug at #2: cell_info <- as.data.frame(data.table::fread(file.path(data.dir,
    "cells.csv.gz")))
I get this error - but if I run the code in the if statement individually:
Browse[3]> cell_info <- as.data.frame(data.table::fread(file.path(data.dir,
+       "cells.csv.gz")))
Browse[3]> cell_info <- read.csv(file.path(data.dir, "cells.csv.gz"))
Browse[3]> cell_centroid_df <- data.frame(x = cell_info$x_centroid,
+     y = cell_info$y_centroid, cell = cell_info$cell_id,
+     stringsAsFactors = FALSE)
Browse[3]> pcents(type = "finish")
Browse[3]> cell_centroid_df
And I can see cell_info and cell_centroid_df as objects in the environment.

The same thing happens with segmentation:
Browse[3]> if (has_dt) {
+     cell_boundaries_df <- as.data.frame(data.table::fread(file.path(data.dir,
+       "cell_boundaries.csv.gz")))
+   } else {
+     cell_boundaries_df <- read.csv(file.path(data.dir,
+       "cell_boundaries.csv.gz"), stringsAsFactors = FALSE)
+   }
debug at #2: cell_boundaries_df <- as.data.frame(data.table::fread(file.path(data.dir,
    "cell_boundaries.csv.gz")))

But if I run the same code outside of the block (each line individually), I can load cell_boundaries_df:
Browse[4]> cell_boundaries_df <- as.data.frame(data.table::fread(file.path(data.dir,
+       "cell_boundaries.csv.gz")))
Browse[4]> names(cell_boundaries_df) <- c("cell", "x", "y")
Browse[4]> psegs(type = "finish")
Browse[4]> cell_boundaries_df

Any thoughts?
Thanks so much,
Sarah

@alikhuseynov
Contributor

@SarahE97 which branch of Seurat are you using?
Does it work if you read the centroids and segmentations files manually? Then you can check whether they are NULL or not:

cell_info <- data.table::fread(file.path(data.dir, "cells.csv.gz"), data.table = FALSE)
cell_centroid_df <- data.frame(
  x = cell_info$x_centroid,
  y = cell_info$y_centroid,
  cell = cell_info$cell_id,
  stringsAsFactors = FALSE
)

@veldsla

veldsla commented Jan 9, 2024

You are probably reading recent Xenium data. The LoadXenium function tries to load either the Blank Codeword matrix or, when that is not present, the Unassigned Codeword matrix into the assay BlankCodeword. Neither is present in my fairly recent Xenium run.

When you remove this:

  if("Blank Codeword" %in% names(data$matrix))
    xenium.obj[["BlankCodeword"]] <- CreateAssayObject(counts = data$matrix[["Blank Codeword"]])
  else
    xenium.obj[["BlankCodeword"]] <- CreateAssayObject(counts = data$matrix[["Unassigned Codeword"]])

from the LoadXenium function (use fix(LoadXenium) for a temporary fix), it probably works.
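
As an alternative to deleting the block, a guarded lookup could skip the assay when neither matrix exists. This is only a sketch under stated assumptions: `pick_codeword_matrix()` is a hypothetical helper, not Seurat's code, and the toy list merely imitates a matrix list that has no codeword entry.

```r
# Hypothetical helper (not Seurat's code): return the first codeword matrix
# that is actually present, or NULL when neither name exists.
pick_codeword_matrix <- function(matrices,
                                 candidates = c("Blank Codeword",
                                                "Unassigned Codeword")) {
  present <- intersect(candidates, names(matrices))
  if (length(present) == 0) {
    return(NULL)
  }
  matrices[[present[1]]]
}

# Toy stand-in for data$matrix with no codeword entry
mats <- list(
  "Gene Expression" = matrix(1:4, nrow = 2),
  "Negative Control Probe" = matrix(0, nrow = 1, ncol = 2)
)
cw <- pick_codeword_matrix(mats)
if (is.null(cw)) {
  message("no codeword matrix found; skipping the BlankCodeword assay")
} else {
  message("the BlankCodeword assay would be created from this matrix")
}
```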

@SarahE97
Author

SarahE97 commented Jan 9, 2024

@alikhuseynov
I'm using the most up to date Seurat version from CRAN (Seurat 5.0.1).
When I run this code:
data.dir <- "E:/20231215__212238__MeningesAB_MSC_1/output-XETG00198__0010134__Men_MSC1__20231215__212257/"
cell_info <- data.table::fread(file.path(data.dir, "cells.csv.gz"), data.table = FALSE)
cell_centroid_df <- data.frame(
  x = cell_info$x_centroid,
  y = cell_info$y_centroid,
  cell = cell_info$cell_id,
  stringsAsFactors = FALSE
)
I can pull cell_info$x_centroid or y_centroid without issue.

@SarahE97
Author

SarahE97 commented Jan 9, 2024

@veldsla
I tried this and unfortunately had the same error. When running with debug, here's result:

xenium.obj <- LoadXenium(path, fov = "fov")
10X data contains more than one type and is being returned as a list containing matrices of each type.
|--------------------------------------------------|
|==================================================|
Error in if (ncol(x = coords) >= 3) { : argument is of length zero
Called from: CreateCentroids.default(data$centroids)

@alikhuseynov
Contributor


Ok, so the data is not the problem then. I think there might be a bug.
Could you show the output of ReadXenium like this:

data <- ReadXenium(
    data.dir = data.dir,
    type = c("centroids", "segmentations")
  )
data %>% str

@SarahE97
Author

SarahE97 commented Jan 9, 2024

@alikhuseynov
Here's the result of that.

data %>% str
List of 12
$ :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:10730052] 0 1 8 12 15 17 20 24 27 28 ...
.. ..@ p : int [1:99609] 0 142 191 295 448 575 714 858 1048 1168 ...
.. ..@ Dim : int [1:2] 480 99608
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:480] "A2m" "Abhd3" "Acan" "Ackr4" ...
.. .. ..$ : chr [1:99608] "aaaaacgj-1" "aaaacidn-1" "aaaacjka-1" "aaabinfc-1" ...
.. ..@ x : num [1:10730052] 1 2 1 2 9 1 10 1 1 1 ...
.. ..@ factors : list()
$ :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:2227] 17 39 35 5 9 21 21 37 39 3 ...
.. ..@ p : int [1:99609] 0 0 0 0 0 0 0 0 0 0 ...
.. ..@ Dim : int [1:2] 41 99608
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:41] "NegControlCodeword_0500" "NegControlCodeword_0501" "NegControlCodeword_0502" "NegControlCodeword_0503" ...
.. .. ..$ : chr [1:99608] "aaaaacgj-1" "aaaacidn-1" "aaaacjka-1" "aaabinfc-1" ...
.. ..@ x : num [1:2227] 1 1 1 1 1 1 1 1 1 1 ...
.. ..@ factors : list()
$ :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:12643] 9 0 18 2 13 15 14 11 15 5 ...
.. ..@ p : int [1:99609] 0 0 0 0 1 1 1 2 2 2 ...
.. ..@ Dim : int [1:2] 20 99608
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:20] "NegControlProbe_00042" "NegControlProbe_00041" "NegControlProbe_00039" "NegControlProbe_00035" ...
.. .. ..$ : chr [1:99608] "aaaaacgj-1" "aaaacidn-1" "aaaacjka-1" "aaabinfc-1" ...
.. ..@ x : num [1:12643] 1 1 1 1 1 1 1 1 1 1 ...
.. ..@ factors : list()
$ : num [1:48524951] 223 224 111 157 171 ...
$ : num [1:48524951] 2159 2160 2246 2196 2248 ...
$ : chr [1:48524951] "Gpm6a" "Gpm6a" "Ppp1r1b" "Kng2" ...
$ : num [1:99608] 819 821 831 849 826 ...
$ : num [1:99608] 7128 7139 7142 7132 7110 ...
$ : chr [1:99608] "aaaaacgj-1" "aaaacidn-1" "aaaacjka-1" "aaabinfc-1" ...
$ : chr [1:1294888] "aaaaacgj-1" "aaaaacgj-1" "aaaaacgj-1" "aaaaacgj-1" ...
$ : num [1:1294888] 818 812 806 807 811 ...
$ : num [1:1294888] 7114 7127 7137 7139 7141 ...

 - attr(*, "dim")= int [1:2] 3 4
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "Gene Expression" "Negative Control Codeword" "Negative Control Probe"
  ..$ : chr [1:4] "matrix" "microns" "centroids" "segmentations"

@alikhuseynov
Contributor

OK, thanks. The good news is that it does seem to read what is needed, but for some reason it stores each variable as a separate vector. So there is definitely a bug. I'm not on the Seurat development team, but I can take a look tomorrow to see if there is a quick fix.

the centroids are:

$ : num [1:99608] 819 821 831 849 826 ...
$ : num [1:99608] 7128 7139 7142 7132 7110 ...
$ : chr [1:99608] "aaaaacgj-1" "aaaacidn-1" "aaaacjka-1" "aaabinfc-1" ...

segmentations:

$ : chr [1:1294888] "aaaaacgj-1" "aaaaacgj-1" "aaaaacgj-1" "aaaaacgj-1" ...
$ : num [1:1294888] 818 812 806 807 811 ...
$ : num [1:1294888] 7114 7127 7137 7139 7141 ...

molecules coords:

$ : num [1:48524951] 223 224 111 157 171 ...
$ : num [1:48524951] 2159 2160 2246 2196 2248 ...
$ : chr [1:48524951] "Gpm6a" "Gpm6a" "Ppp1r1b" "Kng2" ...

@alikhuseynov
Contributor

alikhuseynov commented Jan 9, 2024

I think the bug is the use of sapply in data <- sapply(outs, function(otype) {...
https://github.com/satijalab/seurat/blob/develop/R/preprocessing.R#L2155
since by default it simplifies the result to a vector or matrix whenever it can. The solution would be to pass simplify = FALSE to sapply. I can test that and open a PR against the develop branch.
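
A toy reproduction of that behaviour, using base R only. In the str output above, the matrix list has three entries and every data frame has three columns, so all four pieces have length 3 and sapply can fold them into the 3 x 4 list-matrix shown there. The `read_piece()` function below is a hypothetical stand-in for the switch in ReadXenium, not the real reader. One possible reason the other run loads fine (an assumption, not verified in this thread) is that its pieces have unequal lengths, in which case sapply leaves the list alone.

```r
# Toy reader: every branch returns an object of length 3, mimicking the
# failing run (three count matrices, three-column data frames).
read_piece <- function(otype) {
  switch(
    EXPR = otype,
    matrix = list(a = 1, b = 2, c = 3),
    centroids = data.frame(x = 1:2, y = 3:4, cell = c("p", "q"),
                           stringsAsFactors = FALSE),
    stop("Unknown type: ", otype)
  )
}
outs <- c("matrix", "centroids")

# Default simplify = TRUE: equal lengths are folded into a list-matrix,
# which has dimnames but no names, so `$centroids` finds nothing.
simplified <- sapply(outs, read_piece, USE.NAMES = TRUE)
dim(simplified)
is.null(simplified$centroids)

# simplify = FALSE keeps a named list with the data frame intact.
kept <- sapply(outs, read_piece, simplify = FALSE, USE.NAMES = TRUE)
is.data.frame(kept$centroids)

# ncol(NULL) is NULL, so the comparison has length zero; this is what
# `if (ncol(x = coords) >= 3)` trips over.
length(ncol(NULL) >= 3)
```

Using `lapply()` with `setNames()` would avoid the simplification as well; `simplify = FALSE` is simply the smallest change to the existing call.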

@SarahE97
Author

SarahE97 commented Jan 9, 2024

@alikhuseynov
Thank you very much for this, how would I modify the ReadXenium function to pass that?

@alikhuseynov
Contributor


no problem :)
if you want to test it, just add simplify = FALSE at this line
https://github.com/satijalab/seurat/blob/develop/R/preprocessing.R#L2233
basically }, simplify = FALSE, USE.NAMES = TRUE)

@veldsla

veldsla commented Jan 10, 2024

I can confirm that this was also an issue on my end. I totally forgot about having changed it as well.

alikhuseynov added a commit to alikhuseynov/seurat that referenced this issue Jan 10, 2024
@alikhuseynov
Contributor

just did a PR to fix that.

@SarahE97
Author

Thanks so much to all of you for the suggestions. Unfortunately, I tried adding simplify = FALSE and got a different error:

This is my fix (ReadXenium) code:
function (data.dir, outs = c("matrix", "microns"), type = "centroids",
    mols.qv.threshold = 20)
{
    type <- match.arg(arg = type, choices = c("centroids", "segmentations"),
        several.ok = TRUE)
    outs <- match.arg(arg = outs, choices = c("matrix", "microns"),
        several.ok = TRUE)
    outs <- c(outs, type)
    has_dt <- requireNamespace("data.table", quietly = TRUE) &&
        requireNamespace("R.utils", quietly = TRUE)
    data <- sapply(outs, function(otype) {
        switch(EXPR = otype, matrix = {
            pmtx <- progressor()
            pmtx(message = "Reading counts matrix", class = "sticky",
                amount = 0)
            matrix <- suppressWarnings(Read10X(data.dir = file.path(data.dir,
                "cell_feature_matrix/")))
            pmtx(type = "finish")
            matrix
        }, centroids = {
            pcents <- progressor()
            pcents(message = "Loading cell centroids", class = "sticky",
                amount = 0)
            if (has_dt) {
                cell_info <- as.data.frame(data.table::fread(file.path(data.dir,
                    "cells.csv.gz")))
            } else {
                cell_info <- read.csv(file.path(data.dir, "cells.csv.gz"))
            }
            cell_centroid_df <- data.frame(x = cell_info$x_centroid,
                y = cell_info$y_centroid, cell = cell_info$cell_id,
                stringsAsFactors = FALSE)
            pcents(type = "finish")
            cell_centroid_df
        }, segmentations = {
            psegs <- progressor()
            psegs(message = "Loading cell segmentations", class = "sticky",
                amount = 0)
            if (has_dt) {
                cell_boundaries_df <- as.data.frame(data.table::fread(file.path(data.dir,
                    "cell_boundaries.csv.gz")))
            } else {
                cell_boundaries_df <- read.csv(file.path(data.dir,
                    "cell_boundaries.csv.gz"), stringsAsFactors = FALSE)
            }
            names(cell_boundaries_df) <- c("cell", "x", "y")
            psegs(type = "finish")
            cell_boundaries_df
        }, microns = {
            pmicrons <- progressor()
            pmicrons(message = "Loading molecule coordinates",
                class = "sticky", amount = 0)
            if (has_dt) {
                tx_dt <- as.data.frame(data.table::fread(file.path(data.dir,
                    "transcripts.csv.gz")))
                transcripts <- subset(tx_dt, qv >= mols.qv.threshold)
            } else {
                transcripts <- read.csv(file.path(data.dir,
                    "transcripts.csv.gz"))
                transcripts <- subset(transcripts, qv >= mols.qv.threshold)
            }
            df <- data.frame(x = transcripts$x_location, y = transcripts$y_location,
                gene = transcripts$feature_name, stringsAsFactors = FALSE)
            pmicrons(type = "finish")
            df
        }, stop("Unknown Xenium input type: ", otype))
    }, simplify = FALSE, USE.NAMES = TRUE)
    return(data)
}

This is the error I get:

fix(ReadXenium)
data <- ReadXenium(data.dir = path,type =c('segmentation','centroids'))
Error in progressor() : could not find function "progressor"

when I rerun with debug:
debug at #2: pmtx <- progressor()
Browse[4]> pmtx <- progressor()
Error during wrapup: could not find function "progressor"

@alikhuseynov
Contributor


progressor comes from the progressr package.
Just try installing Seurat with the small fix from my PR #8298:
devtools::install_github(repo = "alikhuseynov/seurat", ref = "develop")
and then run the Xenium read and load functions again.
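
One possible explanation for the "could not find function" error, offered as an assumption rather than a confirmed diagnosis: a copy of a package function that is evaluated outside the package namespace can no longer see names the package imports. The base-R sketch below illustrates that lookup rule with a toy environment; `pkg_env`, `helper()` and `copy_fn()` are hypothetical names, not Seurat or progressr objects.

```r
# Toy "namespace": an environment holding a helper that is not visible
# from the global environment.
pkg_env <- new.env()
assign("helper", function() "helper ran", envir = pkg_env)

# A copy of a function whose enclosing environment is the global environment
copy_fn <- function() helper()
environment(copy_fn) <- globalenv()
first_try <- tryCatch(copy_fn(), error = function(e) conditionMessage(e))

# Pointing the copy back at the toy namespace makes the helper visible again
environment(copy_fn) <- pkg_env
second_try <- copy_fn()
```

For the real function, the analogous step would presumably be `environment(ReadXenium) <- asNamespace("Seurat")`; installing the patched branch as suggested above avoids the question entirely.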

@jsicherman
Contributor

jsicherman commented Jan 10, 2024

@alikhuseynov thanks for the help on this! I added on a quick one liner here that will fix an issue that this will eventually hit (unrelated to the simplify = FALSE issue)

Once it merges then @SarahE97 can devtools::install_github and should be able to successfully load their data (or could devtools::install_github("jsich/seurat", ref = "js/unassigned_cw") in the meantime)

[edit] merged, so feel free to install from alikhuseynov's branch and you should be good to go until this lands in a Seurat release!

@alikhuseynov
Contributor


Thanks @jsicherman, just merged that. Then users are all set 😊

@SarahE97
Author

It worked!!! Thank you all SO much. Really really appreciate it, you all are the best :)

dcollins15 pushed a commit that referenced this issue Jul 22, 2024
This was referenced Jul 22, 2024