-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dream(): Evaluate function error --> "Error evaluating fxn:" #66
Comments
Can you show sessionInfo(). This can happen if the covariates are almost colinear, or if the residual variance is near zero. Can you provide a reproducible example? |
As I mentioned, when calling
How can I know in which gene I have the error? At the moment I can't share the data, but you can see some features of the arguments here:
Here you can find the sessionInfo():
|
Hi Aida, But you can iterate through each gene to find where the error is: for(j in 1:nrow(vobjDream_obj)){
message(j)
fit = dream(vobjDream_obj[j,], form_model, metadata_obj, L)
} A few options for the issue is
I want to fix this for other users, so please let me know if you find the answer Gabriel |
Hi, I attach a reproducible example which uses testing data host at Molgenis server:
You can see that the execution stops when we run Error in dream(vobjDream, form, aggregate_metadata, L) : Lately, we have done a couple of checks (lines 83-115) trying to answer your questions about the potential reasons of the errors, but apparently, the error is not due to any of those cases. I hope you can find the reason. At the moment, I'm just running Thanks a lot! #!/usr/bin/env Rscript
######## Install and load required packages ########
shhh <- suppressPackageStartupMessages
shhh(library(data.table))
shhh(library(lme4))
shhh(library(plyr))
shhh(library(dplyr))
shhh(library(tidyr))
shhh(library(tibble))
shhh(library(reshape2))
shhh(library(stringi))
shhh(library(stringr))
shhh(library(edgeR))
shhh(library(limma))
shhh(library(variancePartition))
shhh(library(BiocParallel))
shhh(library(caret))
######## Read testing input files ########
# Example: Gene expression (DGEList object) --> 500 genes x 981 samples
## Gene expression (DGEList object)
geneExpr.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/geneExpr.rds'
geneExpr <- readRDS(url(geneExpr.url))
## Donor metadata (data.frame object)
aggregate_metadata.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/aggregate_metadata.rds'
aggregate_metadata <- readRDS(url(aggregate_metadata.url))
######## Dream() ########
# Prepare data form dream(): testing Sex differences and correcting for Age (fixed) and date (random) factors
form_vars <- "~Sex+Age+(1|date)"
form <- as.formula(form_vars)
vobjDream = voomWithDreamWeights(geneExpr, form, aggregate_metadata)
comb <- c('Sex1', 'Sex2')
contrast <- 'Sex'
L = getContrast(vobjDream, form, aggregate_metadata, comb[[2]])
# Dream()
## 1. Regular: dream() on the whole vobjDream --> to reproduce the error (we don't know the gene/s)
fit = dream(vobjDream, form, aggregate_metadata, L)
## 2. Alternative: dream() by gene using try() --> to pick the gene/s that gives the error
dream_list <- list()
for(j in 1:nrow(vobjDream)){
g <- rownames(vobjDream)[j]
print(paste0(j, ': ', g))
fit = try(dream(vobjDream[j,], form, aggregate_metadata, L))
limma_res <- NA
g_report <- TRUE
if(class(fit)!='try-error'){
fit = eBayes(fit)
limma_res <- topTable(fit, coef=c('L1'), number=length(fit$F.p.value))
limma_res <- limma_res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
arrange(adj.P.Val)
}else{
fit <- NA
g_report <- FALSE
}
names(g_report) <- g
dream_list[[g]] <- list(fit = fit,
top_table = limma_res,
g_report = g_report)
}
### 2.1 Get only the genes that do not give an error
g_passed <- unlist(unname(lapply(dream_list, function(x) x$g_report)))
g_out.names <- names(which(g_passed==FALSE))
dream_list.passed <- dream_list[g_passed]
g_out.tag <- paste(g_out.names, collapse = ', ')
print(paste0('Genes with an error in dream(): ', g_out.tag))
### 2.2 Get the toptable for all genes (data frame)
limma_res <- do.call("rbind", lapply(dream_list.passed, function(x) x$top_table))
limma_res$adj.P.Val <- p.adjust(limma_res$P.Value, method = "BH")
limma_res <- limma_res[order(limma_res$adj.P.Val),]
### 2.3 Get the fit objects for all genes (list)
fit <- lapply(dream_list.passed, function(x) x$fit)
######## Checks for potential errors ########
## Is Age coded as numeric? --> It is not the case
lapply(aggregate_metadata, class)
str(aggregate_metadata)
## If there are missing expression values --> It is not the case
lapply(geneExpr, function(x) table(is.na(x)))
## If you have a gene with no variance
### Variance --> It is not the case
var_by_gene <- apply(vobjDream[[2]], 1, var)
summary(var_by_gene)
min_var <- sort(var_by_gene)[1] #the gene that gives the error --> RPL11
max_var <- sort(var_by_gene,decreasing = T)[1]
hist(apply(vobjDream[[2]], 1, var))
plot(density(apply(vobjDream[[2]], 1, var)))
### Also check nearZeroVar (with caret::nearZeroVar) --> It is not the case
nearZeroVar.idx <- nearZeroVar(t(vobjDream[[2]]))
nearZeroVar.metrics <- nearZeroVar(t(vobjDream[[2]]), saveMetrics=TRUE)
## If you have a gene with no variance within a category. For example, if gene j is constant in females. --> --> It is not the case
check.func <- function(g_id, vobjDream_obj = vobjDream, metadata_obj = aggregate_metadata, contrast_var = contrast){
print(g_id)
gene_donor.expr <- data.frame(gene_expr = unname(vobjDream_obj[[2]][g_id,]),
donor_id = names(vobjDream_obj[[2]][g_id,]))
metadata_obj$donor_id <- rownames(metadata_obj)
metadata_obj.gene <- merge(metadata_obj, gene_donor.expr, by = 'donor_id')
var_by_contrast <- lapply(split(metadata_obj.gene, metadata_obj.gene[[contrast_var]]), function(x) var(x$gene_expr))
return(var_by_contrast)
}
check.res <- sapply(g_out.names, function(i) check.func(i), simplify = FALSE)
check.res |
Thanks for the example. I can reproduce the error now and I am looking in to a solution. |
There are two ways to deal with an error like this. 1) detect the failure internally and deal with it gracefully. That takes a lot of engineering, and is a medium term solution. 2) fix the input leading to the error. Since my colleagues and I use this function every day and haven't see this error, lets start with (2). Here I recreate the error and find that the input is the cause of the problem. It's a very pathological case that seems to be extremely rare. shhh <- suppressPackageStartupMessages
shhh(library(lme4))
shhh(library(plyr))
shhh(library(edgeR))
shhh(library(variancePartition))
######## Read testing input files ########
# Example: Gene expression (DGEList object) --> 500 genes x 981 samples
## Gene expression (DGEList object)
geneExpr.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/geneExpr.rds'
geneExpr <- readRDS(url(geneExpr.url))
## Donor metadata (data.frame object)
aggregate_metadata.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/aggregate_metadata.rds'
aggregate_metadata <- readRDS(url(aggregate_metadata.url))
# make sure your meta-data matches the expression data
# This changes the result, but the error still comes ups
idx = match(colnames(geneExpr), rownames(aggregate_metadata) )
aggregate_metadata = aggregate_metadata[idx,]
######## Dream() ########
form <- ~ Sex + Age + (1|date)
vobjDream = voomWithDreamWeights(geneExpr, form, aggregate_metadata, plot=TRUE)
# fit dream() one gene 336 to get error
fit = dream(vobjDream[336,], form, aggregate_metadata)
# Dividing work into 1 chunks...
# Error: BiocParallel errors
# 1 remote errors, element index: 1
# 0 unevaluated and other errors
# first remote error:
# Error in devfun(rho$pp$theta): Downdated VtV is not positive definite But why does this fail? Lets examine the
So we can reproduce the error at a lower level, that's a start. Lets examine the weights range(w)
# [1] 4.800174e+02 5.958613e+16 That is a HUGE weight. That, combined with the fact that there is no error when I exclude the weights, means that the weights are the issue. In your case, a different gene had a huge weight (since I sorted the metadata rows). The question is why. Lets look at the mean-variance trend from voom. In the right of the plot, the trend line is a very poor fit to the data. More importantly it dives towards zero. Since the voom weights are based on the reciprocal of that line, you get huge weights like 5e16 that cause some numerical instability in the linear algebra in the model fit. If you reduce the So it's important to look at the mean-variance trend from voom to find pathological cases. Of course, this is test data so the error on your real dataset might be different. Does this fix your issue? I the short term, I think you can avoid the issue by checking the voom plot and the weights. In the longer term, I will add a check of the weights to detect pathological cases and build in better failure detection. Gabriel |
Hi Aida, Gabriel |
Hi again, As we had very few cases showing this type of error, I used the workaround you suggested to catch and skip these cases by passing to Here you have the code to reproduce the two examples (small and big datasets): You can see that in the small dataset, for instance, the 'RPS4Y1' gene is showing very different t, P.Value, adj.P.Val, B, and z.std when we compare the regular and the alternative approach. These types of differences are also observed in the other genes. #!/usr/bin/env Rscript
######## Install and load required packages ########
shhh <- suppressPackageStartupMessages
shhh(library(data.table))
shhh(library(lme4))
shhh(library(plyr))
shhh(library(dplyr))
shhh(library(tidyr))
shhh(library(tibble))
shhh(library(reshape2))
shhh(library(stringi))
shhh(library(stringr))
shhh(library(edgeR))
shhh(library(limma))
shhh(library(variancePartition))
shhh(library(BiocParallel))
shhh(library(caret))
############################################################################
#################### SMALL DATASET (13 genes x 45 samples) #################
############################################################################
######## Read testing input files ########
# Example: Gene expression (DGEList object) --> 13 genes x 45 samples
## Gene expression (DGEList object)
geneExpr.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/pilot3_subset.geneExpr.rds'
geneExpr <- readRDS(url(geneExpr.url))
## Donor metadata (data.frame object)
aggregate_metadata.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/pilot3_subset.aggregate_metadata.rds'
aggregate_metadata <- readRDS(url(aggregate_metadata.url))
## Make sure your meta-data matches the expression data
idx = match(colnames(geneExpr), rownames(aggregate_metadata))
aggregate_metadata = aggregate_metadata[idx,]
######## Dream() ########
# Prepare data form dream(): testing Sex differences and correcting for Age (fixed) and date (random) factors
form_vars <- "~Sex+Age+(1|date)"
form <- as.formula(form_vars)
vobjDream = voomWithDreamWeights(geneExpr, form, aggregate_metadata)
# Dream()
comb <- c('Sex1', 'Sex2')
contrast <- 'Sex'
L = getContrast(vobjDream, form, aggregate_metadata, comb[[2]])
## 1. Regular: dream() on the whole vobjDream --> to reproduce the error (we don't know the gene/s)
fit = dream(vobjDream, form, aggregate_metadata, L)
fit.reg = eBayes(fit)
limma_res <- topTable(fit.reg, coef=c('L1'), number=length(fit.reg$F.p.value))
limma_res <- limma_res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
arrange(adj.P.Val)
limma_res.reg <- limma_res[order(limma_res$P.Value),]
## 2. Alternative: dream() by gene using try() --> to pick the gene/s that gives the error
dream_list <- list()
for(j in 1:nrow(vobjDream)){
g <- rownames(vobjDream)[j]
print(paste0(j, ': ', g))
fit = try(dream(vobjDream[j,], form, aggregate_metadata, L))
limma_res <- NA
g_report <- TRUE
if(class(fit)!='try-error'){
fit = eBayes(fit)
limma_res <- topTable(fit, coef=c('L1'), number=length(fit$F.p.value))
limma_res <- limma_res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
arrange(adj.P.Val)
}else{
fit <- NA
g_report <- FALSE
}
names(g_report) <- g
dream_list[[g]] <- list(fit = fit,
top_table = limma_res,
g_report = g_report)
}
### 2.1 Get only the genes that do not give an error
g_passed <- unlist(unname(lapply(dream_list, function(x) x$g_report)))
g_out.names <- names(which(g_passed==FALSE))
dream_list.passed <- dream_list[g_passed]
g_out.tag <- paste(g_out.names, collapse = ', ')
print(paste0('Genes with an error in dream(): ', g_out.tag))
### 2.2 Get the toptable for all genes (data frame)
limma_res <- do.call("rbind", lapply(dream_list.passed, function(x) x$top_table))
limma_res$adj.P.Val <- p.adjust(limma_res$P.Value, method = "BH")
limma_res.alt <- limma_res[order(limma_res$adj.P.Val),]
rownames(limma_res.alt) <- NULL
limma_res.alt <- limma_res.alt[match(limma_res.reg$gene, limma_res.alt$gene),]
### 2.3 Get the fit objects for all genes (list)
fit.alt <- lapply(dream_list.passed, function(x) x$fit)
######## Check differences ########
# Overview
limma_res.list <- list(regular = limma_res.reg,
alternative = limma_res.alt)
print(limma_res.list)
# Modify P.Value and adj.P.Val
limma_res_mod.list <- lapply(limma_res.list, function(x){
x$P.Value_log10 <- -log10(x$P.Value)
x$adj.P.Val_log10 <- -log10(x$adj.P.Val)
return(x)
})
# Check differences
vars_int <- colnames(limma_res_mod.list[[1]])[colnames(limma_res_mod.list[[1]])!='gene']
vars_int.out <- sapply(vars_int, function(i){
print(i)
i_list <- lapply(limma_res_mod.list, function(x) x[[i]])
cor_r <- Reduce("cor", i_list)
print(paste0('Pearson correlation (between regular and alternative): ', cor_r))
j <- names(i_list)[1]
summary_out <- sapply(names(i_list), function(j){
print(paste0('Summary --> ', j, ':'))
print(summary(i_list[[j]]))
}, simplify = FALSE)
cat('\n')
return(NULL)
})
# # Plots
# plot(-log10(limma_res.reg$P.Value),-log10(limma_res.alt$P.Value))
# plot(limma_res.reg$B, limma_res.alt$B)
############################################################################
###################### BIG DATASET (4 genes x 981 samples) #################
############################################################################
######## Read testing input files ########
# Example: Gene expression (DGEList object) --> 13 genes x 45 samples
## Gene expression (DGEList object)
geneExpr.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/large_subset.geneExpr.rds'
geneExpr <- readRDS(url(geneExpr.url))
## Donor metadata (data.frame object)
aggregate_metadata.url <- 'https://molgenis26.gcc.rug.nl/downloads/marta-transcriptomics/limma_pseudobulk_error/large_subset.aggregate_metadata.rds'
aggregate_metadata <- readRDS(url(aggregate_metadata.url))
## Make sure your meta-data matches the expression data
idx = match(colnames(geneExpr), rownames(aggregate_metadata))
aggregate_metadata = aggregate_metadata[idx,]
######## Dream() ########
# Prepare data form dream(): testing Sex differences and correcting for Age (fixed) and date (random) factors
form_vars <- "~Sex+Age+(1|date)"
form <- as.formula(form_vars)
vobjDream = voomWithDreamWeights(geneExpr, form, aggregate_metadata)
# Dream()
comb <- c('Sex1', 'Sex2')
contrast <- 'Sex'
L = getContrast(vobjDream, form, aggregate_metadata, comb[[2]])
## 1. Regular: dream() on the whole vobjDream --> to reproduce the error (we don't know the gene/s)
fit = dream(vobjDream, form, aggregate_metadata, L)
fit.reg = eBayes(fit)
limma_res <- topTable(fit.reg, coef=c('L1'), number=length(fit.reg$F.p.value))
limma_res <- limma_res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
arrange(adj.P.Val)
limma_res.reg <- limma_res[order(limma_res$P.Value),]
## 2. Alternative: dream() by gene using try() --> to pick the gene/s that gives the error
dream_list <- list()
for(j in 1:nrow(vobjDream)){
g <- rownames(vobjDream)[j]
print(paste0(j, ': ', g))
fit = try(dream(vobjDream[j,], form, aggregate_metadata, L))
limma_res <- NA
g_report <- TRUE
if(class(fit)!='try-error'){
fit = eBayes(fit)
limma_res <- topTable(fit, coef=c('L1'), number=length(fit$F.p.value))
limma_res <- limma_res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
arrange(adj.P.Val)
}else{
fit <- NA
g_report <- FALSE
}
names(g_report) <- g
dream_list[[g]] <- list(fit = fit,
top_table = limma_res,
g_report = g_report)
}
### 2.1 Get only the genes that do not give an error
g_passed <- unlist(unname(lapply(dream_list, function(x) x$g_report)))
g_out.names <- names(which(g_passed==FALSE))
dream_list.passed <- dream_list[g_passed]
g_out.tag <- paste(g_out.names, collapse = ', ')
print(paste0('Genes with an error in dream(): ', g_out.tag))
### 2.2 Get the toptable for all genes (data frame)
limma_res <- do.call("rbind", lapply(dream_list.passed, function(x) x$top_table))
limma_res$adj.P.Val <- p.adjust(limma_res$P.Value, method = "BH")
limma_res.alt <- limma_res[order(limma_res$adj.P.Val),]
rownames(limma_res.alt) <- NULL
limma_res.alt <- limma_res.alt[match(limma_res.reg$gene, limma_res.alt$gene),]
### 2.3 Get the fit objects for all genes (list)
fit.alt <- lapply(dream_list.passed, function(x) x$fit)
######## Check differences ########
# Overview
limma_res.list <- list(regular = limma_res.reg,
alternative = limma_res.alt)
print(limma_res.list)
# Modify P.Value and adj.P.Val
limma_res_mod.list <- lapply(limma_res.list, function(x){
x$P.Value_log10 <- -log10(x$P.Value)
x$adj.P.Val_log10 <- -log10(x$adj.P.Val)
return(x)
})
# Check differences
vars_int <- colnames(limma_res_mod.list[[1]])[colnames(limma_res_mod.list[[1]])!='gene']
vars_int.out <- sapply(vars_int, function(i){
print(i)
i_list <- lapply(limma_res_mod.list, function(x) x[[i]])
cor_r <- Reduce("cor", i_list)
print(paste0('Pearson correlation (between regular and alternative): ', cor_r))
j <- names(i_list)[1]
summary_out <- sapply(names(i_list), function(j){
print(paste0('Summary --> ', j, ':'))
print(summary(i_list[[j]]))
}, simplify = FALSE)
cat('\n')
return(NULL)
})
# # Plots
# plot(-log10(limma_res.reg$P.Value),-log10(limma_res.alt$P.Value))
# plot(limma_res.reg$B, limma_res.alt$B)
Thank you a lot in advance! Aida |
Hi Aida,
dream and limma are both meant to be run on all genes at the same time. Earlier I suggested testing dream on one gene at a time for debugging purposes, but I didn't intend it for real analysis. The empirical Bayes step only makes sense if you are borrowing information across many genes. Running it on the result of 1 gene is not useful. So the difference you are seeing is due to Also, this explains why there is a difference between the full and gene-wise analyses only in the small dataset. The empirical Bayes step improves performance for small dataset, but as sample size increases its effect becomes very minimal. So for large datasets, full and gene-wise analyses are very similar. This is based on my understanding of your issue, but I didn't test it out in your code. Best, |
Thanks for your quick answer! I was not considering that Then, I guess one option could be running Another option you suggested was to reduce the With all this in mind, do you have any suggestions? I assume that, ideally, there should be a way to detect this type of errors internally when running Thanks a lot! |
When I run the current version of The maximum precision weight is 4.30033e+17, suggesting a poor smoothing fit
on the mean-variance plot for large expression values. Such large weights can
have unexpected effects downstream. Consider examining the mean-variance plot
and reducing the span parameter. This gives a problematic voom plot we described above. This means that the voom trend produced precision weights that are nonsense (i.e. 4.30033e+17). You could just detect and remove genes with super high weights. Running dream for each gene separately is never the answer unless you are debugging something. As far as changing the vobjDream = voomWithDreamWeights(geneExpr, form, aggregate_metadata, plot=TRUE, span=.1) So I have no issue with changing it slightly. Are you doing this on bulk RNA-seq, or is this single cell RNA-seq converted to pseudobulk? Or something else? Gabriel |
Nice, the variancePartition I was using didn't give this warning, so, this is much more informative :) Then, the options would be:
Is it right? About the type of data, I'm using pseudobulk data. Specifically, I use the sum to aggregate the scRNA-seq data, and then filtering by gene expression + normalising as it was bulkRNA-seq. For filtering, I use Thank you very much, |
Yes, those are the two options. Either choice is fine. Concerning pseudobulk from scRNA-seq, this is just one of the issues that comes up. I have a solution that solves these problems and others, and also extends dream. I’ll have public software and a biorxiv preprint in 2 weeks. Gabriel |
Wow, I can't wait to read it! Btw, when using the pseudobulk mean data for the same subset of genes in the big dataset (from previously normalized scRNA-seq data), instead of the pseudobulk sum data (+bulk normalization), we don't find the issue anymore. Actually, you can see how different is the distribution of the maximum gene weights computed by # pseudobulk sum
vobjDream_sum.max <- apply(vobjDream_sum, 1, max)
summary(vobjDream_sum.max )
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2.743 2.743 3.875 4.257 5.094 13.494
head(sort(vobjDream_sum.max, decreasing=T))
# RPL11 RPL22 CD52 SH3BGRL3 LAPTM5 IFI6
# 13.493915 11.961673 11.465821 10.389491 9.699667 9.016967
# pseudobulk mean
vobjDream_mean.max <- apply(vobjDream_mean, 1, max)
summary(vobjDream_mean.max)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 14.30 14.30 14.32 14.44 14.44 17.42
head(sort(vobjDream_mean.max, decreasing=T))
# RPL11 RPL22 CD52 SH3BGRL3 LAPTM5 CDC42
# 17.41889 16.97915 16.79101 16.24833 15.92449 15.69318 Thanks again, Aida |
Hi,
I'm calling
variancePartition::dream()
function to fit a mixed-model and I'm getting the following error:Error in dream(vobjDream_obj, form_model, metadata_obj, L) :
Error evaluating fxn:
Error in eval_f(x, ...): Downdated VtV is not positive definite
I've gone through the
variancePartition::dream()
function and it seems the error appears when evaluating the function (line 582-604). Do you have any recommendations to handle the error and continue the execution for the genes which do not face this problem (withtryCatch()
for instance). I'm asking this sincebiocParallel::book()
is used to make the evaluation, and there's maybe something to consider regarding when/where to handle the error.Also, I'm wondering whether I could do some previous check on the exprObj or data
variancePartition::dream()
arguments before callingvariancePartition::dream()
function to see what could be going on...Moreover, what I found a bit strange is that the error Error in eval_f(x, ...): Downdated VtV is not positive definite seems to be related to the complete separation problem, which I haven't faced before and it seems more related to logistic models...
Thanks a lot,
Aida
The text was updated successfully, but these errors were encountered: