Different results between absoluted and doabsoluted? #26

Closed
xiasijian opened this issue Feb 24, 2023 · 20 comments

Comments

@xiasijian

xiasijian commented Feb 24, 2023

The question is below

@xiasijian xiasijian changed the title from "Difference" to "Different results between absoluted and doabsoluted?" Feb 24, 2023
@xiasijian
Author

xiasijian commented Feb 24, 2023

image

code is below
#####################################################
DoAbsolute(Seg = cnvkit_res, Maf = maf_file,
           platform = "Illumina_WES",
           copy.num.type = "total",
           temp.dir = "./temp/",
           results.dir = "output_Doabsolute",
           nThread = 1,
           sigma.p = 0,
           max.sigma.h = 0.2,
           min.ploidy = 2,
           max.ploidy = 10,
           keepAllResult = FALSE,
           primary.disease = "Gastric Cancer",
           clean.temp = FALSE,
           max.as.seg.count = 3000,
           max.non.clonal = 0.05,
           max.neg.genome = 0.005,
           min.mut.af = 0.02,
           min.no.mut = 5,
           verbose = TRUE)
#########################################################
RunAbsolute(seg.dat.fn = paste0(outdir, patient, "_test_output.seg"),
            maf.fn = paste0(outdir, patient, "_test_output.maf"),
            platform = "Illumina_WES",
            copy_num_type = "total",
            sigma.p = 0,
            results.dir = "output_absolute",
            sample.name = patient,
            primary.disease = "Gastric Cancer",
            max.sigma.h = 0.2,
            min.ploidy = 2,
            max.ploidy = 10,
            max.as.seg.count = 3000,
            max.neg.genome = 0.005,
            max.non.clonal = 0.05,
            verbose = TRUE,
            min.mut.af = 0.02)

@xiasijian
Author

image

@ShixiangWang
Owner

Hi, @xiasijian

Thanks for reporting this comparison. Since DoAbsolute is just an open-source wrapper around ABSOLUTE, there are only two possible explanations for the minor difference:

  1. The versions of ABSOLUTE used are different.
  2. The preprocessing performed, or the default options used, by DoAbsolute.

I assume you are using the same version of ABSOLUTE, so point 2 would be the only explanation.

The relevant code is shown below.

#' More detail about how to analyze ABSOLUTE results please see [this link](http://software.broadinstitute.org/cancer/software/genepattern/analyzing-absolute-data).
#' @param Seg a `data.frame` or a file (path) that contains columns
#' "Sample", "Chromosome", "Start", "End", "Num_Probes", "Segment_Mean".
#' @param Maf MAF, default is `NULL`, can be provided as `data.frame` or file path.
#' @param sigma.p Provisional value of excess sample level variance used for mode search. Default: 0
#' @param max.sigma.h Maximum value of excess sample level variance (Eq. 6). Default: 0.2
#' @param min.ploidy Minimum ploidy value to consider. Solutions implying lower ploidy values will be discarded. Default: 0.5
#' @param max.ploidy Maximum ploidy value to consider. Solutions implying greater ploidy values will be discarded. Default: 10
#' @param primary.disease Primary disease of the sample. Default: `NA`
#' @param platform one of "SNP_6.0", "Illumina_WES", "SNP_250K_STY". Default: "SNP_6.0"
#' @param temp.dir directory path used to store temporary files. Default: Absolute subdirectory under `tempdir()`
#' @param clean.temp if `TRUE`, auto-clean temp dir at the end. Default: `FALSE`
#' @param results.dir directory path used to store result files. Default: work directory
#' @param max.as.seg.count Maximum number of allelic segments. Samples with a higher segment count will be flagged as 'failed'. Default: 1500
#' @param max.non.clonal Maximum genome fraction that may be modeled as non-clonal (subclonal SCNA). Solutions implying greater values will be discarded. Default: 0.05
#' @param max.neg.genome Maximum genome fraction that may be modeled as non-clonal with copy-ratio below that of clonal homozygous deletion. Solutions implying greater values will be discarded. Default: 0.005
#' @param copy.num.type The type of copy number to be handled. Either total or allelic. Total is what this package is for. Default: "total"
#' @param min.mut.af Minimum mutation allelic fraction. Mutations with lower allelic fractions will be filtered out before analysis. Default: 0.1
#' @param min.no.mut Minor allele frequency file, or NULL if one is not available. This specifies the data for somatic point mutations to be used by ABSOLUTE. Default: 5
#' @param verbose if `TRUE`, print extra info. Default: `FALSE`
#' @param nThread number of cores used for computation. Default: 1L
#' @param keepAllResult if `TRUE`, keep all results, otherwise clean result directory and keep most important results. Default: `TRUE`
#' @param recover if `TRUE`, recover previous unfinished work.
#' This is helpful when program stop unexpectedly when `clean.temp` is FALSE. Default: `FALSE`
#' @author Shixiang Wang <w_shixiang@163.com>
#' @return NULL
#' @import foreach doParallel data.table utils parallel
#' @export
#' @references Carter, Scott L., et al. "Absolute quantification of somatic DNA alterations in human cancer." Nature biotechnology 30.5 (2012): 413.
#'
DoAbsolute <- function(Seg, Maf = NULL,
sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 0.5, max.ploidy = 10,
primary.disease = NA, platform = c("SNP_6.0", "Illumina_WES", "SNP_250K_STY"),
temp.dir = file.path(tempdir(), "Absolute"), clean.temp = FALSE,
results.dir = getwd(), max.as.seg.count = 1500,
max.non.clonal = 0.05, max.neg.genome = 0.005, copy.num.type = c("total", "allelic"),
min.mut.af = 0.1, min.no.mut = 5, verbose = FALSE, nThread = 1L, keepAllResult = TRUE,
recover = FALSE) {

DoAbsolute/R/DoAbsolute.R

Lines 136 to 141 in 8748cfb

Seg$Chromosome <- as.character(Seg$Chromosome)
Seg$Chromosome <- gsub(pattern = "chr", replacement = "", Seg$Chromosome, ignore.case = TRUE)
Seg$Chromosome <- gsub(pattern = "X", replacement = "23", Seg$Chromosome, ignore.case = TRUE)
if (verbose) cat("-> Keeping only chr 1-23 for CNV data...\n")
autosome <- as.character(seq(1, 23))
Seg <- Seg[Chromosome %in% autosome, ]

maf <- maf[(t_alt_count / (t_ref_count + t_alt_count)) >= min.mut.af, ]
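
To see what this preprocessing does to an input, here is a minimal base-R sketch on made-up tables. The toy rows and the `min.mut.af` value are illustrative assumptions; only the `gsub()` steps and the two filters mirror the lines quoted above (the package itself uses data.table syntax).

```r
# Toy segment table (hypothetical values, for illustration only)
seg <- data.frame(
  Chromosome = c("chr1", "CHR2", "X", "chrY", "chrM"),
  Segment_Mean = c(0.1, -0.3, 0.2, 0.0, 0.5),
  stringsAsFactors = FALSE
)

# Same normalisation as the quoted snippet: drop "chr", map X to 23
seg$Chromosome <- as.character(seg$Chromosome)
seg$Chromosome <- gsub("chr", "", seg$Chromosome, ignore.case = TRUE)
seg$Chromosome <- gsub("X", "23", seg$Chromosome, ignore.case = TRUE)

# Keep chromosomes 1-23 only, so the Y and M rows are dropped
seg_kept <- seg[seg$Chromosome %in% as.character(1:23), ]

# Toy MAF table and the allelic-fraction filter
maf <- data.frame(
  t_ref_count = c(80, 95, 50),
  t_alt_count = c(20, 5, 50)
)
min.mut.af <- 0.1  # assumed threshold for this example
af <- maf$t_alt_count / (maf$t_ref_count + maf$t_alt_count)
maf_kept <- maf[af >= min.mut.af, ]

print(seg_kept$Chromosome)
print(nrow(maf_kept))
```

Applying the same filters to the files given to standalone ABSOLUTE and counting rows before and after is one way to check whether preprocessing changes the input.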

@xiasijian
Author

Thanks for your quick reply, but I found a problem in your example data.

@xiasijian
Author

image

image

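When I subsample a certain number of rows and run them, the purity results on the test data differ between the two tools, as shown in the images above.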

@xiasijian
Author

However, if I just use the original test data, the two give the same results.

@ShixiangWang
Owner

Thanks.

@xiasijian
Author

I do not know why this is, can you tell me? Thanks.

@ShixiangWang
Owner

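> However, if I just use the original test data, the two give the same results.

I don't quite understand. Didn't you say the results were consistent? And what do you mean by "the original test data"?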

@xiasijian
Author

xiasijian commented Feb 24, 2023 via email

@ShixiangWang
Owner

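Then I can only conclude that the two subsampled datasets differ. Also, check whether you ran DoAbsolute in parallel; if you did, set.seed may not work. I have already listed the other possible causes above, and I have no other explanation on my side.

The simplest way to test reproducibility is to run up to the point where RunAbsolute is called and check whether the input files there are identical to the input files you use when running ABSOLUTE on its own.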

DoAbsolute/R/DoAbsolute.R

Lines 285 to 295 in 8748cfb

suppressWarnings(ABSOLUTE::RunAbsolute(
seg.dat.fn = seg_fn, maf.fn = maf_fn,
sample.name = samples[i],
sigma.p = sigma.p, max.sigma.h = max.sigma.h,
min.ploidy = min.ploidy, max.ploidy = max.ploidy,
primary.disease = primary.disease, platform = platform,
results.dir = cache.dir, max.as.seg.count = max.as.seg.count,
max.non.clonal = max.non.clonal, max.neg.genome = max.neg.genome,
copy_num_type = copy_num_type,
min.mut.af = min.mut.af, verbose = verbose
))

You can use a single core (nThread = 1), run debug(ABSOLUTE::RunAbsolute), then run DoAbsolute and wait until execution stops just before the code above runs.
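
The input comparison suggested above could be sketched as follows. This is only an illustration: `compare_inputs` is a hypothetical helper, and the two temporary files stand in for the segment file DoAbsolute writes to its temp directory and the one written for standalone ABSOLUTE.

```r
# Compare two input files by checksum and, if they differ, line by line.
compare_inputs <- function(file_a, file_b) {
  same_bytes <- unname(tools::md5sum(file_a) == tools::md5sum(file_b))
  lines_a <- readLines(file_a)
  lines_b <- readLines(file_b)
  n <- max(length(lines_a), length(lines_b))
  length(lines_a) <- n  # pad the shorter file with NA
  length(lines_b) <- n
  differing <- which(is.na(lines_a) | is.na(lines_b) | lines_a != lines_b)
  list(identical = same_bytes, differing_lines = differing)
}

# Stand-in files (made-up content); replace with the real paths
a <- tempfile(fileext = ".seg")
b <- tempfile(fileext = ".seg")
writeLines(c("Sample\tChromosome\tStart", "S1\t1\t100", "S1\t2\t200"), a)
writeLines(c("Sample\tChromosome\tStart", "S1\t1\t100", "S1\tchr2\t200"), b)

res <- compare_inputs(a, b)
print(res)
```

If the checksums match for both the segment and MAF files, the difference has to come from the arguments or the ABSOLUTE version rather than from the inputs.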

@xiasijian
Author

xiasijian commented Feb 24, 2023 via email

@xiasijian
Author

xiasijian commented Feb 24, 2023 via email

@xiasijian
Author

01_Doabsolute_min.zip

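Dr. Wang, here are my test script and its output. I loaded the final RData files saved by both tools and confirmed that the inputs are the same, but the model.res results are completely different, which is why the final results disagree.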

@xiasijian
Author

Dr. Wang, one more question: the final output of DoAbsolute is NA. What exactly does that mean?

@ShixiangWang
Owner

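> 01_Doabsolute_min.zip
>
> Dr. Wang, here are my test script and its output. I loaded the final RData files saved by both tools and confirmed that the inputs are the same, but the model.res results are completely different, which is why the final results disagree.

That is very strange, then. You should be using the same version of ABSOLUTE in both cases.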

@ShixiangWang
Owner

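Then the default parameters used in the two calls must differ. Please check again; there should be no other explanation, otherwise it would be black magic.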
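
One way to check for such a mismatch is to put both sets of argument values into named lists and diff them. The sketch below is only an illustration: `diff_args` is a hypothetical helper, and the two lists are copied from the DoAbsolute signature and the first RunAbsolute call quoted earlier in this thread.

```r
# Report arguments whose values differ between two named lists.
diff_args <- function(x, y) {
  shared <- intersect(names(x), names(y))
  is_diff <- vapply(shared, function(n) !isTRUE(all.equal(x[[n]], y[[n]])), logical(1))
  changed <- shared[is_diff]
  data.frame(
    arg = changed,
    x = unname(vapply(x[changed], format, character(1))),
    y = unname(vapply(y[changed], format, character(1))),
    stringsAsFactors = FALSE
  )
}

# Defaults from the DoAbsolute signature quoted earlier in this thread
doabsolute_defaults <- list(
  sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 0.5, max.ploidy = 10,
  max.as.seg.count = 1500, max.non.clonal = 0.05,
  max.neg.genome = 0.005, min.mut.af = 0.1
)

# Values passed explicitly in the first RunAbsolute call in this thread
runabsolute_call <- list(
  sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 2, max.ploidy = 10,
  max.as.seg.count = 3000, max.non.clonal = 0.05,
  max.neg.genome = 0.005, min.mut.af = 0.02
)

arg_diff <- diff_args(doabsolute_defaults, runabsolute_call)
print(arg_diff)
```

Because DoAbsolute forwards its own argument values to RunAbsolute, any argument left at the DoAbsolute default but set differently in the standalone call will show up in such a diff.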

@xiasijian
Author

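Dr. Wang, it really is puzzling. I have read through your script at least 10 times and found no code that would materially affect the results. But when the Seg and Maf files have few rows, the two tools diverge, and the divergence shows up both in the tumor purity and in the final RData files.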
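
A comparison of two saved RData files like the one described above might look like the sketch below. It is only an illustration: the temporary files and the `result` object with its values are made-up stand-ins, and the real files would be the RData outputs in the two result directories.

```r
# Load an .RData file into its own environment so object names cannot collide.
load_into_env <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Stand-ins for the two result files (hypothetical object and values)
file_a <- tempfile(fileext = ".RData")
file_b <- tempfile(fileext = ".RData")
result <- list(sample.name = "S1", purity = 0.62)
save(result, file = file_a)
result$purity <- 0.48
save(result, file = file_b)

env_a <- load_into_env(file_a)
env_b <- load_into_env(file_b)

# all.equal() returns TRUE or a character vector describing each difference
for (name in intersect(ls(env_a), ls(env_b))) {
  cmp <- all.equal(get(name, envir = env_a), get(name, envir = env_b))
  status <- if (isTRUE(cmp)) "identical" else paste(cmp, collapse = "; ")
  cat(name, ":", status, "\n")
}
```

Walking the objects this way shows which component first diverges, which narrows down whether the inputs or the model search differ.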

@ShixiangWang
Owner

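Then let's not dwell on this for now. If using ABSOLUTE directly works better for you, just call it in a for loop; I only wrote this package to make things a bit more convenient. I'll leave the code below as a reference for future readers and mention it in the README. If you later find the real cause, we can pick up the discussion again.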

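rm(list=ls())
gc()

library(DoAbsolute)
library(dplyr)
example_path = system.file("extdata", package = "DoAbsolute", mustWork = TRUE)
library(data.table)
library(ABSOLUTE)

## set workspace
# workdir is not defined anywhere in this script; using the current directory is an assumption
workdir = getwd()
outdir="./output/"
setwd(workdir)
# the write.table() calls below need outdir to exist
dir.create(outdir, showWarnings = FALSE)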
# segmentation file
seg_normal =  file.path(example_path, "SNP6_blood_normal.seg.txt")
seg_solid  =  file.path(example_path, "SNP6_solid_tumor.seg.txt")
seg_metastatic  = file.path(example_path, "SNP6_metastatic_tumor.seg.txt")

# MAF file
maf_solid  = file.path(example_path, "solid_tumor.maf.txt")
maf_metastatic  = file.path(example_path, "metastatic_tumor.maf.txt")

# read data
seg_normal = fread(seg_normal)
seg_solid = fread(seg_solid)
seg_metastatic = fread(seg_metastatic)
maf_solid = fread(maf_solid)
maf_metastatic = fread(maf_metastatic)

# merge data
Seg = Reduce(rbind, list(seg_normal, seg_solid, seg_metastatic))
Maf = Reduce(rbind, list(maf_solid, maf_metastatic))

Seg$Sample = substr(Seg$Sample, 1, 15)
Maf$Tumor_Sample_Barcode = substr(Maf$Tumor_Sample_Barcode, 1, 15)

test_Seg=Seg %>% subset(Sample %in% c("TCGA-DK-A1A6-01"))
test_Seg=test_Seg[1:42,]
table(test_Seg$Chromosome)
test_Maf=Maf %>% subset(Tumor_Sample_Barcode %in% c("TCGA-DK-A1A6-01"))
table(test_Maf$Chromosome)
test_Maf$Chromosome=ifelse(test_Maf$Chromosome=="X",23,test_Maf$Chromosome)
test_Maf=test_Maf[1:28,]



############################ DoAbsolute #################################
# test function
DoAbsolute(Seg = test_Seg, Maf = test_Maf, 
           platform = "SNP_6.0", 
           copy.num.type = "total",
           results.dir = "output_doabsolute",
           nThread = 1, 
           sigma.p = 0,
           max.sigma.h = 0.2,
           min.ploidy = 0.5,
           max.ploidy = 10,
           keepAllResult = FALSE, 
           primary.disease = "Tumor",
           clean.temp = FALSE,
           max.as.seg.count = 1500,
           max.non.clonal = 0.05,
           max.neg.genome = 0.005,
           min.mut.af = 0.1,
           min.no.mut = 0,
           verbose = TRUE)

############################ Absolute #################################
for(patient in c("TCGA-DK-A1A6-01")){
  print(patient)
  test_seg=test_Seg%>%subset(Sample==patient)
  test_maf=test_Maf %>% subset(Tumor_Sample_Barcode==patient)
  # standalone ABSOLUTE reads its inputs from disk, so write them out per sample
  write.table(test_seg,file = paste0(outdir,patient,"_test_output.seg"),sep="\t",row.names = F,quote = F)
  write.table(test_maf,file = paste0(outdir,patient,"_test_output.maf"),sep="\t",row.names = F,quote = F)
  RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"),
              maf.fn = paste0(outdir,patient,"_test_output.maf"),
              platform = "SNP_6.0",
              copy_num_type = "total",
              sigma.p=0,
              results.dir = "output_absolute",
              sample.name=patient,
              primary.disease="Tumor",
              max.sigma.h=0.5,        # note: the DoAbsolute call above passes 0.2
              min.ploidy=0.5,
              max.ploidy=10,
              max.as.seg.count=1600,  # note: the DoAbsolute call above passes 1500
              max.neg.genome=0.005,
              max.non.clonal=0.05,
              verbose = TRUE,
              min.mut.af=0.1)
}


######################################
## merge absolute data
######################################
ab_outdir="./output_absolute/"
all_files=list.files(ab_outdir)
# match the literal ".RData" extension instead of treating "." as a regex wildcard
rdata_files=all_files[grepl(pattern = ".RData", all_files, fixed = TRUE)]
absolute.files=paste0(ab_outdir,rdata_files)
results.dir <- file.path(ab_outdir, "abs_summary")
CreateReviewObject("DRAWS_summary", absolute.files, results.dir, "total", verbose=TRUE)

calls.path = paste0(results.dir, "/", "DRAWS_summary.PP-calls_tab.txt")
modes.path = paste0(results.dir, "/", "DRAWS_summary.PP-modes.data.RData")
# load the same modes file that is passed to ExtractReviewedResults below
load(file = modes.path)
output.path = paste0(outdir, "abs_extract")
ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "total")

@xiasijian
Author

xiasijian commented Feb 26, 2023 via email
