
Kallisto bus output doesn't change with the --cm flag #477

Open
tsofiya opened this issue Jan 15, 2025 · 1 comment

Comments

@tsofiya

tsofiya commented Jan 15, 2025

A while ago I asked about getting the raw, non-collapsed gene matrix (#408) and was told I should use the -cm flag.
On my current project I wanted to do the same, so I ran kallisto bus twice, once with -cm and once without.
My script:

kallisto bus -i cdna.idx -o results/bustools/ -x 0,0,8:0,8,21:1,0,0 -t 8 GE_S1_L001_R1_001.fq.gz GE_S1_L001_R2_001.fq.gz GE_S1_L002_R1_001.fq.gz GE_S1_L002_R2_001.fq.gz

cd results/bustools

mkdir -p tmp
bustools sort -o tmp/output.s.bus -T tmp -t 8 -m 4G output.bus

bustools inspect -o inspect.json -w path/to/whitelist.txt tmp/output.s.bus

bustools correct -o tmp/output.s.c.bus -w path/to/whitelist.txt tmp/output.s.bus

bustools sort -o output.unfiltered.bus -T tmp -t 8 -m 4G tmp/output.s.c.bus

mkdir -p counts_unfiltered

bustools count -o counts_unfiltered/cells_x_genes -e matrix.ec -t transcripts.txt --genecounts output.unfiltered.bus -g path/to/kalisto_t2g.txt

rm -rf tmp

(When I run with -cm, I use a different folder name.)
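To make concrete what the flag is supposed to change, here is a minimal sketch of the two counting modes. It is not bustools' actual implementation. The default mode counts distinct UMIs, while `--cm` is described in the `bustools count` help as counting multiplicities instead of UMIs. The sketch assumes the four tab-separated columns that `bustools text` prints (barcode, UMI, equivalence class, count). As a simplification, it treats each equivalence class as one gene, whereas the real command maps classes to genes via `-e`, `-t` and `-g`. The `count_modes` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: contrast UMI counting with multiplicity counting on records in
# the tab-separated layout assumed for `bustools text` output:
#   barcode, UMI, equivalence class, count
# Simplification (assumption): each equivalence class is treated as a single
# feature; real `bustools count` maps classes to genes via -e, -t and -g.

count_modes() {
  # Reads records on stdin and prints, per (barcode, ec):
  #   barcode, ec, number of distinct UMIs, summed multiplicity
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 OFS $3
      if (!((key, $2) in seen)) {   # first record with this UMI for the key
        seen[key, $2] = 1
        umis[key]++                 # default mode: count distinct UMIs
      }
      reads[key] += $4              # --cm-style: sum the count column
    }
    END { for (k in umis) print k, umis[k], reads[k] }
  ' | sort
}
```

For example, if barcode AAAA has two records in class 0 with different UMIs and counts 3 and 1, the UMI column is 2 but the multiplicity column is 4. If the flag takes effect, the two matrices would therefore be expected to differ wherever a UMI was seen more than once.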

Later, I use an R script to read the matrix:

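library(Matrix)  # readMM(), dgCMatrix
library(readr)   # read_tsv()
library(dplyr)   # select(), distinct(), %>%

read_count_output <- function(dir, name) {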
  dir <- normalizePath(dir, mustWork = TRUE)
  m <- readMM(paste0(dir, "/", name, ".mtx"))
  m <- Matrix::t(m)
  m <- as(m, "dgCMatrix")
  ge <- ".genes.txt"
  genes <- readLines(paste0(dir, "/", name, ge))
  barcodes <- readLines(paste0(dir, "/", name, ".barcodes.txt"))
  colnames(m) <- barcodes
  rownames(m) <- genes
  return(m)
}


mat <- read_count_output(paste0(project.dir, results.dir,"/bustools/counts_unfiltered"), name="cells_x_genes")
rownames(mat)<- sub("\\..*", "", rownames(mat))

tr2g <- read_tsv("t2g.txt", col_names = c("transcript", "gene", "gene_symbol")) %>%
  select(-transcript) %>%
  distinct()
tr2g$gene<- sub("\\..*", "", tr2g$gene)


rownames(mat) <- tr2g$gene_symbol[match(rownames(mat), tr2g$gene)]
my.data.no.dup <- rowsum(mat, row.names(mat))

I do the same with the -cm matrix.

The problem is that I'm getting exactly the same matrix!
This didn't happen before. Am I doing something wrong?
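One way to narrow this down is to compare the raw `bustools count` outputs directly, before any of the R post-processing. Below is a minimal sketch. It assumes the two runs wrote `cells_x_genes.mtx` into separate folders. The `counts_unfiltered_cm` folder name and the `same_mtx` and `mtx_total` function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compare the raw bustools count outputs of the two runs before
# any R post-processing. Function and folder names are hypothetical.

# Succeeds (exit status 0) if the two files are byte-for-byte identical.
same_mtx() {
  cmp -s "$1" "$2"
}

# Prints the sum of the value column of a Matrix Market coordinate file,
# skipping '%' comment lines and the size line (rows, columns, non-zeros).
mtx_total() {
  awk '/^%/ { next }
       !size_seen { size_seen = 1; next }
       { total += $3 }
       END { print total + 0 }' "$1"
}

# Example, assuming the -cm run wrote to counts_unfiltered_cm/:
#   same_mtx counts_unfiltered/cells_x_genes.mtx \
#            counts_unfiltered_cm/cells_x_genes.mtx && echo "identical"
#   mtx_total counts_unfiltered/cells_x_genes.mtx
#   mtx_total counts_unfiltered_cm/cells_x_genes.mtx
```

If `same_mtx` reports the files as identical, the difference is already missing from the bustools output, so the R code is not the cause. If multiplicities are being counted, each entry (and hence the total from `mtx_total`) would be expected to be at least as large as in the UMI-counted run.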

@Yenaled
Collaborator

Yenaled commented Jan 15, 2025

Are you using cm with bustools count or kallisto bus?
