Initial alevin benchmarking result #18

jashapiro · 2020-09-01T12:48:18Z

This PR adds a workflow to test various alevin index types that could be used for mapping, notably using selective alignment and comparing full-transcriptome and cDNA-only alignment. This fulfills some of the goals of #9.

Also included are an HTML report for an initial run and a trace table describing memory and CPU usage. These were created by running the following command:

nextflow -C ../nextflow.config run alevin-benchmark-indexes.nf -profile batch -with-report alevin-benchmark.html -with-trace

A quick look at these results suggests that while full SA does take more memory, usage stays within the m4.2xlarge range when run with 8 threads. No jobs died with OOM errors (the one that did early on seems to have been a fluke).

All mapping results are stored at s3://nextflow-ccdl-results/scpca-benchmark/alevin-quant. Comparison of the mapping results has not yet been performed.
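For anyone who wants to repeat the memory check against the trace table, here is a minimal sketch in Python. It assumes the trace uses Nextflow's default human-readable sizes (for example `5.6 GB`) with 1024-based units, and it reads the `name` and `peak_rss` columns. The helper names (`parse_size`, `tasks_over_limit`) and the two-row example trace are hypothetical and are not taken from the benchmark output.

```python
import csv
import io

# Assumed multipliers for the human-readable sizes in the trace (1024-based).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '5.6 GB' to bytes; return None for '-' or blanks."""
    text = text.strip()
    if not text or text == "-":
        return None
    value, _, unit = text.partition(" ")
    if not unit:  # a bare number is treated as a plain byte count
        return int(float(value))
    return int(float(value) * UNITS[unit.upper()])


def tasks_over_limit(trace_file, limit_bytes):
    """Return (name, peak_rss_bytes) for each task whose peak_rss exceeds the limit."""
    reader = csv.DictReader(trace_file, delimiter="\t")
    flagged = []
    for row in reader:
        rss = parse_size(row["peak_rss"])
        if rss is not None and rss > limit_bytes:
            flagged.append((row["name"], rss))
    return flagged


# Hypothetical two-row trace, for illustration only (not the benchmark data).
example = io.StringIO(
    "task_id\tname\tstatus\tpeak_rss\n"
    "1\talevin (small_index)\tCOMPLETED\t5.6 GB\n"
    "2\talevin (large_index)\tCOMPLETED\t40 GB\n"
)
limit = 32 * 1024**3  # 32 GB, the m4.2xlarge memory quoted in this thread
print(tasks_over_limit(example, limit))
```

To use it on a real run, open the trace file written by `-with-trace` and pass the file handle in place of `example`.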

jaclyn-taroni · 2020-09-01T13:02:31Z

Jogging my own memory here - the decoy-aware indices are pre-built from refgenie, is that correct?

jashapiro · 2020-09-01T13:19:21Z

> Jogging my own memory here - the decoy-aware indices are pre-built from refgenie, is that correct?

Yes, for this initial benchmarking, I used the prebuilt indices (indexes? I can't decide what I like here). Those turn out to be Ensembl v97 with a k-mer size of 31.

envest · 2020-09-01T18:47:18Z

Whoa, as you said, the two full selective alignment runs were definite memory usage outliers, but they obviously did fine within the testing environment allocation. I'm interested in the mapping comparison results to see if that memory usage is worth it 🤑 i.e. for my edification, when does SA make an important difference?

jashapiro · 2020-09-01T19:04:55Z

> I'm interested in the mapping comparison results to see if that memory usage is worth it 🤑

Funny thing: at the moment (maybe often, I don't know), an r4.2xlarge spot instance (~60 GB RAM) is about 20% cheaper than the standard m4.2xlarge (32 GB RAM). I don't know what to do with this information.

As to selective alignment, I defer to the docs: https://salmon.readthedocs.io/en/latest/salmon.html and paper: https://www.biorxiv.org/content/10.1101/657874v2

Keep old traces for reference

jaclyn-taroni

I had a few questions, but no need for me to re-review.

jaclyn-taroni · 2020-09-09T12:07:48Z

workflows/alevin-quant/alevin-benchmark-indexes.nf

+  ch_indexes = Channel.fromList([
+    ['cdna_k31_no_sa',
+     's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/salmon_index/cdna_k31',
+     's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/annotation/Homo_sapiens.ensembl.100.tx2gene.tsv'],


The answer to this may be no - can you assign the tx2gene file as a variable?

jashapiro · 2020-09-09

Not at this time, because the partial SA index from an external source is still in there, and it does not use the same tx2gene. 😞

jaclyn-taroni · 2020-09-09T12:11:37Z

workflows/alevin-quant/trace.txt.1

@@ -0,0 +1,13 @@
+task_id	hash	native_id	name	status	exit	submit	duration	realtime	%cpu	peak_rss	peak_vmem	rchar	wchar


I see you updated the trace and HTML files in 437842b and this file is the "keep old trace for reference" based on the diff. I might suggest we name this something else for future us reasons. I would expect the other file to be the old one just looking at the file names. (This comment extends to the HTML files.)

jashapiro · 2020-09-09

Yeah, this is how nextflow handled the file name conflict. Based on the file names, I would have expected the same as you did, but I am hesitant to rename them just for consistency. Scratch that, I will rename with a date.

jashapiro added 2 commits August 31, 2020 20:30

add workflow for benchmarking indexes

5680b97

Add initial benchmark results

11badd9

jashapiro mentioned this pull request Sep 1, 2020

Benchmark salmon/alevin and kallisto #9

Closed

jashapiro requested a review from jaclyn-taroni September 8, 2020 13:13

jashapiro added 3 commits September 8, 2020 09:14

Merge branch 'master' into jashapiro/alevin-benchmark

1454d3a

Incorporate new indexes

fa53cc9

Update trace & benchmark files

437842b

Keep old traces for reference

jaclyn-taroni approved these changes Sep 9, 2020

jaclyn-taroni mentioned this pull request Sep 9, 2020

Add full decoy transciptomes #20

Merged

Rename benchmarks/traces with dates

7a321cf

jashapiro merged commit d85b6b5 into master Sep 9, 2020

jashapiro deleted the jashapiro/alevin-benchmark branch October 22, 2021 18:47
