Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial alevin benchmarking result #18

Merged
merged 6 commits into from
Sep 9, 2020
Merged

Conversation

jashapiro
Copy link
Member

@jashapiro jashapiro commented Sep 1, 2020

This PR adds a workflow to test various alevin index types that could be used for mapping, notably using selective alignment and comparing full transcriptome and cdna only alignment. This fulfills some of the goals of #9.

Also included is an html report for an initial run and a trace table describing memory and cpu usage. This was created by running the following command:

nextflow -C ../nextflow.config run alevin-benchmark-indexes.nf -profile batch -with-report alevin-benchmark.html -with-trace

A quick look at these results seems to indicate that while full SA does take more memory, that memory usage is still within the m4.2xlarge range when run with 8 threads. No jobs died with OOM errors, (the one that did early on seems to have been a fluke).

All mapping results are stored at s3://nextflow-ccdl-results/scpca-benchmark/alevin-quant . Comparisons of the mapping results has not yet been performed.

@jaclyn-taroni
Copy link
Member

Jogging my own memory here - the decoy-aware indices are from pre-built from refgenie, is that correct?

@jashapiro
Copy link
Member Author

Jogging my own memory here - the decoy-aware indices are from pre-built from refgenie, is that correct?

Yes, for this initial benchmarking, I used the prebuilt indices (indexes? I can't decide what I like here). That turns out to be Ensembl v97 with kmer 31.

@envest
Copy link
Contributor

envest commented Sep 1, 2020

Whoa, as you said, the two full selective alignment runs were definite memory usage outliers, but they obviously did fine within the testing environment allocation. I'm interested in the mapping comparison results to see if that memory usage is worth it 🤑 i.e. for my edification, when does SA make an important difference?

@jashapiro
Copy link
Member Author

I'm interested in the mapping comparison results to see if that memory usage is worth it 🤑

Funny thing at the moment (maybe often, I don't know), an r4.2xlarge spot instance(~60GB ram) is cheaper than the standard m4.2xlarge (32GB RAM), by about 20%. I don't know what to do with this information.

As to selective alignment, I defer to the docs: https://salmon.readthedocs.io/en/latest/salmon.html and paper: https://www.biorxiv.org/content/10.1101/657874v2

Copy link
Member

@jaclyn-taroni jaclyn-taroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few questions, but no need for me to re-review.

ch_indexes = Channel.fromList([
['cdna_k31_no_sa',
's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/salmon_index/cdna_k31',
's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/annotation/Homo_sapiens.ensembl.100.tx2gene.tsv'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer to this may be no - can you assign the tx2gene file as a variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at this time, because there is still the partial SA in there from an external source, which does not use the same tx2gene. 😞

@@ -0,0 +1,13 @@
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you updated the trace and HTML files in 437842b and this file is the "keep old trace for reference" based on the diff. I might suggest we name this something else for future us reasons. I would expect the other file to be the old one just looking at the file names. (This comment extends to the HTML files.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was how nextflow handled the file name conflict. I would have expected the same as you did based on file names, but I am hesitant to rename them just for consistency. Scratch that, I will rename with a date.

@jashapiro jashapiro merged commit d85b6b5 into master Sep 9, 2020
@jashapiro jashapiro deleted the jashapiro/alevin-benchmark branch October 22, 2021 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants