-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial alevin benchmarking result #18
Conversation
Jogging my own memory here - the decoy-aware indices are from pre-built from refgenie, is that correct? |
Yes, for this initial benchmarking, I used the prebuilt indices (indexes? I can't decide what I like here). That turns out to be Ensembl v97 with kmer 31. |
Whoa, as you said, the two full selective alignment runs were definite memory usage outliers, but they obviously did fine within the testing environment allocation. I'm interested in the mapping comparison results to see if that memory usage is worth it 🤑 i.e. for my edification, when does SA make an important difference? |
Funny thing at the moment (maybe often, I don't know), an r4.2xlarge spot instance(~60GB ram) is cheaper than the standard m4.2xlarge (32GB RAM), by about 20%. I don't know what to do with this information. As to selective alignment, I defer to the docs: https://salmon.readthedocs.io/en/latest/salmon.html and paper: https://www.biorxiv.org/content/10.1101/657874v2 |
Keep old traces for reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few questions, but no need for me to re-review.
ch_indexes = Channel.fromList([ | ||
['cdna_k31_no_sa', | ||
's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/salmon_index/cdna_k31', | ||
's3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-100/annotation/Homo_sapiens.ensembl.100.tx2gene.tsv'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The answer to this may be no - can you assign the tx2gene file as a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at this time, because there is still the partial SA in there from an external source, which does not use the same tx2gene. 😞
workflows/alevin-quant/trace.txt.1
Outdated
@@ -0,0 +1,13 @@ | |||
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you updated the trace and HTML files in 437842b and this file is the "keep old trace for reference" based on the diff. I might suggest we name this something else for future us reasons. I would expect the other file to be the old one just looking at the file names. (This comment extends to the HTML files.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was how nextflow
handled the file name conflict. I would have expected the same as you did based on file names, but I am hesitant to rename them just for consistency. Scratch that, I will rename with a date.
This PR adds a workflow to test various alevin index types that could be used for mapping, notably using selective alignment and comparing full transcriptome and cdna only alignment. This fulfills some of the goals of #9.
Also included is an html report for an initial run and a trace table describing memory and cpu usage. This was created by running the following command:
A quick look at these results seems to indicate that while full SA does take more memory, that memory usage is still within the m4.2xlarge range when run with 8 threads. No jobs died with OOM errors, (the one that did early on seems to have been a fluke).
All mapping results are stored at s3://nextflow-ccdl-results/scpca-benchmark/alevin-quant . Comparisons of the mapping results has not yet been performed.