Optimization for SNPCoverageAdapter and CRAM parsing #2947
Codecov Report
Coverage Diff

| | main | #2947 | +/- |
|---|---|---|---|
| Coverage | 60.74% | 60.67% | -0.07% |
| Files | 601 | 601 | |
| Lines | 27329 | 27341 | +12 |
| Branches | 6643 | 6654 | +11 |
| Hits | 16600 | 16590 | -10 |
| Misses | 10419 | 10441 | +22 |
| Partials | 310 | 310 | |
This PR now also adds some CRAM optimizations (raw loops instead of forEach, preallocated mismatches). Combined with the @gmod/cram optimization, this branch is about 20% faster than main for longread CRAM (400x, 600x, 800x); the @gmod/cram change on its own was about 10%-15% faster.
Note: the changes above mainly improve longread performance, while the @gmod/cram PR mainly improves shortread performance.
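As a rough illustration of the two CRAM-side changes mentioned above (raw loops instead of forEach, preallocating the mismatches array), here is a minimal sketch. The `ReadFeature` and `Mismatch` shapes and both function names are hypothetical and are not taken from @gmod/cram or this PR; only the loop and allocation pattern is the point.

```typescript
// Hypothetical shapes for illustration only; not the real @gmod/cram or JBrowse types.
interface ReadFeature {
  code: string // feature code, e.g. 'X' for a base substitution
  refPos: number // position on the reference
  sub: string // substituted base
}

interface Mismatch {
  start: number // offset relative to the read start
  length: number
  type: string
  base: string
}

// Baseline: forEach with a callback and an array that grows via push.
function mismatchesForEach(features: ReadFeature[], readStart: number): Mismatch[] {
  const out: Mismatch[] = []
  features.forEach(f => {
    if (f.code === 'X') {
      out.push({ start: f.refPos - readStart, length: 1, type: 'mismatch', base: f.sub })
    }
  })
  return out
}

// Variant: index-based loop writing into an array sized up front.
function mismatchesLoop(features: ReadFeature[], readStart: number): Mismatch[] {
  // Upper bound on the result size: at most one mismatch per read feature.
  const out = new Array<Mismatch>(features.length)
  let j = 0
  for (let i = 0; i < features.length; i++) {
    const f = features[i]
    if (f.code === 'X') {
      out[j++] = { start: f.refPos - readStart, length: 1, type: 'mismatch', base: f.sub }
    }
  }
  // Trim the unused tail so callers never see empty slots.
  out.length = j
  return out
}
```

Both functions return the same result; whether the second is actually faster depends on the JavaScript engine and the data, so it is worth profiling rather than assuming.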
This can save about 1 second in generateCoverageBins calls on a 20kb region
Allocates fewer nested objects
On 200x coverage shortreads from jb2profile
Time spent in generateCoverageBins specifically:
This branch 2.82s
Main branch 3.6s
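To make the "fewer nested objects" point concrete, here is one possible way to cut per-position allocations: store per-base counts in a single flat typed array instead of one nested object per position. This is a sketch under assumptions, not the layout used in this PR; `countBases`, `SimpleRead`, and the A/C/G/T-only layout are hypothetical.

```typescript
// Hypothetical read shape for illustration; real features carry far more data.
interface SimpleRead {
  start: number // 0-based reference start
  seq: string // aligned bases, assumed gapless for simplicity
}

const BASES = 'ACGT'

// Returns a flat array of regionLength * 4 counters.
// The count for base b at region offset p lives at index p * 4 + BASES.indexOf(b).
function countBases(regionStart: number, regionLength: number, reads: SimpleRead[]): Uint32Array {
  // One allocation for the whole region instead of one object per position.
  const counts = new Uint32Array(regionLength * 4)
  for (let r = 0; r < reads.length; r++) {
    const { start, seq } = reads[r]
    for (let i = 0; i < seq.length; i++) {
      const pos = start + i - regionStart
      if (pos < 0 || pos >= regionLength) {
        continue // base falls outside the requested region
      }
      const b = BASES.indexOf(seq[i])
      if (b !== -1) {
        counts[pos * 4 + b]++
      }
    }
  }
  return counts
}
```

The trade-off is that a flat layout is less self-describing than nested objects, so it would likely need a thin accessor if other code consumes the bins.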
We could possibly optimize further by binning multiple bp into a single bin (jb1 does this) and/or by using skips_and_dels (another jb1-ism). We would need to be careful when binning multiple bp into a single bin, though: SNPs and modifications should probably not be aggregated.
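A minimal sketch of the binning idea above, assuming total depth is averaged per bin while SNP calls stay keyed by their exact bp position so they are never merged across a bin. `binCoverage`, `SnpCall`, and `BinnedCoverage` are hypothetical names, and taking the mean depth is an assumed choice, not necessarily what jb1 does.

```typescript
// Hypothetical types for illustration; not JBrowse's actual bin format.
interface SnpCall {
  pos: number // 0-based offset within the region
  base: string
}

interface BinnedCoverage {
  binSize: number
  depth: Float64Array // mean depth per bin
  snps: Map<number, Map<string, number>> // exact bp offset -> base -> count
}

function binCoverage(perBaseDepth: number[], binSize: number, snps: SnpCall[]): BinnedCoverage {
  const numBins = Math.ceil(perBaseDepth.length / binSize)
  const depth = new Float64Array(numBins)
  for (let b = 0; b < numBins; b++) {
    const from = b * binSize
    // The last bin may be shorter than binSize.
    const to = Math.min(from + binSize, perBaseDepth.length)
    let sum = 0
    for (let i = from; i < to; i++) {
      sum += perBaseDepth[i]
    }
    depth[b] = sum / (to - from)
  }

  // SNPs are deliberately not binned: each stays at its own bp offset.
  const snpCounts = new Map<number, Map<string, number>>()
  for (let i = 0; i < snps.length; i++) {
    const { pos, base } = snps[i]
    let atPos = snpCounts.get(pos)
    if (!atPos) {
      atPos = new Map<string, number>()
      snpCounts.set(pos, atPos)
    }
    atPos.set(base, (atPos.get(base) || 0) + 1)
  }
  return { binSize, depth, snps: snpCounts }
}
```

Keeping the SNP map sparse means positions without a variant cost nothing, which is where most of the saving from coarser depth bins would come from.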