Implement iterators over .hic files #2
Merged
Move MatrixType and MatrixUnit params from MatrixSelector to HiCFile ctor.
Codecov Report
@@            Coverage Diff             @@
##             main       #2      +/-   ##
==========================================
- Coverage   81.45%   81.09%   -0.37%
==========================================
  Files          47       50       +3
  Lines        3441     3285     -156
==========================================
- Hits         2803     2664     -139
+ Misses        638      621      -17
Split MatrixSelector into several smaller classes.
robomics force-pushed the impl-hic-lazy-fetch branch from 0e16d99 to fd14b4a on June 12, 2023 09:47
In .hic files, blocks are stored sorted by column; we want them sorted by row.
robomics force-pushed the impl-hic-lazy-fetch branch from e644114 to 606a15b on June 14, 2023 18:26
Replace LRU cache with a simple FIFO.
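For readers unfamiliar with the difference, here is a minimal Python sketch of a FIFO cache. It is not the code from this PR: the class name `FifoCache` and its methods are hypothetical, chosen only to illustrate the eviction policy. The key point is that a cache hit does not change the eviction order, so there is no bookkeeping on reads.

```python
from collections import OrderedDict


class FifoCache:
    """Bounded cache that evicts the oldest inserted entry (hypothetical helper)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        # Unlike an LRU cache, a hit does NOT move the entry:
        # eviction order depends only on insertion order.
        return self._items.get(key, default)

    def put(self, key, value):
        if key in self._items:
            # Updating an existing key keeps its original position.
            self._items[key] = value
            return
        if len(self._items) >= self._capacity:
            # Drop the entry that was inserted first.
            self._items.popitem(last=False)
        self._items[key] = value

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = FifoCache(capacity=2)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.get("block-a")          # a hit; an LRU would now protect block-a
cache.put("block-c", b"...")  # FIFO still evicts block-a, the oldest insert
```

With an LRU policy the last `put` would have evicted `block-b` instead. A FIFO is a reasonable fit when entries are mostly consumed once and in order, which is an assumption about the access pattern here rather than something stated in the commit.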
robomics force-pushed the impl-hic-lazy-fetch branch from 8f98b78 to c79cf9c on June 18, 2023 09:06
robomics force-pushed the impl-hic-lazy-fetch branch from c79cf9c to 865556f on June 18, 2023 09:12
Our previous implementation performed quite poorly at high resolutions, as we were processing one row at a time while paying the overhead of fetching the block index and data for every row. This was done with the intention of minimizing the amount of read-ahead we do, as well as getting pixels in the correct order without any explicit sort. However, this was too slow.
The current solution is less clean, but performance is much better (15x at 10 bp on some of our datasets). Instead of processing one row at a time, we now process rows in chunks. Chunk sizes are computed as a fraction of chromosome sizes, and thus grow linearly with resolution, making the overhead of processing a chunk comparable across resolutions. Another benefit of this approach is that indexing of InteractionBlock is no longer needed: sorting pixels is enough.
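The chunked approach described above can be sketched roughly as follows. This is an illustration in Python, not the implementation from this PR: `chunk_rows`, `iter_pixels_chunked`, the `fetch_rows` callback, and the `chunks_per_chrom` parameter are all hypothetical names, and the toy in-memory store stands in for reading blocks from a .hic file.

```python
from typing import Callable, Iterable, Iterator, Tuple

Pixel = Tuple[int, int, float]  # (row bin, column bin, count)


def chunk_rows(chrom_size_bp: int, bin_size_bp: int, chunks_per_chrom: int = 20) -> int:
    """Rows per chunk when each chunk spans a fixed fraction of the chromosome.

    chunks_per_chrom is an assumed tuning knob, not a value taken from the PR.
    """
    num_bins = -(-chrom_size_bp // bin_size_bp)  # ceiling division
    return max(1, num_bins // chunks_per_chrom)


def iter_pixels_chunked(
    fetch_rows: Callable[[int, int], Iterable[Pixel]],
    num_rows: int,
    rows_per_chunk: int,
) -> Iterator[Pixel]:
    """Yield pixels in (row, column) order, fetching one chunk of rows at a time.

    fetch_rows(first, last) is a hypothetical callback returning every pixel
    whose row lies in [first, last), in any order.
    """
    for first in range(0, num_rows, rows_per_chunk):
        last = min(first + rows_per_chunk, num_rows)
        # One fetch per chunk instead of one per row.
        pixels = list(fetch_rows(first, last))
        # No per-block index is needed: sorting the chunk restores row order.
        pixels.sort(key=lambda p: (p[0], p[1]))
        yield from pixels


# Toy store laid out by column, mimicking blocks that are not sorted by row.
store = [(2, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0), (0, 2, 4.0), (2, 2, 5.0)]


def fetch(first: int, last: int) -> Iterable[Pixel]:
    return [p for p in store if first <= p[0] < last]


result = list(iter_pixels_chunked(fetch, num_rows=3, rows_per_chunk=2))
```

Because chunks cover disjoint, increasing row ranges, sorting within each chunk is enough to produce a globally row-sorted stream. In this sketch, a smaller bin size gives more rows per chunk, so the number of fetches per chromosome stays roughly constant.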
Rework hic library to support efficient iteration over pixels overlapping a given query.