Feature/parse large files low memory #33
I think I need to match my |
Just the default settings, integrated with PyCharm as an external tool: https://github.com/psf/black#editor-integration . You can run |
…arge-files-low-memory # Conflicts: # src/snps/io.py # tests/test_snps_collection.py
Codecov Report
@@            Coverage Diff            @@
##           develop     #33     +/-   ##
=========================================
+ Coverage    89.15%   90.56%   +1.4%
=========================================
  Files            5        5
  Lines         1097     1123    +26
  Branches       196      204     +8
=========================================
+ Hits           978     1017    +39
+ Misses          73       60    -13
  Partials        46       46
….com/willgdjones/snps into feature/parse-large-files-low-memory
Ok! I've matched the settings now. |
Hi @apriha - the refactoring that you've made to this PR looks good to me, thanks for doing so. |
Thanks again @willgdjones ! Let me know if you agree with the latest commits and I'll merge the PR. |
Those changes look good @apriha ! |
…mments prefer build from comments
This PR implements functionality to efficiently parse large VCF files without needing to decompress them. It also provides functionality to specify a set of rsids to extract from the file.
This functionality only works when feeding in a `bytes_data` object, which needs to be a valid, gzip-compressed byte string. I am able to extract rsids from the output of an imputation pipeline, a VCF file of ~450 MB compressed, in ~1 minute.
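For readers who want a feel for the approach, here is a minimal sketch of streaming a gzip-compressed VCF byte string and keeping only the requested rsids. It is not the code in this PR: the function name `extract_rsids` and its return shape are hypothetical, and it assumes a tab-delimited VCF with the rsid in the third (ID) column.

```python
import gzip
import io


def extract_rsids(bytes_data, rsids):
    """Yield (chrom, pos, rsid) for each requested rsid in a gzipped VCF.

    Hypothetical helper for illustration only; not part of the snps API.
    `bytes_data` must be a valid gzip-compressed byte string.
    """
    wanted = set(rsids)  # set membership keeps the per-line check cheap
    # gzip.open accepts a file-like object, so the data is decompressed
    # incrementally and only one line is held in memory at a time.
    with gzip.open(io.BytesIO(bytes_data), mode="rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            # VCF columns: CHROM, POS, ID, REF, ALT, ...
            if len(fields) > 2 and fields[2] in wanted:
                yield fields[0], int(fields[1]), fields[2]
```

Because the function is a generator, a caller can stop early once every requested rsid has been seen, which may help on very large files.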
This should be merged after #32.