Optimize gtf parsing #2681
Labels
enhancement
New feature or request
Comments
cc @teresam856, this could be worth checking out. The 19 MB gz file unpacks to 329 MB in memory, but it would be cool if we could handle files of this scale.
The 329 MB figure is just the size on disk after unzipping the gz, not a measured memory footprint.
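For reference, the unpacked size can also be measured without writing the file to disk or holding it all in memory, by streaming the gzip in chunks. A minimal Python sketch follows; the helper name `uncompressed_size` and the 1 MiB chunk size are illustrative assumptions, not part of this project:

```python
import gzip
import io


def uncompressed_size(gz_file, chunk_size=1 << 20):
    """Return how many bytes a gzip stream unpacks to, reading it in chunks."""
    total = 0
    with gzip.GzipFile(fileobj=gz_file) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Small in-memory demo with a made-up GTF line; a real run would pass
# an open binary file handle for the downloaded .gtf.gz instead.
demo_line = b'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n'
demo = io.BytesIO(gzip.compress(demo_line * 1000))
demo_size = uncompressed_size(demo)
```

This only reports the size of the text itself. The in-memory footprint of parsed feature objects is usually larger than that and would need a heap profile to measure.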
This is a good thing to have. Could look into it...
Super. If it were optimized, this could become a "recommended" or even pre-configured way for people to load data.
Somewhat better as of #2927.
It is nice to be able to load GTF files directly, because hosts like UCSC still publish their annotations in this format,
e.g.
https://hgdownload.soe.ucsc.edu/goldenPath/canFam6/bigZips/genes/ncbiRefSeq.gtf.gz
This 19 MB file does load sometimes, but it pushes memory limits.
Here is a partial profiling trace of loading the URL above (not a full trace, because leaving the profiler on during loading crashes my computer).
Zoomed-out view of the loading process:
![Screenshot from 2022-01-27 12-51-03](https://user-images.githubusercontent.com/6511937/151433252-09eaf1d3-283f-41fa-8811-3e9aa794c1ff.png)
gtf_loading.json.gz
Zoomed-in view:
![Screenshot from 2022-01-27 12-51-16](https://user-images.githubusercontent.com/6511937/151433256-4d1c8f6d-ae1a-4619-8a2c-d0ca2fe0d21b.png)
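One general way to keep memory bounded for a file like this is to decompress and parse it as a stream, yielding one feature at a time instead of building the full text and the full feature list up front. The Python sketch below illustrates that idea only. It is not how this project parses GTF, and the names `GtfFeature`, `parse_attributes`, and `iter_gtf` are hypothetical. It also assumes attribute values contain no embedded semicolons and that repeated attribute keys can be overwritten.

```python
import gzip
import io
from typing import Dict, Iterator, NamedTuple


class GtfFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs: Dict[str, str] = {}
    for part in field.split(";"):  # simplification: no ';' inside quoted values
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')  # a repeated key keeps the last value
    return attrs


def iter_gtf(gz_file) -> Iterator[GtfFeature]:
    """Yield features one at a time from a gzipped GTF file object or path."""
    with gzip.open(gz_file, "rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # this sketch silently skips malformed lines
            yield GtfFeature(
                cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                cols[5], cols[6], cols[7], parse_attributes(cols[8]),
            )


# Tiny made-up sample standing in for the real download.
sample = (
    "#comment\n"
    'chr1\tncbiRefSeq\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tncbiRefSeq\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";\n'
)
demo = io.BytesIO(gzip.compress(sample.encode("utf-8")))
exon_count = sum(1 for f in iter_gtf(demo) if f.feature == "exon")
```

Because `iter_gtf` is a generator, a caller that only counts or filters features never holds more than one parsed line at a time. A caller that needs every feature in memory would still pay the full cost, so this helps most when combined with per-region or per-chromosome loading.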