
Optimize gtf parsing #2681

Closed
cmdcolin opened this issue Jan 27, 2022 · 6 comments
Labels
enhancement New feature or request

@cmdcolin
Collaborator

Being able to load GTF files directly is a nice feature, because hosts like UCSC still publish their annotations in this format, for example:
https://hgdownload.soe.ucsc.edu/goldenPath/canFam6/bigZips/genes/ncbiRefSeq.gtf.gz

This 19 MB file sometimes loads, but it pushes memory limits.

Here is a partial profiling trace of loading the URL above. It is not a full trace, because leaving the profiler on during loading crashes my computer.

Zoomed-out view of the loading process (trace attached as gtf_loading.json.gz):

[Screenshot from 2022-01-27 12-51-03]

Zoomed in:

[Screenshot from 2022-01-27 12-51-16]
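The issue does not describe how the current parser is structured, so the following is only a sketch under stated assumptions: a TypeScript parser that handles one GTF line at a time and groups the parsed records by `transcript_id`, keeping parsed fields rather than the raw text. The names `GtfRecord`, `parseAttributes`, `parseGtfLine`, and `groupByTranscript` are hypothetical and are not the project's actual API. The attribute handling is deliberately simplified.

```typescript
// Hypothetical compact record for one GTF line (not the project's actual data model).
interface GtfRecord {
  seqName: string
  source: string
  feature: string
  start: number // 1-based, inclusive, as written in the file
  end: number // 1-based, inclusive
  score: number | null // null when the column is "."
  strand: string
  frame: number | null // null when the column is "."
  attributes: Record<string, string>
}

// Parse column 9, e.g. `gene_id "A"; transcript_id "A.1";`.
// Simplifications (assumptions): splits on ";" so semicolons inside quoted
// values are not supported, and only the last value is kept when a key repeats.
function parseAttributes(column: string): Record<string, string> {
  const attrs: Record<string, string> = {}
  for (const part of column.split(';')) {
    const item = part.trim()
    const space = item.indexOf(' ')
    if (space <= 0) continue // skip empty pieces and keys without values
    const key = item.slice(0, space)
    let value = item.slice(space + 1).trim()
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1) // strip the surrounding quotes
    }
    attrs[key] = value
  }
  return attrs
}

// Parse one tab-separated GTF line. Returns undefined for comments,
// blank lines, and lines with fewer than 9 columns.
function parseGtfLine(line: string): GtfRecord | undefined {
  if (line === '' || line.startsWith('#')) return undefined
  const cols = line.split('\t')
  if (cols.length < 9) return undefined
  const [seqName, source, feature, start, end, score, strand, frame, attrs] = cols
  return {
    seqName,
    source,
    feature,
    start: Number(start),
    end: Number(end),
    score: score === '.' ? null : Number(score),
    strand,
    frame: frame === '.' ? null : Number(frame),
    attributes: parseAttributes(attrs),
  }
}

// Group records by transcript_id one line at a time, so the caller can
// supply lines incrementally instead of building one large string first.
function groupByTranscript(lines: Iterable<string>): Map<string, GtfRecord[]> {
  const groups = new Map<string, GtfRecord[]>()
  for (const line of lines) {
    const rec = parseGtfLine(line)
    const id = rec?.attributes['transcript_id']
    if (rec === undefined || id === undefined) continue // skip non-feature lines
    let group = groups.get(id)
    if (group === undefined) {
      group = []
      groups.set(id, group)
    }
    group.push(rec)
  }
  return groups
}
```

In Node, one could call `parseGtfLine` inside a `for await` loop over `readline.createInterface` reading from `fs.createReadStream(path).pipe(zlib.createGunzip())`, which avoids holding the ~329 MB of decompressed text as a single string. The parsed records still take memory, so whether this helps in practice would need profiling like the traces above.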

@cmdcolin cmdcolin added the enhancement New feature or request label Jan 27, 2022
@cmdcolin
Collaborator Author

This is the memory usage line climbing to 800+ MB:

[Screenshot from 2022-01-27 13-01-45]

@cmdcolin
Collaborator Author

cmdcolin commented Jan 27, 2022

cc @teresam856, this could be worth checking out. The 19 MB gz file does unpack into 329 MB in memory, but it would be cool if we could handle files at this scale.

@cmdcolin
Collaborator Author

The 329 MB figure is just the size on disk after unzipping the gz.

@teresam856
Contributor

teresam856 commented Jan 27, 2022

This would be a good thing to have. I could look into it...

@cmdcolin
Collaborator Author

Super! If it were optimized, this could become a "recommended" or even pre-configured way for people to load data.

@cmdcolin
Collaborator Author

This is somewhat better after #2927.
