[URGENT] Redesign blast data loading #16

Closed
lucventurini opened this issue Sep 18, 2015 · 2 comments

Comments

@lucventurini
Collaborator

At the moment, I perform many of the calculations inside the serialization class's `__init__`, e.g. the calculation of the global identity.
This is MASSIVELY inefficient because it prevents me from performing bulk inserts properly. I will have to preprocess the BLAST hits/HSPs BEFORE loading them into the database, so that I can actually use `bulk_insert` appropriately.
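
A minimal sketch of the "preprocess first" idea, in plain Python with no database involved. The `Hsp` record, its field names and the `preprocess_hit` helper are hypothetical. The definition of global identity used here is also an assumption: identical positions summed over all HSPs, divided by the summed alignment length. The project's actual formula may differ. The point is only that each hit becomes a ready-made dict before any insert happens.

```python
"""Hypothetical sketch: pre-compute per-hit statistics in plain Python
before touching the database, so rows can be bulk-inserted afterwards."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hsp:
    # Hypothetical minimal HSP record; field names are illustrative only.
    identical_positions: int
    align_length: int


def preprocess_hit(query_id: str, target_id: str, hsps: List[Hsp]) -> Dict[str, object]:
    """Return a plain dict describing one hit, with statistics already computed.

    ASSUMPTION: "global identity" is taken here as the total number of
    identical positions over the total alignment length across all HSPs.
    """
    total_identical = sum(h.identical_positions for h in hsps)
    total_length = sum(h.align_length for h in hsps)
    global_identity = 100.0 * total_identical / total_length if total_length else 0.0
    return {
        "query_id": query_id,
        "target_id": target_id,
        "hsp_num": len(hsps),
        "global_identity": global_identity,
    }


# Illustrative input: (query, target, HSPs) triples as they might come out of a parser.
hits = [
    ("q1", "t1", [Hsp(90, 100), Hsp(45, 50)]),
    ("q1", "t2", [Hsp(30, 60)]),
]
rows = [preprocess_hit(q, t, hsps) for q, t, hsps in hits]
print(rows)
```

Because `rows` is a list of plain dicts, it can be handed to a single bulk insert without constructing any ORM objects.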

@lucventurini lucventurini added this to the 0.9 milestone Sep 18, 2015
@lucventurini
Collaborator Author

Probably solved by using raw `__table__.insert` calls and pre-calculating the statistics before loading. I need access to the cluster data to confirm.
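
This is a sketch of the loading side. It assumes the rows have already been prepared as dicts. It uses the standard-library `sqlite3` module as a dependency-free stand-in for SQLAlchemy. In SQLAlchemy Core, the analogous call passes the list of dicts to `connection.execute(Model.__table__.insert(), rows)`, which likewise runs an executemany. The `hit` table and its columns are hypothetical and do not reflect the project's schema.

```python
"""Hypothetical sketch: load pre-computed rows with a single executemany call.

stdlib sqlite3 stands in for SQLAlchemy Core here; table and column names are
illustrative, not the project's real schema."""
import sqlite3
from typing import Dict, List


def bulk_load_hits(conn: sqlite3.Connection, rows: List[Dict[str, object]]) -> int:
    """Insert all rows with one executemany call inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO hit (query_id, target_id, hsp_num, global_identity) "
            "VALUES (:query_id, :target_id, :hsp_num, :global_identity)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hit (query_id TEXT, target_id TEXT, hsp_num INTEGER, global_identity REAL)"
)
# Statistics were computed BEFORE loading; nothing is derived during insertion.
rows = [
    {"query_id": "q1", "target_id": "t1", "hsp_num": 2, "global_identity": 90.0},
    {"query_id": "q1", "target_id": "t2", "hsp_num": 1, "global_identity": 50.0},
]
inserted = bulk_load_hits(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM hit").fetchone()[0]
print(inserted, count)
```

With this design, the database only receives finished rows in one batch. There is no per-hit object construction or per-hit Python-side computation at insert time, which is where the gains would be expected to come from.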

@lucventurini
Collaborator Author

BLAST loading has been redesigned, though the efficiency gains are still unclear. This approach is now the preferred way, even inside the `Hit` class `__init__`.
