Memory footprint when creating large Tanimoto score files #150
Will be solved with #151.
@guikool
Launch started on 500K spectra in Colab... Oops, crash.
One way to limit the size of the square Tanimoto matrix might be to keep only the Tanimoto scores above a given threshold (0.7?).
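The threshold idea can be sketched as follows. This is a minimal illustration, not MS2Query code: fingerprints are assumed to be Python ints used as bitsets, keyed by a hypothetical compound identifier (for example an InChIKey), and only pairs scoring at or above the threshold are stored, in a dict instead of a dense matrix.

```python
from itertools import combinations
from typing import Dict, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as int bitsets."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define the score as 0
    return bin(fp_a & fp_b).count("1") / union


def sparse_tanimoto(
    fingerprints: Dict[str, int], threshold: float = 0.7
) -> Dict[Tuple[str, str], float]:
    """Keep only pairs with score >= threshold instead of a dense n x n matrix.

    Assumption: keys are hypothetical compound identifiers; each unordered
    pair is scored once.
    """
    scores = {}
    for (key_a, fp_a), (key_b, fp_b) in combinations(fingerprints.items(), 2):
        score = tanimoto(fp_a, fp_b)
        if score >= threshold:
            scores[(key_a, key_b)] = score
    return scores
```

With `fps = {"A": 0b1111, "B": 0b0111, "C": 0b10000000}`, `sparse_tanimoto(fps)` returns `{("A", "B"): 0.75}`. All pairs are still compared, so this saves memory but not computation time, and the memory saved depends on how many pairs pass the threshold.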
You can remove the step library_creator.calculate_tanimoto_scores(). We actually only need a fraction of the Tanimoto scores, so the memory footprint in this version should be much lower (lower even than keeping only the scores above 0.7).
I hadn't noticed the script change.
Dear Niek, I just benchmarked the latest version of MS2Query for library creation. On a computer with 32 GB of memory, it terminates with the following error:
I have access to a 256 GB workstation and will give it a try, but perhaps there is something to optimize in this part.
Thanks for letting us know. It is hard for me to change this, since this step is not needed for MS2Query itself but for training MS2Deepscore. I had a quick look at whether this could be changed easily, but it is not straightforward. I will open an issue in MS2Deepscore about this, so it might be changed in the future. I hope it works on the 256 GB workstation.
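A back-of-the-envelope estimate shows why a dense matrix exhausts memory as the library grows. The sketch below assumes one row and one column per compound and 8-byte (float64) scores; the actual dimensions and dtype used by the library creation step are assumptions here, and temporary copies would add to these figures.

```python
def dense_matrix_gib(n_compounds: int, bytes_per_score: int = 8) -> float:
    """Memory in GiB for a dense n x n score matrix (float64 assumed by default)."""
    return n_compounds * n_compounds * bytes_per_score / 2**30


def top_k_gib(n_compounds: int, k: int = 10, bytes_per_score: int = 8) -> float:
    """Memory in GiB for keeping only k scores per compound (indices not counted)."""
    return n_compounds * k * bytes_per_score / 2**30
```

Under these assumptions, `dense_matrix_gib(50_000)` is about 18.6 GiB and `dense_matrix_gib(500_000)` is about 1,863 GiB, whereas `top_k_gib(500_000)` stays well under 1 GiB, because storage grows as n × k instead of n².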
I've run into another issue on the workstation, but it is related to the Python installation.
I finally removed all in-silico spectra from my in-house library and now work with fewer than 50K unique InChIKeys. No problems so far: library creation works really well and fast.
Great to hear that it works well now!
Currently, a matrix of Tanimoto scores is generated. However, MS2Query only needs the top 10 highest Tanimoto scores.
Suggested change:
Do not store the entire matrix of Tanimoto scores; instead, store only the top 10 highest Tanimoto scores and pass these to the SQLite file generator.
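The suggested change can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the MS2Query implementation: fingerprints are ints used as bitsets and keyed by hypothetical compound identifiers, `heapq.nlargest` selects each compound's k best matches one row at a time, and the table name `tanimoto_top_k` and its columns are hypothetical, not the schema of the MS2Query SQLite file.

```python
import heapq
import sqlite3
from typing import Dict, List, Tuple


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two int-bitset fingerprints (0.0 if both empty)."""
    union = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / union if union else 0.0


def top_k_tanimoto(
    fingerprints: Dict[str, int], k: int = 10
) -> Dict[str, List[Tuple[str, float]]]:
    """For each compound, keep only its k most similar other compounds.

    Each row is produced lazily by a generator, and heapq.nlargest keeps at
    most k candidates while scanning it, so the full n x n matrix is never built.
    """
    items = list(fingerprints.items())
    result = {}
    for key_a, fp_a in items:
        row = ((key_b, tanimoto(fp_a, fp_b)) for key_b, fp_b in items if key_b != key_a)
        result[key_a] = heapq.nlargest(k, row, key=lambda pair: pair[1])
    return result


def store_top_k(conn: sqlite3.Connection, top_k: Dict[str, List[Tuple[str, float]]]) -> None:
    """Write the top-k neighbours to a hypothetical table tanimoto_top_k."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tanimoto_top_k "
        "(inchikey TEXT, neighbour TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO tanimoto_top_k VALUES (?, ?, ?)",
        ((key, neighbour, score)
         for key, neighbours in top_k.items()
         for neighbour, score in neighbours),
    )
    conn.commit()
```

For example, `store_top_k(sqlite3.connect(":memory:"), top_k_tanimoto(fps))` fills an in-memory table with at most k rows per compound. Each symmetric pair is scored twice, which trades extra computation for never holding more than one row of scores in memory at a time.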