Create Library Files python script - missing step #140
Sorry, I missed this issue at the time; thanks for pointing me to it. The issue is actually slightly different: the directory is created (if needed), but the function does not expect a directory; instead it expects a base file name. However, I agree that this is not intuitive for the user, so I will change it to take a directory and create the expected files there. Additionally, with #146 it has become a lot easier to create new library files for your own data: it is now possible with just a few lines of code, without needing to run all the notebooks.
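To illustrate the base-file-name behaviour described above, here is a hypothetical sketch (the function name and file suffixes are illustrative, not the actual ms2query API): the argument is treated as a base name from which several output files are derived, so passing a bare directory path would not produce the files a user expects inside it.

```python
import os

def make_library_files(base_file_name):
    """Hypothetical sketch: derive several library files from one base name.

    The suffixes below are illustrative; the real script writes its own set
    of files. Passing "library_output/library" creates
    library_output/library.sqlite etc., which is why switching the argument
    to a plain directory is more intuitive for users.
    """
    directory = os.path.dirname(base_file_name)
    if directory:
        # The containing directory is created if needed.
        os.makedirs(directory, exist_ok=True)
    for suffix in (".sqlite", ".model", ".index"):
        with open(base_file_name + suffix, "w"):
            pass  # placeholder for the real file-writing logic

make_library_files("library_output/library")
```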
Hi
The result was an empty directory:
Hi, the loading bars I see (for a small test set) are: Cleaning metadata library spectra: 100%|██████████| 100/100 [00:00<00:00, 417.27it/s] Does waiting longer solve the issue for you?
I added printing "Calculating Tanimoto scores"
Thanks for your prompt reply; here is a sample of my msp file: NAME: Actinorhodin
I now notice you have quite a lot of unique InChIKeys: 168,039. Is this an in-house library, and does that number of unique InChIKeys match your expectations? This many unique InChIKeys might cause memory issues in Google Colab. I tested the workflow for up to about 20,000 unique InChIKeys; 168,039 is substantially more, and since the Tanimoto score is calculated between each pair of InChIKeys, the size grows quadratically. I think this is what makes Google Colab crash; it might still be possible to run this on a server with more memory than Google Colab offers. Could you try the workflow with a smaller spectrum file (e.g. 100 spectra), to make sure the workflow itself works in Google Colab? If this is indeed the issue, I could look at reducing the memory footprint of the Tanimoto score generation.
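A rough back-of-the-envelope estimate shows why the quadratic growth matters, assuming a dense all-pairs matrix with one float32 score per InChIKey pair (the actual storage format in the workflow may differ):

```python
def tanimoto_matrix_gb(n_inchikeys, bytes_per_score=4):
    """Rough size of a dense n-by-n Tanimoto score matrix in gigabytes."""
    return n_inchikeys ** 2 * bytes_per_score / 1e9

print(tanimoto_matrix_gb(20_000))   # ~1.6 GB: fits in Colab's RAM
print(tanimoto_matrix_gb(168_039))  # ~113 GB: far beyond Colab's ~12 GB
```

An 8.4x increase in unique InChIKeys thus inflates the matrix by roughly 70x, which is consistent with the crash only appearing on the larger library.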
You're probably right,
I confirm, it works on Colab for 10,000 spectra!
Great, I will also make a less memory-intensive implementation. This can be further discussed in #150.
"7_create_library_files.py"
There is a small bug: the folder path_library needs to be created before it is used, e.g. by including:
if not os.path.isdir(path_library):
    os.makedirs(path_library)
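As a side note, since Python 3.2 the check-then-create pair can be collapsed into a single call, which also avoids a race condition if another process creates the directory in between the check and the creation (path_library below is a placeholder for the script's real output directory):

```python
import os

path_library = "library_files"  # placeholder; use the real output directory
os.makedirs(path_library, exist_ok=True)  # no-op if the directory already exists
```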