Error handling and general sense on SELFIES generation #62

Closed
chupvl opened this issue Sep 1, 2021 · 4 comments

Labels
bug Something isn't working

Comments

@chupvl

chupvl commented Sep 1, 2021

Hey! Where can I read about error handling and the general SELFIES BS filter?
For example:
sf.encoder('1243124124') just hangs for a very long time; shouldn't it throw an error right away?
sf.encoder('SOMETHINGWRONGHERE') generates '[S][O][M][E][T][H][I][N][G][W][R][O][N][G][H][E][R][E]'

This is linked to an issue I have when processing a list of compounds: it returns an error even though all the SMILES were verified with RDKit.

--> Acquiring data...
Finished acquiring data.
Representation: SMILES
--> Translating SMILES to SELFIES...
'NoneType' object has no attribute 'find'

@MarioKrenn6240
Collaborator

MarioKrenn6240 commented Sep 4, 2021

Dear Chupvl,
Thanks for those messages. We are in the final stages of preparing version 2.0.0 of SELFIES, and will consider your first two examples.

Could you please provide the SMILES that produces None? That would be very useful, thank you.

Mario

MarioKrenn6240 added the bug label Sep 4, 2021
@chupvl
Author

chupvl commented Sep 7, 2021

Hello Mario.
That's exactly the issue: all compounds passed through RDKit just fine, but I cannot pin down the reason SELFIES breaks.
It's definitely due to incorrect SMILES... I'm going to do a more in-depth cleanup, removing metals and some other weird stuff.
Still, a SELFIES validity check or better error handling would help resolve these issues.

@MarioKrenn6240
Collaborator

Can you find the example that gives NoneType? That would help identify the problem. Maybe you can go through each SMILES, translate it to SELFIES, and catch the error along with the SMILES id?
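
One way to follow this suggestion is sketched below. It is only an illustration, not code from the selfies project: the helper name find_failing_smiles and the sample inputs are made up for this example, and the encoder is passed in as a callable so the same loop works with sf.encoder or any other translator. It records both raised exceptions and a None return value, since either could lie behind the 'NoneType' error reported above.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_failing_smiles(
    smiles_list: Iterable[str],
    encode: Callable[[str], Optional[str]],
) -> List[Tuple[int, str, str]]:
    """Return (index, smiles, reason) for every input that fails to encode.

    Hypothetical helper: `encode` is any SMILES -> SELFIES callable,
    for example selfies.encoder.
    """
    failures = []
    for idx, smiles in enumerate(smiles_list):
        try:
            result = encode(smiles)
        except Exception as exc:
            # Deliberately broad: the goal is to log every failing input,
            # whatever exception type the encoder happens to raise.
            failures.append((idx, smiles, f"{type(exc).__name__}: {exc}"))
            continue
        if result is None:
            # Some encoders signal failure by returning None instead of raising.
            failures.append((idx, smiles, "encoder returned None"))
    return failures


if __name__ == "__main__":
    import selfies as sf  # requires the selfies package to be installed

    # Made-up sample inputs; replace with the real compound list.
    smiles_list = ["CCO", "c1ccccc1", "SOMETHINGWRONGHERE"]
    for idx, smiles, reason in find_failing_smiles(smiles_list, sf.encoder):
        print(idx, smiles, reason)
```

Note that this loop cannot detect an input that makes the encoder hang; that case would need a separate timeout around each call.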

@alstonlo
Collaborator

Hi @chupvl,

In selfies v2.0.0, we have implemented more stringent error checking. For example, the invalid SMILES strings you gave now raise a selfies.EncoderError when passed into selfies.encoder:

sf.encoder("SOMETHINGWRONGHERE") 
selfies.exceptions.SMILESParserError: 
	SMILES: SOMETHINGWRONGHERE
	          ^
	Index:  2
	Reason: invalid atom symbol 'M'
...
...
selfies.exceptions.EncoderError: failed to parse input
	SMILES: SOMETHINGWRONGHERE

Unfortunately, we do not have any documentation on error handling in selfies yet, but the code that parses SMILES and detects invalid SMILES is all contained in utils/smiles_utils.py.

Thanks for the bug report! Please let us know if the error persists!
