one question about the canonicalize_products.py #10

teslacool · 2021-03-16T01:51:45Z

@chaoyan1037 If I understand correctly, the SMILES in train.csv are already canonicalized by RDKit via smiles=Chem.MolToSmiles(mol, canonical=True) before canonicalize_products.py processes them. So what is the purpose of permuting the atom-map numbers in canonicalize_products.py, and is there any reference work for this?


chaoyan1037 · 2021-03-22T01:08:20Z

@teslacool Good catch. We found an information leak within the USPTO dataset itself: the order of the atoms within the product SMILES may indicate the reaction atoms. More specifically, for most USPTO products, the first atoms of the product SMILES are reaction atoms. You may look into the reaction atoms yourself.
So we use canonicalize_products.py to rearrange the atom order to match the canonical atom order, hoping to remove the potential information leak.
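
For anyone who wants to look into the reaction atoms as suggested above, here is a minimal sketch of one way to check it. It is not code from this repository. It assumes every product atom carries an atom-map number, and that you already have the set of map numbers you consider reaction atoms (for example from your own reaction-center labelling). The function names and the toy examples are hypothetical and only illustrate the idea.

```python
import re

# Matches the map number inside a bracket atom such as [CH3:7] or [N+:12].
_MAP_RE = re.compile(r"\[[^\]]*?:(\d+)\]")


def map_numbers_in_order(mapped_smiles):
    """Return the atom-map numbers in the order the atoms appear in the string."""
    return [int(m) for m in _MAP_RE.findall(mapped_smiles)]


def reaction_atom_positions(product_smiles, reaction_atom_maps):
    """Return 0-based positions (in SMILES order) of the flagged reaction atoms."""
    order = map_numbers_in_order(product_smiles)
    return [i for i, m in enumerate(order) if m in reaction_atom_maps]


def first_atom_leak_rate(examples):
    """Fraction of examples whose first product atom is a reaction atom."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(
        1 for smiles, maps in examples
        if 0 in reaction_atom_positions(smiles, maps)
    )
    return hits / len(examples)


# Toy data (made up for illustration, not taken from USPTO):
# the same product written with two different atom orders,
# with atoms 2 and 4 assumed to be the reaction atoms.
toy_examples = [
    ("[CH3:1][C:2](=[O:3])[O:4][CH3:5]", {2, 4}),
    ("[C:2](=[O:3])([CH3:1])[O:4][CH3:5]", {2, 4}),
]

print(first_atom_leak_rate(toy_examples))
```

If the rate over a real split is far above what you would expect by chance, the atom order is probably telling the model where the reaction happens.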

teslacool · 2021-03-22T01:57:48Z

OK, thanks for your answer.

chaoyan1037 · 2021-04-15T02:15:10Z

We have made an important update to our method. Please refer to the readme for more details.

YanjingLiLi · 2023-06-01T23:28:45Z

Does this leakage only appear in the model you developed, or is it a general problem for all models using this dataset?

najwalb · 2023-09-23T14:38:35Z

@YanjingLiLi I believe it's a general problem. It was mentioned later by other methods as well: https://openreview.net/pdf?id=SnONpXZ_uQ_

Methods using atom-mapping information should be particularly careful with this.

teslacool closed this as completed Mar 22, 2021
