add notice for the atom map id in the rxn. #36

autodataming · 2020-06-28T07:28:03Z

The map ids in the rxn should be consecutive!

Issue #, if available: #33

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The map ids in the rxn should be consecutive!

examples/reaction_prediction/rexgen_direct/README.md

mufeili · 2020-06-28T09:07:28Z

+```
+The map ids in the rxn should be consecutive, or it will report [the molAtomMapNumber issue](https://github.com/awslabs/dgl-lifesci/issues/33).
+
+To avoid the problem, you could use RDKit to convert the raw rxn SMILES with explicit hydrogen atoms to rxn SMILES without hydrogen atoms before adding map ids.


Can we give some examples of how to relabel atom mapping numbers using consecutive integers?

For example, take the raw rxn SMILES "[H]C([H])([H])Oc1ccc(CCNC=O)cc1OC([H])([H])[H]>>[H]C([H])([H])Oc1cc2c(cc1OC([H])([H])[H])CCN=C2". If you add the atom map directly with the RDT software, the mapped rxn SMILES is:

[CH2:1]([CH2:2][NH:10][CH:11]=[O:21])[c:8]1[cH:7][cH:6][c:5]([O:4][CH3:3])[c:12]([cH:9]1)[O:13][CH3:14]>>[CH:11]1=[N:10][CH2:2][CH2:1][c:8]2[cH:9][c:12]([O:13][CH3:14])[c:5]([cH:6][c:7]12)[O:4][CH3:3]

The carbonyl oxygen atom is labelled 21 even though the reactant has only 15 heavy atoms, so the map ids are not consecutive.

First, convert the raw rxn smiles to rxn smiles without hydrogen atoms.

```python
from rdkit import Chem


def canonicalizatonsmi(smi):
    # MolFromSmiles removes explicit hydrogens by default,
    # so the canonical SMILES contains heavy atoms only.
    return Chem.MolToSmiles(Chem.MolFromSmiles(smi))


def canon_reaction(rxnstring):
    # Split the reaction into its reactant and product sides, then into molecules.
    r, p = rxnstring.split('>>')
    rscans = [canonicalizatonsmi(reactant) for reactant in r.split('.')]
    pscans = [canonicalizatonsmi(product) for product in p.split('.')]
    # Reactants are sorted for a stable order; products keep their input order.
    rscan = '.'.join(sorted(rscans))
    pscan = '.'.join(pscans)
    return '%s>>%s' % (rscan, pscan)


rxnstring = '[H]C([H])([H])Oc1ccc(CCNC=O)cc1OC([H])([H])[H]>>[H]C([H])([H])Oc1cc2c(cc1OC([H])([H])[H])CCN=C2'
print(canon_reaction(rxnstring))
```

Then, add the atom map to the canonicalized reaction SMILES:

[O:15]=[CH:1][NH:2][CH2:3][CH2:4][c:5]1[cH:6][cH:7][c:8]([O:9][CH3:10])[c:11]([O:12][CH3:13])[cH:14]1>>[CH:1]1=[N:2][CH2:3][CH2:4][c:5]2[cH:14][c:11]([O:12][CH3:13])[c:8]([O:9][CH3:10])[cH:7][c:6]12

The oxygen atom is now labelled 15, so the map ids run consecutively from 1 to 15.
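The difference between the two mapped SMILES above can be checked mechanically. The following is a minimal sketch, not part of the PR: the helper names `map_ids` and `has_consecutive_map_ids` are hypothetical, and the regex assumes that every map id appears as `:N]` at the end of a bracket atom.

```python
import re

# Matches the map id at the end of a bracket atom such as [CH3:14] or [O:21].
MAP_ID = re.compile(r":(\d+)\]")


def map_ids(smiles):
    """Return the set of atom map ids found in a SMILES string (hypothetical helper)."""
    return {int(m) for m in MAP_ID.findall(smiles)}


def has_consecutive_map_ids(rxn_smiles):
    """True if the reactant map ids are exactly 1..N and the products reuse only those ids."""
    reactants, products = rxn_smiles.split(">>")
    r_ids = map_ids(reactants)
    p_ids = map_ids(products)
    return bool(r_ids) and r_ids == set(range(1, len(r_ids) + 1)) and p_ids <= r_ids


# Mapped after stripping explicit hydrogens (second example above).
good = ("[O:15]=[CH:1][NH:2][CH2:3][CH2:4][c:5]1[cH:6][cH:7][c:8]([O:9][CH3:10])"
        "[c:11]([O:12][CH3:13])[cH:14]1>>[CH:1]1=[N:2][CH2:3][CH2:4][c:5]2[cH:14]"
        "[c:11]([O:12][CH3:13])[c:8]([O:9][CH3:10])[cH:7][c:6]12")
print(has_consecutive_map_ids(good))
```

Running the same check on the first mapped SMILES, where the oxygen is labelled 21, should flag the gap before the data is passed to the model.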

Thanks for the example. Is it possible for us to have a Python script that automatically performs the following?

1. Canonicalize the rxn SMILES
2. Add new atom mapping numbers

Indigo has a Python API, but its mapping accuracy is lower.
RDT is better than Indigo, but it is a Java tool.
Other tools, such as ChemAxon and rxnmapper, still need to be evaluated.
I am not sure which tool is best for adding atom mapping numbers,
so I did not put "Add new atom mapping numbers" in the Python script.
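Independently of which mapping tool is used, an already mapped reaction can be renumbered so that its ids become consecutive while the reactant-to-product correspondence is preserved. The sketch below is an assumption-laden illustration, not the fix proposed in this thread: `relabel_map_ids` is a hypothetical helper, it works on the SMILES text only (no RDKit), and it has not been verified against the molAtomMapNumber issue.

```python
import re

# Matches the map id at the end of a bracket atom such as [CH3:14] or [O:21].
MAP_ID = re.compile(r":(\d+)\]")


def relabel_map_ids(rxn_smiles):
    """Renumber map ids as 1..N in order of first appearance (hypothetical helper).

    The same old-to-new table is applied to reactants and products,
    so mapped atom pairs stay paired.
    """
    new_id = {}
    # Reactants come first in the string, so they receive the lowest ids.
    for old in MAP_ID.findall(rxn_smiles):
        new_id.setdefault(old, len(new_id) + 1)
    return MAP_ID.sub(lambda m: ":%d]" % new_id[m.group(1)], rxn_smiles)


# RDT output for the SMILES with explicit hydrogens (first example above).
mapped = ("[CH2:1]([CH2:2][NH:10][CH:11]=[O:21])[c:8]1[cH:7][cH:6][c:5]([O:4][CH3:3])"
          "[c:12]([cH:9]1)[O:13][CH3:14]>>[CH:11]1=[N:10][CH2:2][CH2:1][c:8]2[cH:9]"
          "[c:12]([O:13][CH3:14])[c:5]([cH:6][c:7]12)[O:4][CH3:3]")
print(relabel_map_ids(mapped))
```

The new numbering follows the order of atoms in the reactant string, so it will generally differ from the numbering a mapping tool assigns to the canonicalized SMILES.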

add notice for the atom map id in the rxn.

5ade027

The map ids in the rxn should be consecutive!

mufeili self-requested a review June 28, 2020 07:41