add notice for the atom map id in the rxn. #36
The map ids in the rxn should be consecutive!
The map ids in the rxn should be consecutive, or it will report [the molAtomMapNumber issue](https://github.com/awslabs/dgl-lifesci/issues/33).

To avoid the problem, convert the raw rxn SMILES with explicit hydrogen atoms to rxn SMILES without hydrogen atoms using RDKit before adding map ids to the rxn SMILES.
Can we give some examples of how to relabel atom mapping numbers using consecutive integers?
For example, suppose the raw rxn SMILES is "[H]C([H])([H])Oc1ccc(CCNC=O)cc1OC([H])([H])[H]>>[H]C([H])([H])Oc1cc2c(cc1OC([H])([H])[H])CCN=C2". If you add atom maps to it directly with the RDT software, the mapped rxn SMILES is
[CH2:1]([CH2:2][NH:10][CH:11]=[O:21])[c:8]1[cH:7][cH:6][c:5]([O:4][CH3:3])[c:12]([cH:9]1)[O:13][CH3:14]>>[CH:11]1=[N:10][CH2:2][CH2:1][c:8]2[cH:9][c:12]([O:13][CH3:14])[c:5]([cH:6][c:7]12)[O:4][CH3:3]
The oxygen atom is labelled 21 although the molecule has only 15 heavy atoms, so the map ids are not consecutive.
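A quick way to spot this kind of gap is to list the map ids that are missing from a mapped rxn SMILES. The following is only a minimal sketch: `missing_map_ids` is a hypothetical helper name, it works on the text alone (no RDKit), and it assumes that "consecutive" means the ids cover 1..N with no gaps.

```python
import re

# Matches the map id inside a bracket atom such as "[CH2:1]" or "[O:21]".
MAP_ID = re.compile(r":(\d+)\]")


def missing_map_ids(mapped_rxn):
    """Return the ids between 1 and the largest map id that never occur."""
    ids = {int(m) for m in MAP_ID.findall(mapped_rxn)}
    if not ids:
        return []
    return [i for i in range(1, max(ids) + 1) if i not in ids]


# Mapped rxn SMILES obtained by mapping the raw rxn SMILES directly.
rdt_direct = (
    "[CH2:1]([CH2:2][NH:10][CH:11]=[O:21])[c:8]1[cH:7][cH:6][c:5]([O:4][CH3:3])"
    "[c:12]([cH:9]1)[O:13][CH3:14]>>[CH:11]1=[N:10][CH2:2][CH2:1][c:8]2[cH:9]"
    "[c:12]([O:13][CH3:14])[c:5]([cH:6][c:7]12)[O:4][CH3:3]"
)
print(missing_map_ids(rdt_direct))  # [15, 16, 17, 18, 19, 20]
```

An empty list means the ids run from 1 to N without gaps.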
First, convert the raw rxn smiles to rxn smiles without hydrogen atoms.
#!python
from rdkit import Chem


def canonicalizatonsmi(smi):
    # MolFromSmiles removes explicit hydrogens by default, so the
    # canonical SMILES written back contains no [H] atoms.
    newsmi = Chem.MolToSmiles(Chem.MolFromSmiles(smi))
    return newsmi


def canon_reaction(rxnstring):
    # Split the rxn SMILES into reactant and product molecules.
    r, p = rxnstring.split('>>')
    rs = r.split('.')
    ps = p.split('.')
    rscans = []
    pscans = []
    for reactant in rs:
        rscans.append(canonicalizatonsmi(reactant))
    for product in ps:
        pscans.append(canonicalizatonsmi(product))
    # Reactants are sorted; products keep their original order.
    rscan = '.'.join(sorted(rscans))
    pscan = '.'.join(pscans)
    newrxnsring = '%s>>%s' % (rscan, pscan)
    return newrxnsring


rxnstring = '[H]C([H])([H])Oc1ccc(CCNC=O)cc1OC([H])([H])[H]>>[H]C([H])([H])Oc1cc2c(cc1OC([H])([H])[H])CCN=C2'
canon_reaction(rxnstring)
Then add atom maps to the canonicalized reaction SMILES:
[O:15]=[CH:1][NH:2][CH2:3][CH2:4][c:5]1[cH:6][cH:7][c:8]([O:9][CH3:10])[c:11]([O:12][CH3:13])[cH:14]1>>[CH:1]1=[N:2][CH2:3][CH2:4][c:5]2[cH:14][c:11]([O:12][CH3:13])[c:8]([O:9][CH3:10])[cH:7][c:6]12
The oxygen atom is now labelled 15, and the map ids run consecutively from 1 to 15.
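If re-running the mapping tool is not convenient, the existing map ids could in principle be renumbered as consecutive integers instead. The sketch below is a different technique from the one described above: it rewrites the ids textually in order of first appearance, keeping reactant and product atoms paired. `relabel_map_ids` is a hypothetical name, and whether this alone avoids the molAtomMapNumber issue has not been verified here.

```python
import re

# Matches the map id inside a bracket atom such as "[CH3:3]".
MAP_ID = re.compile(r":(\d+)\]")


def relabel_map_ids(mapped_rxn):
    """Renumber map ids as 1..N in order of first appearance.

    The same old id always receives the same new id, so the pairing of
    reactant atoms with product atoms is preserved.
    Assumption: every ":n]" in the string is an atom map id.
    """
    new_ids = {}

    def _swap(match):
        old = match.group(1)
        if old not in new_ids:
            new_ids[old] = len(new_ids) + 1
        return ":%d]" % new_ids[old]

    return MAP_ID.sub(_swap, mapped_rxn)


print(relabel_map_ids("[CH3:3][OH:7]>>[CH3:3][O-:7]"))
# [CH3:1][OH:2]>>[CH3:1][O-:2]
```

The numbering produced this way depends on the atom order in the string, so it will generally differ from what a mapping tool assigns to the canonicalized SMILES.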
Thanks for the example. Is it possible for us to have a Python script that automatically does the following:
- Canonicalize the rxn SMILES
- Add new atom mapping numbers
Indigo supports a Python API, but its accuracy is worse.
RDT is more accurate than Indigo, but it is a Java tool.
Other tools such as ChemAxon and rxnmapper still need to be evaluated.
I am not sure which tool is best for adding atom mapping numbers,
so I did not put "Add new atom mapping numbers" in the Python script.
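One way to keep the script independent of that choice is to pass the mapping step in as an argument. This is only a rough sketch: `preprocess_rxn`, `canonicalize` and `add_atom_map` are hypothetical names, not part of dgl-lifesci or of any mapping tool, and the check assumes that "consecutive" means ids 1..N with no gaps.

```python
import re


def preprocess_rxn(rxn, canonicalize, add_atom_map):
    """Canonicalize a rxn SMILES, map it, and check the resulting map ids.

    `canonicalize` and `add_atom_map` are caller-supplied callables
    (hypothetical plug-in points); each takes and returns a rxn SMILES.
    """
    mapped = add_atom_map(canonicalize(rxn))
    ids = sorted({int(m) for m in re.findall(r":(\d+)\]", mapped)})
    # Assumption: valid output carries map ids exactly 1..N.
    if not ids or ids != list(range(1, len(ids) + 1)):
        raise ValueError("map ids are not consecutive: %s" % ids)
    return mapped
```

For instance, `canon_reaction` from the comment above could be passed as `canonicalize`, with a thin wrapper around whichever mapping tool is eventually chosen passed as `add_atom_map`.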
Issue #, if available: #33
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.