-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding resonance transitons for lone pairs #1348
Conversation
This PR fails on Travis due to aug-InChI issues |
e06389c
to
91ee3d6
Compare
Codecov Report
@@ Coverage Diff @@
## master #1348 +/- ##
==========================================
- Coverage 41.54% 41.51% -0.04%
==========================================
Files 173 174 +1
Lines 28537 28951 +414
Branches 5586 5716 +130
==========================================
+ Hits 11857 12019 +162
- Misses 15885 16121 +236
- Partials 795 811 +16
Continue to review full report at Codecov.
|
d7426f2
to
1ede57d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall comments:
- In for loops,
enumerate()
is generally cleaner and more pythonic thanxrange(len())
. I commented on a few cases, but I think every instance you added could potentially be replaced byenumerate
. - The filtration algorithm seems pretty involved. Have you had a chance to evaluate the overall effect this PR has on performance of generate_resonance_structures?
- What's the final status of augmented InChI? Is it still broken/deprecated?
- How do you think Chemical identity comparison for Molecules and Species #1329 would affect the changes in this PR, if at all?
- Could you run RMG-tests?
Thanks for the awesome work!
rmgpy/rmg/react.py
Outdated
@@ -147,6 +147,7 @@ def deflateReaction(rxn, molDict): | |||
object are populated with the value of the dictionary, either an | |||
integer index, or either a Species object. | |||
""" | |||
rxn.generatePairs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pair generation is typically done by KineticsFamily.getReactionPairs()
, except for bimolecular families which are not hard-coded, in which case we recently allowed falling back to Reaction.generatePairs()
which is done in CERM.makeNewReaction()
.
Because Reaction.generatePairs()
resets the pairs, I think it's not advisable to call that here since it will overwrite the pairs generated by the kinetics family. I think it might be better to simply catch the error (which I assume occurs when rxn.pairs
is empty or none?) and save a more appropriate value.
rmgpy/molecule/pathfinder.py
Outdated
if ((atom1.isNitrogen() and atom1.radicalElectrons >= 1 and atom1.lonePairs in [0, 1]) | ||
or (atom1.isOxygen() and atom1.radicalElectrons >= 1 and atom1.lonePairs in [1, 2]) | ||
or (atom1.isSulfur() and atom1.radicalElectrons >= 1 and atom1.lonePairs in [0, 1, 2])): | ||
for atom2, _ in atom1.edges.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use keys()
or iterkeys()
here if you only need the atoms.
rmgpy/molecule/resonance.py
Outdated
|
||
def generate_adjacent_resonance_structures(mol): | ||
def generate_ally_delocalization_resonance_structures(mol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allyl is misspelled (unless you used ally intentionally).
rmgpy/molecule/molecule.py
Outdated
@@ -879,7 +879,7 @@ def sortAtoms(self): | |||
for index, vertex in enumerate(self.vertices): | |||
vertex.sortingLabel = index | |||
|
|||
def update(self): | |||
def update(self, log_species_while_updating_atom_types=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just naming it log_species
would be sufficient? This seems a bit long.
logging.debug('A net charged species was formed when reacting {0} to form {1} in' | ||
' reaction family {2}. Not generating this reaction.'.format( | ||
reactant_net_charge,product_net_charge,self.label)) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section could be better integrated with the section at line 1322 to avoid repetition.
rmgpy/data/kinetics/family.py
Outdated
# Reverse direction (the direction in which kinetics is not defined) | ||
reactionList.extend(self.__generateReactions(reactants, products=products, forward=False, prod_resonance=prod_resonance)) | ||
|
||
if all([mol.reactive for mol in reactants]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? It's checking that all the molecules are reactive. Also, this method is intended to accept either a list of molecules or a list of lists of molecules, so it would be nice if that still worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is wrong and will be addressed in a fixup
}, | ||
{ | ||
"data": { | ||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAEMAAAAZCAIAAADlvQS1AAAABmJLR0QA/wD/AP+gvaeTAAADqUlE\nQVRYhd2YT0jbUBzHX2VuZKWSmtB2soN/wGKFzlpQpChIFaSReRp4EQZjiJdimZBDT4KTHAseXA5F\nxLIi9SBFT7WgYkVklYk6dVgQUbSS2uJoamv17ZCZxvpnmU3d2OfU3/e993u/b/L6S1oZhBD8FxT8\nxb2dTmckEpEq219zkkgkNjc3dTrd0NBQOp2WICN8dFiW7e/vx3H84OBga2vLYrFotdrp6ekc0z62\nE6/XW1ZWVlNTQ5Lk2dkZJ/p8Pp1O19LSsrGx8eDMj+dkZWWlqanJYDDMzc3t7u52dHRUVFRMTk5y\no6lUyuFw4DhutVpjsdgD8j+GE4ZhrFarSqVyOBzpdJrX/X6/Xq9vbm5eXV29f6YY8uvkt1f64uJi\ndHRUrVZ3dXUdHx9zYjAYbGxsrK2tnZ+fF79XHp3wp399ff3+mScnJyRJqtVqiqKSySQner3e0tLS\n9vb2cDgsZru8ONne3iYIorKycmpqSvyqm32MZVmXyyXymF1zUl1dPTY2xodut1ur1d61cmJior6+\nHkEQHMctFksgEOD0nZ2drKv7R/h8vqqqqtv72OfP8NUr+OwZVKng+/cwEpHACU3TCoWCpun9/f1o\nNOrxeLq7u7mh8/NzkefhLvhvF0mSGfXTJ4ii0O2GsRj89g22tUGDAV718Qc6YVkWRVGapoXi5eVl\nLtXfhGEYu93+K0gkoFIJnc7McDwONRooqOEhbyvBYDAWi3V2dgpFmUwmwRuHAAzDBgYGfgVfvoBo\nFLx5kxl+/hy8fg18Pl54krW+p6ent7eX+5xKpUpKSm7uEYlE5HJ5UVGRpJXfC8MAuRwoFNfEFy/A\n9+98lH1PBgcHv15BURQnLiwsNDQ0mEwmm80GAMAwLB6Pn56e5rd6ITgO4nHw48c18fAQ4DgfZTtR\nKpUvryguLubE8vLy2dnZQCCwt7e3trZmNBpRFB0fHxcuhHn9nWM0AhQFHk9GYVng9QKzmReyT9et\n8GessLCwoKAAQRCKovr6+mQyGUEQCIL4/f6ZmZnh4WFJyxeAIODjR/DhA5DLQVsbODwENhtQqcDb\nt5k5wnZxf+9aXl62WCx86PF46urquOcJQRCLi4vS9q5bcLmgXg+fPoU4Dt+9gwwjHBT7jA+HwyaT\n6ejoKA8FSoMoJ8lksrW1NRgM5ruaXBDlZGRkRKPRmM1ms9m8tLTE66FQKBQK5a22P0MGc+g5GIYB\nACT8VyEXRPWuu7Db7VLVkTs53ZN/ip+ygYFFORXIAQAAAABJRU5ErkJggg==\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I'm not sure if this should be part of RMG-Py.
- If we decide to add it, could you clear the output cells before committing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm clearing the output cells.
I don't mind not including this notebook (tell me what you think, I'm OK with both ways). I think we previously discussed adding a resonance generation feature to the website, let's discuss it and decide whether or not to ha a filtration switch (or filter by default). It could be an excellent replacement for this notebook.
rmgpy/molecule/pathfinder.py
Outdated
if (((atom.isNitrogen() or atom.isSulfur()) and atom.lonePairs in [1, 2, 3]) | ||
or (atom.isOxygen() and atom.lonePairs in [2, 3])): | ||
return True | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could directly return the value of your boolean expression (also for gain_lone_pair).
Also, loose
should be lose
.
rmgpy/data/kinetics/common.py
Outdated
# Remake resonance structures with new labels | ||
if resonance: | ||
species.generate_resonance_structures(keepIsomorphic=True) | ||
if all([mol.reactive for mol in species.molecule]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are potential cases where there might be an unreactive molecules when calling this method? Independent atom IDs are pretty important for degeneracy, so depending on the situation, a more comprehensive fix might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Molecules with a reactive=False
flag get generated, but are not allowed to further react in families. However, we do calculate degeneracy for these reactions. My concern here was a line below that resets the molecule list (species.molecule = [mol]
), removing all unreactive molecules. Even if we re-generate resonance structures, we won't regenerate these unreactive molecule back. I modified this that condition so that unreactive molecules are appended if needed.
rmgpy/data/kinetics/common.py
Outdated
elif resonance: | ||
# IDs are already independent, generate resonance structures if needed | ||
for species in input_species: | ||
species.generate_resonance_structures(keepIsomorphic=True) | ||
if len(species.molecule) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already done by Species.generate_resonance_structures()
.
Thanks for the helpful comments, @mliu49 !
|
Here's a comparison of number of species and resonance structure generation time for this brunch (w/ and w/o filtration) vs. master:
The overall time is indeed higher (about a factor of 5), but most of it is spent in generating the new structures (using new transitions not available on current master). The filtration time ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes!
- It looks like filtration doesn't take that long. The increase in time for resonance structure generation overall is a bit concerning though. I'm not sure that there's much we can do about it though.
- Regarding augmented InChI, the fact that we were augmenting them with electron positions was already going against the philosophy of InChI. I think Nick used resonance structure generation to try and get canonical numbering for the electrons. I don't think removing it should have a significant impact though.
- Regarding Chemical identity comparison for Molecules and Species #1329, I was mainly thinking about the
checkForExistingSpecies
method. In that PR, I completely replace it with InChI comparison. I think that could greatly simplify the changes necessary to that method in this PR.
rmgpy/rmg/react.py
Outdated
rxn.pairs = [(molDict[reactant.molecule[0]], molDict[product.molecule[0]]) for reactant, product in rxn.pairs] | ||
except ValueError: | ||
rxn.generatePairs() | ||
rxn.pairs = [(molDict[reactant.molecule[0]], molDict[product.molecule[0]]) for reactant, product in rxn.pairs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking that you could just set rxn.pairs = None
if you encounter an error, since the pairs will be generated later if they don't exist (here).
rmgpy/molecule/molecule.py
Outdated
for atom in self.vertices: | ||
if atom.isHydrogen() and atom.label == '': | ||
hydrogens.append(atom) | ||
hydrogens = filter(lambda at: at.isHydrogen(), self.vertices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this loses the check on the atom label. Was that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintentional. I'm dropping this small commit
rmgpy/data/kinetics/family.py
Outdated
def addReverseAttribute(self, rxn): | ||
reactants = [reactant if isinstance(reactant, list) else [reactant] for reactant in reactants] | ||
for reactant in reactants: | ||
if not all([mol.reactive for mol in reactant]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might not have been clear about the list of lists. In the typical case, this takes reactants as [reactantA, reactantB]
. However, it can also properly handle [[A_res1, A_res2], [B_res1, B_res2]]
, where the sublists are lists of resonance structures for each reactant. I think I was also confused in my original comment about this part; it was correct to check that all the reactants are reactive.
Now that I think about it more though, it may be unnecessary do the check here at all. Since you already check in __generateReactions()
where the actual generation is done, I don't think you need to change this method at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'll revert the original change
rmgpy/species.py
Outdated
for species in species_list: | ||
if isinstance(species, Species): | ||
if self.isIsomorphic(species): | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also do return self.isisomorphic(species)
.
rmgpy/species.py
Outdated
return True | ||
else: | ||
raise ValueError('Unexpected value "{0!r}" for species_list parameter;' | ||
' should be a List of Species objects.'.format(species)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps use TypeError?
Thanks for the comments @mliu49! I pushed additional fixups. |
Could you convert the two assertions Codacy flagged into exceptions as well? Then I think you can go ahead and squash all of the fixups. |
Thanks - dealt with the two assertions, squashed and rebased |
rmgpy/data/kinetics/family.py
Outdated
if isinstance(struc, Molecule): | ||
struc.update() | ||
else: | ||
struc.resetRingMembership() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you added this following my earlier comment; could you also move Xiaorui's additions up to here and remove the section at line 1334?
rmgpy/data/kinetics/family.py
Outdated
@@ -1526,7 +1554,7 @@ def addReverseAttribute(self, rxn): | |||
self.forbidden = ForbiddenStructures() # Initialize with empty one | |||
try: | |||
reactionList = self.__generateReactions([spc.molecule for spc in rxn.products], | |||
products=rxn.reactants, forward=True) | |||
products=rxn.reactants, forward=True, react_non_reactive=react_non_reactive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor, but I think it would be better to put the react_non_reactive argument on the next line and keep the indentation inline with the open parenthesis like before.
rmgpy/data/thermo.py
Outdated
else: | ||
# For noncyclics, default to original algorithm of ordering thermo based on the most stable enthalpy | ||
H298 = numpy.array([t.getEnthalpy(298.) for t in thermoDataList]) | ||
indices = H298.argsort() | ||
indices = numpy.array([indices[i] for i in xrange(len(indices)) if species.molecule[indices[i]].reactive] +\ | ||
[indices[i] for i in xrange(len(indices)) if not species.molecule[indices[i]].reactive]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like you need xrange at all, and could just do for i in indices
.
rmgpy/molecule/filtration.py
Outdated
for mol in filtered_list: | ||
logging.info('Structure: {0}\n{1}Reactive: {2}'.format(mol.toSMILES(),mol.toAdjacencyList(),mol.reactive)) | ||
logging.info('\n') | ||
raise AssertionError('Each species must have at least one reactive structure. Something probably went wrong' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it would be a good idea to create a ResonanceError class to be used here.
rmgpy/molecule/inchi.py
Outdated
@@ -376,6 +376,8 @@ def _generate_minimum_resonance_isomer(mol): | |||
Next, we return the candidate with the lowest unpaired electrons metric. | |||
|
|||
The metric is a sorted list with indices of the atoms that bear an unpaired electron | |||
|
|||
This function is currently depreciated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling of deprecated
. Also, it would be good to mention why.
I think you can add a deprecation warning similar to f7dd2f1 for now.
rmgpy/rmg/model.py
Outdated
return True, True, spec | ||
|
||
# As a last resort, check using molecule.fingerprint if the obj matches any existing species, | ||
# and if it does, generate resonance structures w/o filtration and check for isomorphism |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like you're checking fingerprints as the comment says.
@mliu49, I agree with all of your comments, I addressed them in fixup commits |
rmgpy/data/thermo.py
Outdated
indices = numpy.array([indices[i] for i in xrange(len(indices)) if species.molecule[indices[i]].reactive] +\ | ||
[indices[i] for i in xrange(len(indices)) if not species.molecule[indices[i]].reactive]) | ||
indices = numpy.array([i for i in indices if species.molecule[indices[i]].reactive] +\ | ||
[i for i in indices if not species.molecule[indices[i]].reactive]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think indices[i]
should be replaced by i
as well.
If resonance structures were already generated for `new_item`, they may contain non-reactive structures that we'd like to keep. Re-generating resonance structures could remove them. This makes sure that resonance structures would only be generated for Species that contain only reactive structure.
If `saturateH` is `True`, then saturate `mol` with hydrogens before generating the resonance structures and remove the hydrogens before returning `isomorphic_isomers`. This is useful when resonance structures are generated for molecules in which all hydrogens were intentionally removed as in generating augInChI. Otherwise, RMG will probably get many of the lonePairs and partial charges in a molecule wrong.
Here are the test results for this branch: There were no changes for eg1, eg3, eg6, eg7, and solvent_hexane In eg5, the order in which networks were generated was different, so we get things like:
@mliu49, I was wrong yesterday to think that only the first network reaction instance gets included in the model, RMG seems to be doing the right thing here. In the NC test, this branch corrected the representative structure of NNOH, so the test treats them as different species. The methane test (with reactive N2) had some discrepancies, with 4 species not included in the new model, and 2 additional species. Some kinetics are different due to two reasons:
The updated thermo, however, isn't alwas better. e.g.:
[O-][N+]=[N+][O-] turned into a more representative form, [O][N]N=O, but now only one N group is recognized. It's a step in the right direction, but GAV must be update. The Methylcyclohexane (MCH) test had some difference in species, probably due to the only core reaction kinetics difference:
Seems like labels *1 and *2 switched places here. I'm not too familiar with this family, but this is perhaps due to different sorting of the resonance structures. The new H2S test crashed because of incorrect parsing of library names, I'll try to deal with it separately. |
Thanks for looking at the tests! I think this is ready to be merged. The final thing (which I almost forgot about) is whether or not to keep |
No problem relocating it. |
As clearly stated in the InChI paper: "Mesomerism is effectively eliminated in InChI. Mesomers have the same InChIs (this is true for all possible InChI layouts of layers). Actually, this is very natural. Mesomeric structures of a molecular entity have the same basic connectivity but differ in bond orders, and maybe by having atomic charges on different atoms. InChI does not use bond orders and does not place charges on particular atoms; the placement of hydrogen atoms in a mesomeric system, which would be important for InChI, is always the same." Source: InChI, the IUPAC International Chemical Identifier, J. Cheminform 2015, 7, 23, doi: 10.1186/s13321-015-0068-4
Because training reactions are often entered with a specific resonance form, this prevents changes to resonance structure generation from affecting addition of rules from training sets.
Notebook removed. It should be ready to go now. |
This PR adds new resonance transitions and introduces the resonance filtration function based on heuristics.
New resonance transitions added
lone_pair_multiple_bond_resonance_structures
Resonance structures formed by lone electron pair - multiple bond shifts.
Example:
lone_pair_radical_multiple_bond_resonance_structures
Resonance structures formed by lone electron pair - radical - multiple bond shifts.
Example:
N5dc_resonance_structures
Resonance structures formed by radical and lone pair shifts mediated by an N5dc atom (not to be confused with
N5ddc_N5tc_resonance_structures
).Example:
(in the above example, the two
O
atoms are "different" - this addresses issue #1293)Other resonance modifications
adjacent_resonance_structures
was renamedally_delocalization_resonance_structures
and elaborated for heteroatomsN5dd_N5ts_resonance_structures
was renamedN5ddc_N5tc_resonance_structures
and elaborated for sulfurlone_pair_radical_resonance_structures
was elaborated for heteroatomsResonance filtration
This module contains methods for filtering a list of Molecules representing a single Species,
keeping only the representative structures.
The rules this module follows are (by order of importance):
Other modifications introduced by this PR
reactive
attribute to the Molecule class. If a non-representative molecule (one that would have been filtered out if formed bygenerate_resonance_structures()
) is directly formed during theenlarge
step, the non-representative Molecule is marked anreactive=False
, and the respresentative structures are appended to the Molecule list. If, however, this species already exists (and doesn't contain the non-representative structure), the species are merged.