-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chemical identity comparison for Molecules and Species #1329
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1329 +/- ##
==========================================
- Coverage 41.67% 41.54% -0.13%
==========================================
Files 176 176
Lines 29102 29103 +1
Branches 5988 5975 -13
==========================================
- Hits 12127 12090 -37
- Misses 16138 16181 +43
+ Partials 837 832 -5
Continue to review full report at Codecov.
|
0e1c3c4
to
c3e65d9
Compare
c46bd06
to
f9d2f17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this look pretty good except for the naming issues we already talked about, I just have a few questions and comments.
rmgpy/molecule/molecule.pxd
Outdated
@@ -136,6 +136,10 @@ cdef class Molecule(Graph): | |||
cdef str _inchi | |||
cdef str _smiles | |||
|
|||
cpdef bint is_equal(self, Molecule other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this function come from?
rmgpy/rmg/model.py
Outdated
if spec is not None and spec.is_same(molecule): | ||
self.speciesCache.pop(i) | ||
self.speciesCache.insert(0, spec) | ||
# Check whether the resonance form is already included in the species |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't all of the resonance forms we care about for the species already be generated? Can we avoid the isomorphism check at the end?
rmgpy/reaction.py
Outdated
@@ -412,24 +412,30 @@ def matchesSpecies(self, reactants, products=None): | |||
else: | |||
return False | |||
|
|||
def isIsomorphic(self, other, eitherDirection=True, checkIdentical = False, | |||
checkOnlyLabel = False, checkTemplateRxnProducts=False): | |||
def isIsomorphic(self, other, eitherDirection=True, level=1, checkTemplateRxnProducts=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should rename this to something more general...like is_same, but I'm not sure how much of a pain that would be.
rmgpy/reactionTest.py
Outdated
r1 = self.makeReaction('A=B') | ||
self.assertTrue(r1.isIsomorphic(self.makeReaction('a=B'))) | ||
self.assertTrue(r1.isIsomorphic(self.makeReaction('b=A'))) | ||
self.assertFalse(r1.isIsomorphic(self.makeReaction('B=a'),eitherDirection=False)) | ||
self.assertFalse(r1.isIsomorphic(self.makeReaction('B=a'), eitherDirection=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These whitespace changes should be in a separate commit
rmgpy/reaction.py
Outdated
@@ -394,18 +394,20 @@ def matchesSpecies(self, reactants, products=None): | |||
Compares the provided reactants and products against the reactants | |||
and products of this reaction. Both directions are checked. | |||
|
|||
Uses level 2 of `isomorphis_species_lists`, which compares InChI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
rmgpy/molecule/translator.py
Outdated
del mol_elementcount[key] | ||
# If we didn't find anything, that would mean the result is wrong | ||
conditions.append(found) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the check below (inchi_elementcount == mol_elementcount) works because util.retrieveElementCount(identifier) doesn't account for isotopes and the above loop moves isotopes to there corresponding element?
@@ -1841,17 +1841,13 @@ def generate_products_and_reactions(order): | |||
|
|||
products0 = reaction.products[:] if forward else reaction.reactants[:] | |||
|
|||
# For aromatics, generate aromatic resonance structures to accurately identify isomorphic species | |||
if prod_resonance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can we get rid of this stuff?
This PR is out of date now. The commits related to InChI generation for isotopes has been merged via #1453. The species comparison changes will be refactored and a new PR will be opened once that's ready. |
Major changes have been made to this PR
Please see the updated initial comment for summary and explanation of the new changes. Performance comparison with minimal example:
This is a 14% reduction in runtime. I expect more significant differences for models with larger species and/or more resonance structures. The model core is identical on these two branches. The edge on this branch has 7 fewer species, most likely due to species being identified as duplicates using the non-strict comparison. I think this is a positive change though, since this is a more robust method of identifying whether species are truly the same (as compared to generating resonance structures). RMG-tests results:
|
01aaf8f
to
e0ce75d
Compare
e0ce75d
to
719f66b
Compare
More pythonic and prevents extra getter/setter functions from cluttering the namespace.
Allow instantiation using either SMILES or InChI Change SMILES and InChI to all lowercase for easier typing Make smiles and inchi read only properties that generate and save the respective identifiers to private attributes
For strict=False, ignores electrons (i.e. bond order, radicals, lone pairs, etc.) Only implemented for Molecule isomorphism, exceptions raised in other cases For atoms, strict argument is added to the `equivalent` method For bonds, strict argument is handled within vf2
Because its functionality is replaced by the strict argument
Also add strict argument which is passed to isIsomorphic
Since this option allows us to compare resonance structures directly, we don't need to worry about nonreactive resonance structures or aromatic resonance structures. Also, the return value of checkForExistingSpecies is reduced to a single value, since returning whether or not the species was found is redundant with returning the species itself.
No need to ensure species - the reactions will always contain Molecules at this point, so either Molecule or Species objects are acceptable for the product list to compare with Aromatic resonance structure generation can be omitted by using the strict=False option, which neglects electrons during the isomorphism comparison
Remove checks that the product list is converted in place to Species objects, since that step was removed
Similar to strict option for isomorphism, setting strict=False ignores electrons and bond orders for the comparison, which is useful for comparing different resonance structures
Of Molecule, Species, and Reaction, which is passed to Graph.isMappingValid
Instead, use the strict=False option to ignore resonance issues Change ensure_species to not generate resonance structures by default Add call to ensure_independent_atom_ids in addReverseAttribute because the changes to resonance structure generation means that products may not necessarily have proper atom IDs
Add SMILES property and unit tests
In certain situations, we would call ensure_species with keep_isomorphic=False, which would ultimately result in wrong degeneracy values. This fixes that oversight and makes sure that keep_isomorphic=True when generating reactions. A unit test is also added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, I just have a few noted questions.
"""InChI string representation of this species. Read-only.""" | ||
if self._inchi is None: | ||
if self.molecule: | ||
self._inchi = self.molecule[0].InChI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is self.molecule[0].InChI always not None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Molecule.InChI
is a property which will set the InChI if it's not already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
@@ -1321,28 +1326,33 @@ def get_mean_sigma_and_epsilon(self, reverse=False): | |||
mean_epsilons = reduce((lambda x, y: x * y), epsilons) ** (1 / len(epsilons)) | |||
return mean_sigmas, mean_epsilons | |||
|
|||
def isomorphic_species_lists(list1, list2, check_identical=False, only_check_label=False, generateInitialMap=False): | |||
|
|||
def same_species_lists(list1, list2, check_identical=False, only_check_label=False, generate_initial_map=False, strict=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're renaming this already can we either generalize it so it works for N species or rename it in a less general way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean since it can only compare list of up to 3 species? Do you any name suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isomorphic_species_trios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isomorphic_species_lists_le3, isomorphic_species_lists_lt4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's not worth worrying about the confusion that much?
Improvements:
is_same
method checks "chemical identity" via InChI and multiplicity (better name suggestions welcome)is_same
incheckForExistingSpecies()
instead of isomorphism. Testing has shown that this is much faster and also more accurate.April 2019 Revision
I have finally refactored and updated this branch. There are a number of changes from the prior version, but I decided not to make a new PR.
I decided to abandon the idea of creating a new
is_same
method, since it seemed to add unnecessary complexity, especially since people are already familiar with usingisIsomorphic
andisIdentical
.Major changes
strict
argument to almost all isomorphism related methods. The default isstrict=True
which is the same as current behavior. Specifyingstrict=False
ignores electrons when comparing atoms and bonds, meaning bond orders, radicals, and lone pairs are not considered at all. The result is a resonance structure independent comparison.strict=False
comparisons for:strict=False
is slower than strict isomorphism (due to more potential matches), it is more than compensated for by eliminating resonance structure generation.checkForExistingSpecies
, though non-reactive Molecules can still be created.Quality of life changes
fromInChI
orfromSMILES
Bugfixes
ensure_independent_atom_ids
, make sure we select a reactive molecule when generating resonance structuresensure_independent_atom_ids
, passkeep_isomorphic=True
toensure_species
for accurate degeneracy determination