Improve group additivity ranking for aromatic species #1731
Group values are often fitted to the most aromatic resonance structure of a molecule. This can cause issues if the thermo value for a less representative structure ends up being lower than that of the aromatic structure, even though the aromatic structure has the more accurate group estimate. Ranking estimates by aromaticity first ensures more predictable behavior after adding new group values for aromatic species.
Codecov Report
@@ Coverage Diff @@
## master #1731 +/- ##
==========================================
+ Coverage 32.57% 32.64% +0.06%
==========================================
Files 87 87
Lines 26090 26170 +80
Branches 6866 6877 +11
==========================================
+ Hits 8500 8542 +42
- Misses 16627 16656 +29
- Partials 963 972 +9
Looks straightforward and OK.
I added one comment regarding commit messages.
Would you like to add an actual H298 GAV test? Maybe for the species in your example?
@@ -248,6 +248,8 @@ cdef class Molecule(Graph):

    cpdef identify_ring_membership(self)

    cpdef int count_aromatic_rings(self)
Could you change the commit message to lower_case_with_underscores to be consistent with the code?
(Same comment for the "Minor code improvements to ThermoDatabase.prioritizeThermo" commit.)
Done!
701fb2f to 276a91c
Thanks!
Motivation or Problem
Estimating thermo for aromatic species using group additivity can give inaccurate results because we generate an estimate for every resonance structure and then take the one with the lowest H298. In some cases, this leads to severe underprediction of enthalpy when an inappropriate group yields a lower enthalpy than a more appropriate group.
For example:
This species has resonance structures where the radical can delocalize into the aromatic ring. One of those structures gives the estimate with the lowest H298 of 61.4 kcal/mol:
Thermo group additivity estimation: group(Cs-(Cds-Cds)CbHH) + group(Cb-Cs) + group(Cb-(Cds-Cds)) + group(Cds-Cds(Cds-Cds)Cs) + group(Cb-H) + group(Cds-CdsCbH) + group(Cb-H) + group(Cds-Cds(Cds-Cds)H) + group(Cb-H) + group(Cb-H) + group(Cds-CdsHH) + group(Cdd-CdsCds) + polycyclic(s2_6_6_ben_ene_1) + radical(Benzyl_S_dihydronaphthalene)
However, the calculated H298 of the molecule is 101.7 kcal/mol. If we were to use GAV to estimate the thermo of the resonance structure depicted above, we would get 107.3 kcal/mol with the following groups:
Thermo group additivity estimation: group(Cbf-CbCbCbf) + group(Cbf-CbCbCbf) + group(Cb-(Cds-Cds)) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cds-CdsCbH) + group(Cds-CdsHH) + polycyclic(s2_6_6_naphthalene) + radical(Cds_S)
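To make the size of the discrepancy concrete, here is a minimal sketch that computes the signed error of each estimate against the calculated value. The three H298 numbers come from the example above; the dictionary keys are illustrative labels, not RMG identifiers.

```python
# H298 values in kcal/mol, taken from the example above.
calculated_h298 = 101.7
estimates = {
    "lowest-H298 structure": 61.4,     # label is illustrative only
    "most aromatic structure": 107.3,  # label is illustrative only
}

# Signed error of each group additivity estimate (estimate minus calculated).
errors = {
    label: round(h298 - calculated_h298, 1)
    for label, h298 in estimates.items()
}

for label, error in errors.items():
    print(f"{label}: {error:+.1f} kcal/mol")
```

The lowest-H298 estimate is off by roughly 40 kcal/mol, while the estimate for the more aromatic structure is within about 6 kcal/mol.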
Description of Changes
This PR adds an additional parameter, number of aromatic rings, by which to rank GAV estimates, in addition to H298.
The number of aromatic rings is determined by the actual bond orders in the resonance structure, rather than by true aromaticity detection.
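As a rough illustration of counting by bond order, the sketch below treats a ring as aromatic only if it has six members and every ring bond has order 1.5. This is a standalone toy, not the `count_aromatic_rings` method added in this PR: the function signature, the ring and bond representations, and the six-membered-ring restriction are all assumptions made for the example.

```python
def count_aromatic_rings(rings, bond_orders):
    """Count six-membered rings whose bonds all have order 1.5.

    rings: list of rings, each a list of atom indices in cyclic order.
    bond_orders: dict mapping frozenset({i, j}) to a numeric bond order.
    Both representations are hypothetical, chosen for this sketch.
    """
    count = 0
    for ring in rings:
        if len(ring) != 6:
            continue
        # Pair each atom with its successor, wrapping around to close the ring.
        pairs = zip(ring, ring[1:] + ring[:1])
        if all(bond_orders[frozenset(pair)] == 1.5 for pair in pairs):
            count += 1
    return count


ring = [0, 1, 2, 3, 4, 5]

# Benzene drawn with delocalized (order 1.5) bonds.
aromatic_form = {frozenset((i, (i + 1) % 6)): 1.5 for i in range(6)}

# The same ring drawn as a Kekule structure with alternating double/single bonds.
kekule_form = {frozenset((i, (i + 1) % 6)): 2 if i % 2 == 0 else 1 for i in range(6)}

n_aromatic = count_aromatic_rings([ring], aromatic_form)
n_kekule = count_aromatic_rings([ring], kekule_form)
```

Because the check looks only at the bond orders as drawn, the Kekule form of the same ring is not counted, which is the sense in which this is not true aromaticity detection.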
The result is that estimates for resonance structures with more aromatic rings are prioritized over those with fewer, even if this results in a higher H298.
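The ranking can be sketched as a sort on a two-part key: more aromatic rings first, then lower H298. The `Estimate` type and `rank_estimates` function are hypothetical names for this example and are not the code in `ThermoDatabase`. The H298 values come from the example in the motivation; the ring counts (1 and 2) are an assumption inferred from the `Cb`/`Cbf` groups listed for each structure.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    """Hypothetical container for one resonance structure's GAV estimate."""
    label: str
    aromatic_rings: int
    h298: float  # kcal/mol


def rank_estimates(estimates):
    """Sort by aromatic ring count (descending), then by H298 (ascending)."""
    return sorted(estimates, key=lambda e: (-e.aromatic_rings, e.h298))


candidates = [
    # Ring counts are assumed from the group lists, not stated in the source.
    Estimate("radical delocalized into ring", aromatic_rings=1, h298=61.4),
    Estimate("naphthalene-like structure", aromatic_rings=2, h298=107.3),
]

old_choice = min(candidates, key=lambda e: e.h298)  # lowest H298 only
new_choice = rank_estimates(candidates)[0]          # aromatic rings first
```

Under these assumptions, ranking by H298 alone picks the 61.4 kcal/mol estimate, while the two-part key picks the 107.3 kcal/mol estimate, which is much closer to the calculated 101.7 kcal/mol.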
Testing
Besides testing with the species mentioned above, I ran comparisons against the SABIC_aromatics_1dHR library (not currently on master), which contains ~350 aromatic/near-aromatic species calculated using CBS-QB3 with 1D hindered rotors.
The H298 MAE on master is 5.263 kcal/mol, while the MAE on this branch is 5.228 kcal/mol. This is not a substantial difference, mainly because this change does not affect many species. However, it does suggest that this change does not have noticeable negative effects.
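For reference, the MAE used in this comparison is the mean of the absolute differences between estimated and reference H298 values. The sketch below shows the calculation with made-up placeholder numbers; they are not values from the SABIC_aromatics_1dHR library, and the function name is hypothetical.

```python
def mean_absolute_error(estimated, reference):
    """Mean of |estimated - reference| over paired values."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


# Placeholder H298 values in kcal/mol, for illustration only.
reference_h298 = [10.0, 20.0, 30.0]
estimated_h298 = [12.0, 19.0, 33.0]

mae = mean_absolute_error(estimated_h298, reference_h298)
```

Since only a few species change their selected resonance structure, most terms in the sum are identical on both branches, which is consistent with the small shift in MAE reported above.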