Improve group additivity ranking for aromatic species #1731

Merged
merged 4 commits into from
Sep 28, 2019

Contributor

@mliu49 mliu49 commented Sep 26, 2019

Motivation or Problem

Estimating thermo for aromatic species using group additivity can give poor results because we generate an estimate for every resonance structure and then take the one with the lowest H298. In some cases, this leads to severe underprediction of enthalpy: a resonance structure that matches an inappropriate group can end up with a lower enthalpy than the structure that matches a more appropriate group.

For example:

image

This species has resonance structures where the radical can delocalize into the aromatic ring. One of those structures gives the estimate with the lowest H298 of 61.4 kcal/mol:
Thermo group additivity estimation: group(Cs-(Cds-Cds)CbHH) + group(Cb-Cs) + group(Cb-(Cds-Cds)) + group(Cds-Cds(Cds-Cds)Cs) + group(Cb-H) + group(Cds-CdsCbH) + group(Cb-H) + group(Cds-Cds(Cds-Cds)H) + group(Cb-H) + group(Cb-H) + group(Cds-CdsHH) + group(Cdd-CdsCds) + polycyclic(s2_6_6_ben_ene_1) + radical(Benzyl_S_dihydronaphthalene)

However, the calculated H298 of the molecule is 101.7 kcal/mol. If we instead used GAV to estimate the thermo of the resonance structure depicted above, we would get 107.3 kcal/mol with the following groups:
Thermo group additivity estimation: group(Cbf-CbCbCbf) + group(Cbf-CbCbCbf) + group(Cb-(Cds-Cds)) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cb-H) + group(Cds-CdsCbH) + group(Cds-CdsHH) + polycyclic(s2_6_6_naphthalene) + radical(Cds_S)

Description of Changes

This PR adds a second parameter, the number of aromatic rings, by which to rank GAV estimates alongside H298.

The number of aromatic rings is determined from the actual bond orders in the resonance structure, rather than by true aromaticity detection.
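As a rough illustration of counting by bond order, here is a minimal standalone sketch. It does not use RMG-Py; the function name, the data layout (rings as lists of atom indices, bonds as a dict keyed by atom pairs), and the restriction to six-membered rings are all assumptions made for this example, not the actual `count_aromatic_rings` implementation.

```python
def bonds_of(ring):
    """Return the bonds around a ring as unordered atom pairs (hypothetical helper)."""
    return [frozenset(pair) for pair in zip(ring, ring[1:] + ring[:1])]


def count_aromatic_rings(rings, bond_orders):
    """Count rings whose bonds are all aromatic (order 1.5) in this structure.

    Assumption for this sketch: only six-membered rings are considered.
    No electron counting or other aromaticity perception is done.
    """
    count = 0
    for ring in rings:
        if len(ring) != 6:
            continue
        if all(bond_orders[bond] == 1.5 for bond in bonds_of(ring)):
            count += 1
    return count


# Naphthalene-like skeleton: two six-membered rings sharing the 4-5 bond.
ring_a = [0, 1, 2, 3, 4, 5]
ring_b = [4, 6, 7, 8, 9, 5]
rings = [ring_a, ring_b]

# Resonance structure 1: every ring bond is written as aromatic.
fully_aromatic = {bond: 1.5 for ring in rings for bond in bonds_of(ring)}

# Resonance structure 2: ring B is written with localized single/double bonds,
# while the shared 4-5 bond stays aromatic as part of ring A.
partly_localized = dict(fully_aromatic)
for bond, order in zip(bonds_of(ring_b)[:-1], [1, 2, 1, 2, 1]):
    partly_localized[bond] = order

print(count_aromatic_rings(rings, fully_aromatic))
print(count_aromatic_rings(rings, partly_localized))
```

The point of the sketch is that the count depends on how a given resonance structure is drawn, so two resonance structures of the same species can report different numbers of aromatic rings.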

The result is that estimates for resonance structures with more aromatic rings are prioritized over those with fewer, even if that results in a higher H298.
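The ranking rule can be sketched as a two-part sort key. This is a simplified illustration, not the code in `rmgpy/data/thermo.py`: the `GavEstimate` class and both selection functions are hypothetical, and the ring counts (1 and 2) are inferred from the polycyclic group labels in the example above rather than taken from RMG output. The H298 values are the two estimates quoted in the Motivation section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GavEstimate:
    """One GAV estimate for a single resonance structure (hypothetical container)."""
    polycyclic_group: str
    h298: float          # kcal/mol
    aromatic_rings: int  # assumed ring count for that resonance structure


def select_by_h298(estimates):
    """Previous behavior as described above: lowest H298 wins."""
    return min(estimates, key=lambda e: e.h298)


def select_by_aromaticity_then_h298(estimates):
    """New behavior as described above: most aromatic rings first, then lowest H298."""
    return min(estimates, key=lambda e: (-e.aromatic_rings, e.h298))


estimates = [
    GavEstimate("s2_6_6_ben_ene_1", 61.4, 1),     # ring count assumed from label
    GavEstimate("s2_6_6_naphthalene", 107.3, 2),  # ring count assumed from label
]

old_choice = select_by_h298(estimates)
new_choice = select_by_aromaticity_then_h298(estimates)
print(old_choice.polycyclic_group, old_choice.h298)
print(new_choice.polycyclic_group, new_choice.h298)
```

With these inputs the H298-only rule picks the 61.4 kcal/mol estimate, while the two-part key picks the 107.3 kcal/mol estimate, which is closer to the calculated 101.7 kcal/mol. H298 still breaks ties between structures with the same number of aromatic rings.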

Testing

Besides testing with the species mentioned above, I ran comparisons against the SABIC_aromatics_1dHR library (not currently on master), which contains ~350 aromatic/near-aromatic species calculated using CBS-QB3 with 1D hindered rotors.

The H298 MAE on master is 5.263 kcal/mol, while the MAE on this branch is 5.228 kcal/mol. This is not a substantial difference, mainly because this change does not affect many species. However, it does suggest that this change does not have noticeable negative effects.

mliu49 added 3 commits May 14, 2019 16:33
Group values are often fitted to the most aromatic resonance
structure of a molecule. This can cause issues if the thermo
value for a less representative structure ends up being lower
than that of the aromatic structure, even though it has a more
accurate group estimate.

This approach ensures more predictable behavior after adding
new group values for aromatic species.

codecov bot commented Sep 26, 2019

Codecov Report

Merging #1731 into master will increase coverage by 0.06%.
The diff coverage is 38.88%.

@@            Coverage Diff             @@
##           master    #1731      +/-   ##
==========================================
+ Coverage   32.57%   32.64%   +0.06%     
==========================================
  Files          87       87              
  Lines       26090    26170      +80     
  Branches     6866     6877      +11     
==========================================
+ Hits         8500     8542      +42     
- Misses      16627    16656      +29     
- Partials      963      972       +9
Impacted Files                  Coverage Δ
rmgpy/molecule/molecule.py      0% <0%> (ø) ⬆️
rmgpy/data/thermo.py            60.77% <100%> (+0.09%) ⬆️
rmgpy/data/kinetics/groups.py   17.52% <0%> (-0.49%) ⬇️
rmgpy/molecule/symmetry.py      0% <0%> (ø) ⬆️
rmgpy/rmg/pdep.py               12.21% <0%> (ø) ⬆️
rmgpy/reaction.py               0% <0%> (ø) ⬆️
rmgpy/species.py                0% <0%> (ø) ⬆️
rmgpy/yml.py                    15.71% <0%> (ø) ⬆️
rmgpy/quantity.py               0% <0%> (ø) ⬆️
rmgpy/rmg/input.py              34.34% <0%> (ø) ⬆️
... and 33 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3e37656...276a91c.

Member

@alongd alongd left a comment

Looks straightforward and OK.
I added one comment regarding commit messages.
Would you like to add an actual H298 GAV test? Maybe for the species in your example?

@@ -248,6 +248,8 @@ cdef class Molecule(Graph):

cpdef identify_ring_membership(self)

cpdef int count_aromatic_rings(self)
Member

Could you change the commit message to lower_case_with_underscores to be consistent with the code?
(Same comment for the "Minor code improvements to ThermoDatabase.prioritizeThermo" commit.)

Contributor Author

Done!

@mliu49 force-pushed the aromatic_gav branch 3 times, most recently from 701fb2f to 276a91c, September 27, 2019 21:05
Member

@alongd alongd left a comment

Thanks!
