-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
onlyHeterocyclics option for Machine Leaning Estimator Settings #1621
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1621 +/- ##
==========================================
+ Coverage 41.88% 41.88% +<.01%
==========================================
Files 176 176
Lines 29354 29368 +14
Branches 6052 6059 +7
==========================================
+ Hits 12294 12301 +7
- Misses 16183 16188 +5
- Partials 877 879 +2
Continue to review full report at Codecov.
|
One comment before I review in detail. Since heterocyclics are a subset of cyclics, it seems to me like it would make more sense for
|
@mliu49 What you said makes more sense. I will make changes accordingly |
1 similar comment
@mliu49 What you said makes more sense. I will make changes accordingly |
70eb9c8
to
8ccb7a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Only two minor comments.
species will be estimated. ``onlyHeterocyclics`` means that only heterocyclic species will be estimated. Note that | ||
if ``onlyHeterocyclics`` setting is set to True, the machine learning estimator is restricted to only heterocyclic | ||
species regardless of the ``onlyCyclics`` setting. If ``onlyCyclics`` is False and ``onlyHeterocyclics`` is True, | ||
RMG will log a warning that ``onlyCyclics`` should also be True and because heterocyclics are a subset of cyclics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly awkward wording "and because heterocyclics are a subset of cyclics"
Maybe "and the estimator will be restricted to heterocyclics because they are a subset of cyclics`?
rmgpy/data/thermoTest.py
Outdated
@@ -407,6 +430,7 @@ def test_thermo_generation_ml_settings(self): | |||
self.assertIsInstance(thermo, ThermoData) | |||
self.assertTrue('ML Estimation' in thermo.comment, 'Thermo not from ML estimation, test purpose not fulfilled') | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove this extra line?
Could you address my last two comments and rebase? I think this can be merged soon. |
'isHeterocyclic' function will return True if a molecule is heterocyclic or return False if not
3 unit tests for molecule.isHeterocyclic() were added. It checks whether it returns False for cyclohexanol and returns True for furan and pyridine
A new option 'onlyHeterocyclics' is added for Machine Learning Thermo Estimator. If 'onlyHeterocyclics' is True, the thermo of only heterocyclics species are estimated by mlEstimator regardless of 'onlyCyclics' setting. If 'onlyCyclics' is False and 'onlyHeterocyclics' is True, then RMG will log a warning that 'onlyCyclics' should be also True since heterocyclics is a subset of cyclics. I also added lines that will log a warning when 'onlyCyclics' is False and 'minCycleOverlap' is greater than zero since the machine learning estimator is restricted to only cyclic species when 'minCycleOverlap' is greater than zero. If 'onlyCyclics' is False and 'heteroCyclics' is True and 'minCycleOverlap' is greater than zero, it will log a warning and the machine learning estimator will be restricted to only heterocyclic species with the speficied minimum cycle overlap.
4 unit tests for 'onlyHeterocyclics' option in Machine Learning Thermo Estimator are added. Case 1: 'onlyCyclics' = True, 'onlyHeterocyclics' = True test cyclic species gives None Case 2: 'onlyCyclics' = True, 'onlyHeterocyclics' = True test heterocyclic species gives ML thermo Case 3: 'onlyCyclics' = False, 'onlyHeterocyclics' = True test cyclics species gives None Case 4: 'onlyCyclics' = False, 'onlyHeterocyclics' = True test heterocyclic species gives ML thermo
8ccb7a0
to
3e495db
Compare
@mliu49 I applied all the changes you have suggested :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good!
Motivation or Problem
Adds one more option for Machine Learning Estimator settings to allow one to use machine learning for only heterocyclic species and group additivity for the rest
Description of Changes
onlyHeterocyclics
option is added to Machinermgpy.rmg.input.mlEstimator
, and a new methodisHeterocyclic
is added tormgpy.molecule.molecule.py
. Relevant changes are made in other rmg files including the documentation.If both
onlyCyclics
andonlyHeterocyclics
options are True, thenonlyCyclics
option overrides, and all cyclic species thermo will be estimated by machine learning.Testing
3 unit tests are added to
rmgpy.molecule.moleculeTest.py
to check that species can be correctly identified as heterocyclic or not byisHeterocyclic
method.3 unit tests are added to
rmgpy.data.thermoTest.py
to check that mlEstimator is applied for heterocyclic species for appropriate mlEstimator settings.Reviewer Tips
Check unit tests and run simple mlEstimators examples with heterocyclic settings turned on and off