Added Kinetics Data for Hydrogen Abstraction by Chlorine Atom #257

Merged (10 commits) on Mar 24, 2018

Conversation

zjburas
Contributor

@zjburas zjburas commented Mar 12, 2018

Companion to RMG-Py PR #1310.

Consists of 3 chlorine-related additions to the database:

  1. 41 training reactions for hydrogen abstraction reactions by chlorine (Cl+RH=HCl+R), all taken from either NIST Kinetics or IUPAC. Most of the kinetics are based on temperature-dependent experimental measurements, although for a few of the slow reactions without any reported measurements (e.g., Cl+CH3OH=HCl+CH3O) theoretical predictions from literature were used. These types of reactions should not be pressure-dependent.

  2. 11 Benson GAV's for carbon bonded to at least one chlorine atom. These all come from Benson's 1976 book "Thermochemical Kinetics", which were fit to experimental measurements. For many of the groups, Cp was not determined up to 1500 K, so as an approximation I padded the missing data points with the Cp value measured at the highest T.

  3. Thermo library containing experimental measurements for 17 chloroalkanes. These were the species used by Benson to fit the original GAV's, and could perhaps be used in a unit test.
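The Cp padding described in item 2 can be sketched as follows. This is a minimal illustration, not the script used for this PR: the function name `pad_cp` and the example numbers are hypothetical, and the temperature grid is assumed to be the usual 300-1500 K set of Cp points.

```python
# Temperature grid (K) assumed for the Cp points of a group entry.
T_GRID = [300, 400, 500, 600, 800, 1000, 1500]


def pad_cp(cp_by_temperature):
    """Fill missing high-temperature Cp points with the value at the
    highest temperature that was actually measured.

    cp_by_temperature: dict mapping temperature (K) -> Cp, possibly
    missing the upper end of T_GRID.
    """
    known = [T for T in T_GRID if T in cp_by_temperature]
    if not known:
        raise ValueError("no Cp data on the standard grid")
    highest_T = max(known)
    padded = []
    for T in T_GRID:
        if T in cp_by_temperature:
            padded.append(cp_by_temperature[T])
        elif T > highest_T:
            # Approximation: reuse the Cp measured at the highest T.
            padded.append(cp_by_temperature[highest_T])
        else:
            raise ValueError("gap below highest measured T at %d K" % T)
    return padded


# Hypothetical group with Cp measured only up to 800 K (placeholder values).
example = {300: 8.9, 400: 10.7, 500: 12.3, 600: 13.4, 800: 15.3}
print(pad_cp(example))
```

Padding this way holds Cp flat above the last measured point, so it most likely underestimates Cp at high T for groups whose heat capacity is still rising.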

Currently, the purpose of this branch is to estimate initial radical pools in chlorine-initiated oxidation experiments. In these experiments the Cl is first generated by photolysis of a chemical precursor (e.g. Cl2=2Cl at 351 nm) and then abstracts hydrogens from different sites on the fuel molecule (e.g., heptane) to initiate the chemistry. I have tested this branch for Cl+n-heptane, i-octane and 1-butanol and have obtained reasonable product branching in each case. For example, in the butanol case very little H-abstraction from the O-H site is predicted relative to the C-H sites.
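As a rough illustration of what "product branching" means here, the share of each abstraction site can be computed from site-specific rate coefficients. This is only a sketch: the function name and the rate values are placeholders, not numbers from this branch.

```python
def branching_fractions(site_rates):
    """Return each site's share of the total abstraction rate."""
    total = sum(site_rates.values())
    if total <= 0:
        raise ValueError("total rate must be positive")
    return {site: k / total for site, k in site_rates.items()}


# Placeholder rate coefficients (arbitrary units), NOT values from this PR.
rates = {"C-H (alpha)": 5.0, "C-H (beta)": 3.0, "C-H (other)": 1.9, "O-H": 0.1}
for site, fraction in branching_fractions(rates).items():
    print("%-12s %.3f" % (site, fraction))
```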

Technically the additions to the thermochemical database are unnecessary, because the only chlorine-containing species involved in H-abstraction are Cl and HCl, both of which cannot be estimated by GAV and appear in the primary thermo library instead. However, I decided to include Benson's Cl-containing GAV's in this PR to facilitate future additions to chlorine chemistry. For example, R-recombination (R+Cl=RCl), "Cl-abstraction" (RCl+X=R+XCl) and SN2 reactions (RCl+X=RX+Cl) involving chlorine (and other halogens) could all be added in the future.

I have not forbidden Cl from reacting in any families, but maybe I should until we actually have kinetics data for Cl reacting in those families.

I suppose some unit tests are in order next. @alongd , can you give me some guidance?

@zjburas
Contributor Author

zjburas commented Mar 13, 2018

The training reactions cover H-abstraction from as wide a range of CHO functional groups as I could find reliable data for:

  • primary, secondary and tertiary alkane sites
  • alkenes
  • aldehydes
  • ketones
  • alcohols
  • peroxides
  • esters
  • ethers
  • cyclic alkanes
  • resonantly-stabilized sites
  • aromatics

Suitable templates were matched with each training reaction as verified by running convertKineticsLibraryToTrainingReactions.ipynb

Although there are measurements for H-abstraction from CHONS compounds in literature, the current PR is restricted to CHO.

@alongd
Member

alongd commented Mar 13, 2018

It would be a good idea to add a small unit test to test_fromSMILES() under rmgpy/molecule/parserTest.py.

I'm not aware of explicit unit tests for GAV or kinetics. I think a good approach would be to create a new test for the RMG-tests repository to be used as a benchmark. In my understanding, this should be a job that runs in less than 10 minutes (you could choose a smaller fuel molecule if needed). You should have saveEdgeSpecies=True in options; PDep isn't necessary. You can see an example in this PR. Principally, you'd make sure that the families you'd like to test have corresponding reactions in the output, and the same for species GAV. However, as I understand it, this test isn't expected to include carbon bonded to chlorine, so perhaps you could add some sample species marked with reactive=False to the test input file so their thermo will also be tested against a benchmark?

@alongd
Member

alongd commented Mar 13, 2018

Travis tests fail here since they are run using the master branch of RMG-Py. To ask Travis to use the corresponding chlorine RMG-Py branch, add a temporary commit (to be removed before merging) that adds the following to the "before_install" section of the .travis.yml file:

  - cd RMG-Py
  - git checkout chlorine

@@ -7056,6 +7056,16 @@
kinetics = None,
)

entry(
Member

I think we also need to consider HCl as RH so the reverse reaction could also be found

Contributor Author

Good catch. Taken care of in new commit below.

Ea = (3201.07, 'J/mol'),
T0 = (1, 'K'),
),
rank = 3,
Member

Is the rank 3 intentional? I think that rank 3 is for CBS-QB3 or similar, and that rank 1 is for experimental data

Contributor Author

Good catch again. This is an artifact of using convertKineticsLibraryToTrainingReactions.ipynb to import training reactions. I've changed the ranks on all of the experimental training reactions to 1.

Member

@alongd alongd left a comment

Looking good!


entry(
index = 1234,
label = "Cl + C2H6 <=> ClH + C2H5",
Member

Perhaps replace ClH with HCl for readability?

Contributor Author

Done

@alongd
Member

alongd commented Mar 13, 2018

Did you try running a thermo generator script w/o defining the new library, and see that the GAV results make sense?

@alongd
Member

alongd commented Mar 13, 2018

RMG-tests have many significant discrepancies on this branch:
main_log.txt

Perhaps this is because many of the added H_Abs rates are for relatively low T, and they affect other averaging in the family?

@zjburas
Contributor Author

zjburas commented Mar 13, 2018

I reviewed @faribas's halogen branch of the RMG-database (ReactionMechanismGenerator/RMG-Py@master...faribas:halogens) and her thesis (http://hdl.handle.net/2047/D20213055). There is a lot of useful information there, but most of it is outside the current scope of this PR. The relevant parts that I can add to this PR are:

  1. Updated GAV's for chlorine-containing groups. I was using Benson's 1976 values, but @faribas used Chen and Bozzelli's 1998 GAV's, which appear more accurate.
  2. Rate rules for H-abstraction reactions by Cl. However, several of these rate rules are redundant with the training reactions that I added (e.g., H-abstraction by Cl from a primary carbon). Also, should rate rules still be added to the database, or are training reactions preferred?

Of course, the scope of this PR can be expanded to include @faribas's other database changes (HBI corrections for Cl-containing radicals, 2 new reaction families for Cl-abstraction and Cl2/HCl double-bond insertion). It just depends on our goals. For my current purposes, this PR is already sufficient, but I'm pretty sure that Cl-abstraction and Cl-recombination reactions will be desirable in the near future. Let me know how you would like to proceed, @alongd and @rwest .

@alongd
Member

alongd commented Mar 13, 2018

IMO, we definitely want @faribas's additions, but it doesn't have to be part of this PR.

BTW, what about R-Recombination Cl + radical as part of the scope of the present PR?

We prefer training reactions over hard coded rate rules.

@zjburas
Contributor Author

zjburas commented Mar 13, 2018

For the kinetics of Cl-Recombination, @faribas did not add any new rate rules or training reactions; her branch simply relies on the averaging of other R-Recombination rates. This is probably not a bad approximation since most of these rates will be around the collision limit anyway. Nonetheless, if this PR were to include Cl-Recombination, a literature search for reliable rates would probably be a good idea.

Of more importance to Cl-Recombination is making sure that the thermo of the chlorinated product is accurate, such that the reverse decomposition rate can be predicted accurately. To this end, @faribas's database branch already has a lot of useful data (GAV's, HBI's and NNI's), much of which I was already planning on adding.

So I would be ok adding Cl-recombination to this PR, but since this is creeping outside of my current goals I probably won't be able to work on it until the weekend.

@zjburas
Contributor Author

zjburas commented Mar 13, 2018

@alongd , regarding your question about whether the thermoEstimator script was run to check the GAV estimates: I did this for the 13 chlorine-containing compounds used by Benson to fit/validate his original GAV's (all included in the Benson_Chloroalkanes thermo library). For all but 2 of these species the GAV estimates were good. I couldn't figure out what was wrong with the 2 problematic species, but it doesn't matter anymore since we are going to update those groups with Chen and Bozzelli's values. I'll do the same test with the new GAV's, and hopefully I can reach a more decisive conclusion.

@rwest
Member

rwest commented Mar 13, 2018

In general I'm in favor of small pull requests so would suggest that adding the other reaction families follows in a subsequent PR even if it's quite soon. But not a hard rule.

@zjburas
Contributor Author

zjburas commented Mar 13, 2018

Alright, I'll just focus on the H_abstraction family and accurate chloroalkane thermo in this PR. R_recombination, Cl_Abstraction, Cl2/HCl insertion and SN2 reactions can come in future PR's.

@zjburas
Contributor Author

zjburas commented Mar 15, 2018

After testing this branch out a bit more I've realized that degeneracy is not being accounted for correctly because degeneracy=1 for all training reactions by default. For example, the degeneracy of H-abstraction from isobutane is 9, but if the training reaction says degeneracy=1, then when RMG encounters this reaction in a mechanism it will take the training reaction rate and directly multiply it by 9.

I will go through and manually correct the degeneracy for this batch of 41 training reactions, but I thought that there was already a solution to this issue.
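The overcounting described above can be illustrated with a small sketch. It assumes, as stated in this comment, that the stored rate is divided by the degeneracy recorded on the training reaction and then multiplied by the degeneracy RMG determines for the reaction in a mechanism. The function names and the rate value are hypothetical.

```python
def per_site_rate(k_total, training_degeneracy):
    """Rate per equivalent reaction path, obtained by dividing out the
    degeneracy recorded on the training reaction."""
    return k_total / training_degeneracy


def rate_used_in_mechanism(k_total, training_degeneracy, true_degeneracy):
    """Rate applied when the reaction is found in a mechanism: the
    per-site rate times the degeneracy determined for that reaction."""
    return per_site_rate(k_total, training_degeneracy) * true_degeneracy


k_measured = 2.0  # placeholder total rate coefficient, arbitrary units

# Training reaction left at the default degeneracy of 1:
# the measured (total) rate is multiplied by 9 again.
print(rate_used_in_mechanism(k_measured, 1, 9))

# Training reaction with the degeneracy corrected to 9:
# the measured rate is recovered (up to floating-point rounding).
print(rate_used_in_mechanism(k_measured, 9, 9))
```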

@mliu49
Contributor

mliu49 commented Mar 15, 2018

@zjburas, Mark's fix_training_degneracy branch has a script which you should be able to use to correct the training reaction degeneracies.

Let us know if you run into any issues.

@alongd
Member

alongd commented Mar 16, 2018

@zjburas , are there more additions underway? Let me know once you want me to run RMG-tests once more.

@zjburas
Contributor Author

zjburas commented Mar 16, 2018

I still have to replace the Benson GAV's with Chen and Bozzelli's, and try to convert @faribas's H-abstraction rate rules to training reactions if possible. I will work on this today.

@zjburas
Contributor Author

zjburas commented Mar 17, 2018

The thermo groups and library have been updated and tested. Next, the H-abstraction rate rules found by @faribas should be converted to training reactions if possible, which I will work on later today.

@zjburas
Contributor Author

zjburas commented Mar 18, 2018

After reviewing the sources of @faribas's H-abstraction rate rules (Goldfinger and Senkan) I do think they could be converted to training reactions. However, the new training reactions that I would add are all for Cl + some chlorinated hydrocarbon (e.g., hexachloroethane). The Cl + hydrocarbon rates in the above references are already included in the current PR (e.g., Cl + ethane). Adding training reactions for the Cl + chlorinated hydrocarbon reactions would require the addition of new groups in the H-abstraction family, and because these reactions are not of immediate interest to my current project I would like to leave this to @skrsna and @rwest for a later PR.

If that's ok, then I'm done with the data entry portion of this PR, and you can run the test again @alongd . Also I will think about what unit tests to add.

@KEHANG , didn't you have a test to check whether changes to the thermo database made estimations for some benchmark species better or worse? If so, I have a library of ~50 experimentally measured chlorinated species that could be added to this benchmark.

@alongd
Member

alongd commented Mar 18, 2018

@zjburas, could you take a look at the Codacy report? It complains about a lot of trailing whitespace.

@alongd
Member

alongd commented Mar 18, 2018

The RMG-tests results are below:
main_log.txt

Overall seems good: there are many H_Abs differences as expected, all within O(1)
@zjburas, could you take a look to see that everything seems reasonable?

There's also an unexpected thermo difference:


Non-identical thermo!
original:	[C]1=NO1
tested:	[C]1=NO1
Hf(300K)  |S(300K)   |Cp(300K)  |Cp(400K)  |Cp(500K)  |Cp(600K)  |Cp(800K)  |Cp(1000K) |Cp(1500K) 
    129.73|     51.79|      9.14|      9.59|      9.99|     10.34|     10.90|     11.38|     12.40
    154.90|     56.55|      9.79|      9.15|      8.58|      8.33|      9.02|     10.26|     10.96
thermo: Thermo group additivity estimation: group(Cds-CdsCsCs) + group(N3s-CsHH) + ring(Cyclopropene) + radical(Cds_P)
thermo: Thermo group additivity estimation: group(Cds-CdsCsCs) + group(N3s-CsHH) + ring(oxirene) + radical(Cds_P)

@mjohnson541, we previously saw that together, do you remember what caused it?

There's also a traceback at the end of the report that I'm unfamiliar with:

Test mode: benchmark
Traceback (most recent call last):
  File "/home/alongd/Code/RMG-tests/thermo_val/evaluate.py", line 160, in <module>
    main()
  File "/home/alongd/Code/RMG-tests/thermo_val/evaluate.py", line 144, in main
    auth_info = get_RTD_authentication_info()
  File "/home/alongd/Code/RMG-tests/thermo_val/utils.py", line 19, in get_RTD_authentication_info
    config = read_config(cfg_path)
  File "/home/alongd/Code/RMG-tests/thermo_val/utils.py", line 12, in read_config
    with open(cfg_path, 'r') as fid:
IOError: [Errno 2] No such file or directory: '/home/alongd/Code/RMG-tests/thermo_val/config.cfg'

@zjburas
Contributor Author

zjburas commented Mar 20, 2018

Whitespaces should be taken care of now. I will take a look at the test results next and decide on new unit tests.

@KEHANG
Member

KEHANG commented Mar 20, 2018

@zjburas that test is in RMG-tests, I'm happy to add the library to it. Can you send the library to me? either in RMG format or csv format with (SMILES, Hf298, S298, Cp300-1500).
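A CSV in the layout requested above could be produced with a short script like the one below. This is a sketch under assumptions: the column names, the helper `write_thermo_csv`, and the Cp temperature grid are illustrative, and the numbers are placeholders rather than library data.

```python
import csv
import io

# Assumed Cp temperature grid (K) for the "Cp300-1500" columns.
CP_TEMPERATURES = [300, 400, 500, 600, 800, 1000, 1500]
HEADER = ["SMILES", "Hf298", "S298"] + ["Cp%d" % T for T in CP_TEMPERATURES]


def write_thermo_csv(rows, handle):
    """Write (smiles, hf298, s298, [cp values]) tuples as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(HEADER)
    for smiles, hf298, s298, cps in rows:
        if len(cps) != len(CP_TEMPERATURES):
            raise ValueError("expected %d Cp values for %s"
                             % (len(CP_TEMPERATURES), smiles))
        writer.writerow([smiles, hf298, s298] + list(cps))


# Placeholder numbers only; real entries would come from the library.
buffer = io.StringIO()
write_thermo_csv([("CCl", 0.0, 0.0, [0.0] * 7)], buffer)
print(buffer.getvalue())
```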

@KEHANG
Member

KEHANG commented Mar 20, 2018

@alongd I know how to resolve the traceback. Will send you a quick note via Slack.

@zjburas
Contributor Author

zjburas commented Mar 21, 2018

I thought whitespaces were finally taken care of, but they're back. Maybe someone else should try removing them using a different editor. I've used both vi and PyCharm with mixed success.

Also, I've checked the RMG-tests results and, as expected, this PR has almost no effect on the hydrocarbon mechanisms generated (the numbers of core/edge species/reactions are identical between db versions). For the kinetics that are different, those reactions are already falling up to a generic node, so I don't think we can really claim that one set of purely estimated kinetics is better than another. Also, as @alongd pointed out, the kinetics are usually only affected within one order of magnitude.

What is concerning is that there are significant differences in eg1, eg6 and MCH. However, it's difficult to judge how important these differences actually are without looking at the species profiles side-by-side. I will run these 3 test cases again and see what the profiles actually look like.

@alongd , did you still want to add a chlorine unit test to test_fromSMILES()?

If @KEHANG can add the Chlorinated_Hydrocarbon library to his benchmark thermo test, then I think this PR is good on tests. Once again, we can leave it to @rwest and his students to make more thoughtful mechanism tests for relevant chlorine systems.

@alongd
Member

alongd commented Mar 21, 2018

@zjburas, thanks for all your work done here.

Great that most trailing whitespaces were removed, it doesn't have to be tidy.

I agree that the tests seem reasonable, let us know what profiles you come up with.

Yes, it would be great to have a few simple adjList/SMILES tests added to test_fromSMILES().

Please rebase and squash - I'll give it a final review and we could merge after understanding the differences in the tests.

@zjburas
Contributor Author

zjburas commented Mar 22, 2018

I'm still trying to reproduce the discrepancy in eg1, eg6 and MCH reported in RMG-tests. However, I'm not sure if I'm using 100% the same simulation conditions since the regression_input.py files in the RMG-tests examples folder only list the reactor conditions. I ask because some of the jobs are taking more than a day to finish. Is there an easier way to do this? Thanks.

@KEHANG
Member

KEHANG commented Mar 23, 2018

@zjburas the new test using chlorine dataset has been added to RMG-tests PR. It's currently under testing; if it goes well, that chlorine test will become one of the RMG-tests' default tests. And I'll post the comparison result when it's ready.

@zjburas
Contributor Author

zjburas commented Mar 23, 2018

Below is the comparison between RMG-tests results for eg6 and MCH using the master and chlorine database branches (and the original warning raised by the tests):

image
Observable species CC varied by more than 0.050 on average between old model ethane(1) and new model ethane(1) in condition 1.

image
Observable species CC1CCCCC1 varied by more than 0.050 on average between old model MCH(1) and new model MCH(1) in condition 1.

For the eg6 test, there are clearly some differences between the master and chlorine db, but never more than ~10%. For MCH, I don't see a difference. So at first glance, it seems odd that RMG-tests would report both species as varying by more than 0.05 on average, but I have a couple of ideas as to what might be going on:

  1. This is probably a dumb question, but when the two profiles are compared are they evaluated at the same times? I ask because even if the final profiles look the same, the solver might have taken different time steps to get there (as was the case for the profiles above).

  2. The plots above only show a small fraction of the full profiles, which extend to 1 and 100 s, respectively. So for the vast majority of the profile the species have essentially zero mole fraction. However, the mole fraction values are never identically zero, instead they just get really small. Eventually, the limit of machine precision is reached, at which point the values fluctuate randomly. These random fluctuations at ~0 would manifest as large relative differences between two essentially identical profiles. I think this is the more likely explanation of what's happening.
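The first idea above (comparing profiles at the same times) can be sketched by resampling one profile onto the other's time points before taking differences. This is a hypothetical helper, not how RMG-tests is known to do it; it assumes strictly increasing time arrays and uses simple linear interpolation.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate a profile at time t (clamped at both ends).
    Assumes `times` is strictly increasing."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    y0, y1 = values[i - 1], values[i]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def resample(times, values, new_times):
    """Evaluate a profile on a different set of time points."""
    return [interpolate(times, values, t) for t in new_times]


# Two hypothetical solver outputs of the same linear decay,
# recorded on different time steps.
t_old, y_old = [0.0, 1.0, 2.0], [1.0, 0.5, 0.0]
t_new, y_new = [0.0, 0.5, 2.0], [1.0, 0.75, 0.0]

y_new_on_old = resample(t_new, y_new, t_old)
print(y_new_on_old)
```

Once both profiles live on the same grid, any remaining difference reflects the models rather than the solver's choice of time steps.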

A discrepancy was also reported for eg1 (ethane pyrolysis), but I'm still running that RMG job. I don't think it will ever finish though, because the current stopping criterion is 1000 s according to the regression_input.py. There must be another default criterion, or at least a constraint on what can be generated. What is the "tolerance" in that input? In any case, I would guess that the "discrepancy" in eg1 will look similar to the other two cases above.

I also think similar "discrepancies" could be holding up PR #228 , although that PR makes more substantial additions to the database, so the discrepancies could be real.

Overall, I think a couple of improvements could be made to RMG-tests, which would speed up database decision-making. First, redefine what a "discrepancy" is to avoid false-positive results such as the ones above. Second, automatically output profile comparisons such as the ones above for any species with a discrepancy (maybe this already exists?).

@zjburas
Contributor Author

zjburas commented Mar 23, 2018

I should also mention that in the eg6 test result above, ethane is consumed slightly faster using the chlorine database. This makes sense because the Cl H-abstraction training reactions added are near the collision limit, and will tend to speed up all H-abstraction reactions that fall up to a generic node.
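The effect described here can be illustrated with a toy average. The sketch assumes that a generic-node estimate behaves roughly like a geometric mean of the rates beneath it; RMG's actual averaging scheme is more involved, and the rate values are placeholders.

```python
import math


def geometric_mean(rates):
    """Average rate coefficients in log space."""
    if not rates or any(k <= 0 for k in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(k) for k in rates) / len(rates))


# Placeholder rate coefficients (arbitrary units) under one generic node.
existing = [1.0e8, 1.0e9, 1.0e10]
with_fast_cl = existing + [1.0e13]  # one near-collision-limit entry added

print("before: %.2e" % geometric_mean(existing))
print("after:  %.2e" % geometric_mean(with_fast_cl))
```

A single fast entry pulls the averaged estimate up for every reaction that falls back to that node, which is consistent with the faster ethane consumption noted above.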

@KEHANG
Member

KEHANG commented Mar 24, 2018

@zjburas I can confirm your second idea. The regression test method is actually in RMG-Py, location: rmgpy/tools/observableRegression.py. There we have a method called curvesSimilar() which, as the docstring says, calculates abs((y1-y2)/y1) so if y1 is close to zero, even a small discrepancy (y1-y2) can be amplified a lot.
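The amplification can be reproduced with a few lines. This sketch is not the `curvesSimilar()` implementation; it only uses the abs((y1-y2)/y1) expression quoted above, and both function names, the profiles, and the `floor` parameter are hypothetical.

```python
def mean_relative_difference(y1, y2):
    """Average of abs((y1 - y2) / y1) over paired points.
    Assumes no entry of y1 is exactly zero."""
    return sum(abs((a - b) / a) for a, b in zip(y1, y2)) / len(y1)


def mean_relative_difference_with_floor(y1, y2, floor):
    """Same metric, but points where the reference is below `floor`
    are skipped so near-zero noise cannot dominate."""
    pairs = [(a, b) for a, b in zip(y1, y2) if abs(a) >= floor]
    if not pairs:
        return 0.0
    return sum(abs((a - b) / a) for a, b in pairs) / len(pairs)


# Hypothetical mole-fraction profiles: identical early on, then only
# round-off-level noise once the species is consumed.
old = [1.0e-1, 5.0e-2, 1.0e-18, 2.0e-18]
new = [1.0e-1, 5.0e-2, 3.0e-18, 1.0e-18]

print(mean_relative_difference(old, new))
print(mean_relative_difference_with_floor(old, new, floor=1.0e-12))
```

With these placeholder profiles the unfiltered metric exceeds the 0.050 threshold even though the meaningful part of the curves is identical, while the floored variant reports no difference.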

@zjburas
Contributor Author

zjburas commented Mar 24, 2018

Below are the results for eg1, I assumed that the regression test only runs to 90% conversion:

image
Observable species CC varied by more than 0.050 on average between old model ethane(1) and new model ethane(1) in condition 1.

image
Observable species [CH3] varied by more than 0.050 on average between old model CH3(4) and new model CH3(4) in condition 1.

In this case there is a clear discrepancy, but without an experimental target I don't think we can say which one is "better". As expected, the chlorine database predicts faster ethane consumption. This is mainly because many of the H-abstraction reactions that initiate ethane pyrolysis had their kinetics estimated by averaging from a generic node. This includes the reaction H+ethane=H2+ethyl. So I think the main conclusion is that more specific training reactions are needed for H+ethane, so that this important reaction doesn't have to rely on average estimates.

@alongd , do you feel comfortable merging this PR now?

@KEHANG
Member

KEHANG commented Mar 24, 2018

@zjburas some updates on thermo validation on the Chlorinated_Hydrocarbons library. Unfortunately, the current master database is not ready to give thermochemistry estimates for chlorine-related molecules (although master Py can already support molecule creation). So a comparison cannot be made between this db branch and db master.

But I do have the results for this db branch, though. The average error (mean absolute error tested against the Chlorinated_Hydrocarbons library) is 1.39 kcal/mol using this branch. It's a good result.

But there are still a few examples with errors higher than 5 kcal/mol. They are

  • Cl (label in the library) with 20.1 kcal/mol error,
  • HCl (label in the library) with 22.0 kcal/mol error,
  • ClO (label in the library) with 8.147 kcal/mol error
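For reference, the kind of summary reported above (a mean absolute error plus a list of species above a threshold) can be computed as in the sketch below. The function names and all numbers are placeholders; this is not the RMG-tests thermo validation code.

```python
def mean_absolute_error(estimated, reference):
    """Mean of |estimate - reference| over species present in both dicts."""
    common = sorted(set(estimated) & set(reference))
    if not common:
        raise ValueError("no species in common")
    return sum(abs(estimated[s] - reference[s]) for s in common) / len(common)


def outliers(estimated, reference, threshold):
    """Species whose absolute error exceeds the threshold."""
    return {
        s: abs(estimated[s] - reference[s])
        for s in estimated
        if s in reference and abs(estimated[s] - reference[s]) > threshold
    }


# Placeholder enthalpies (kcal/mol), NOT values from the library.
reference = {"A": -10.0, "B": 5.0, "C": 20.0}
estimated = {"A": -9.0, "B": 5.5, "C": 27.5}

print(mean_absolute_error(estimated, reference))
print(outliers(estimated, reference, threshold=5.0))
```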

@zjburas
Contributor Author

zjburas commented Mar 24, 2018

Thanks @KEHANG .

For eg6 and MCH I didn't use a terminationConversion criterion. If most regression tests stop before any observable value reaches "0" then maybe machine precision is not the issue.

Regarding the thermo results, I'm certain that the estimates of db master would be way off because there were no chlorine-containing groups previously. For the 3 small molecules that are estimated poorly, that is also expected because there are no groups for diatomics. Instead I thought it would be better to include them in a library (this is how H2, another important diatomic, is treated). Cl, Cl2 and HCl are already in the primary thermo library, and ClO could be added.

@alongd
Member

alongd commented Mar 24, 2018

Thanks @zjburas for the thorough investigation of the tests.
I agree with what was said above, and I'd like to echo some points for future improvements of the tests:

  • The time termination criteria of the current tests should be reevaluated (at least for the regressions?) so we don't evaluate the ratios of near-zero values.
  • It would be great to automatically plot the differences in observables (and even greater if experimental results are available to plot...)
  • We should add more training reactions to H_Abs, e.g. ethane+H which hits a general node.

Member

@alongd alongd left a comment

Thanks for these additions, @zjburas!
Please rebase so we can merge.

@zjburas
Contributor Author

zjburas commented Mar 24, 2018

Rebase done.

5 participants