Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load and dump TS YAML files in Arkane #1551

Merged
merged 15 commits into from
Mar 15, 2019
Merged

Load and dump TS YAML files in Arkane #1551

merged 15 commits into from
Mar 15, 2019

Conversation

alongd
Copy link
Member

@alongd alongd commented Feb 8, 2019

Motivation

For a more convenient workflow of archiving QM data and easily parsing it again (and soon automating this process and forming a database) it is important to also save and load YAML files of TSs in addition to stable species.

Description of Changes

ArkaneSpecies can now process TSs as well.
So far, a YAML file was generated for stable species iff their structure was given and Thermo() was called (both important data to save)
Now, a TS YAML file is saved iff structures of all relevant reactant/s and product/s were given and Kinetics() was called. We're not saving the kinetics, though.
The RMGObject was changes slightly to save and load lists.
This PR has one small commit not directly related to the topic, changing how Arkane loads energy from QChem files. I could open a different PR if the reviewers prefer.

Testing

An example for a reaction where all species and TS are loaded from YAML files was added (Arkane examples are also used as functional tests).

Reviewer Tips

Run any reaction in Arkane, look at the YAML files generated, then run the reaction again, pointing the TS to the generated YAML file.

@alongd alongd requested a review from mliu49 February 8, 2019 17:29
@alongd alongd self-assigned this Feb 8, 2019
@alongd alongd added the Arkane label Feb 8, 2019
@codecov
Copy link

codecov bot commented Feb 8, 2019

Codecov Report

Merging #1551 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1551      +/-   ##
==========================================
- Coverage   41.86%   41.84%   -0.02%     
==========================================
  Files         165      165              
  Lines       28008    28008              
  Branches     5713     5713              
==========================================
- Hits        11726    11721       -5     
- Misses      15494    15498       +4     
- Partials      788      789       +1
Impacted Files Coverage Δ
rmgpy/data/kinetics/family.py 58.27% <0%> (-0.3%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be58562...07876cf. Read the comment docs.

@alongd alongd force-pushed the arkane_ts_yaml branch 2 times, most recently from 4c76ce5 to a815028 Compare February 28, 2019 15:18
@alongd
Copy link
Member Author

alongd commented Feb 28, 2019

@mliu49, I think this is now ready for review

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial round of comments. I will test it out soon.

@@ -188,11 +188,8 @@ def save_yaml(self, path):
filename = os.path.join('ArkaneSpecies',
''.join(c for c in self.label if c in valid_chars) + '.yml')
full_path = os.path.join(path, filename)
content = yaml.dump(data=self.as_dict(), Dumper=Dumper)
# remove empty lines from the file (multi-line strings have excess new line brakes for some reason):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I was worried about this. I probably should have followed up on that.

Copy link
Member Author

@alongd alongd Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks ugly, but works....
Another option for human input is to use a pipe symbol in yaml, e.g. (in ARC's format):


species:
  - label: vinoxy
    smiles: C=C[O]
    multiplicity: 2
    charge: 0

  - label: OH
    xyz: |
      O       0.00000000    0.00000000   -0.12002167
      H       0.00000000    0.00000000    0.85098324

  - label: methylamine
    adjlist: |
      1 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
      2 N u0 p1 c0 {1,S} {6,S} {7,S}
      3 H u0 p0 c0 {1,S}
      4 H u0 p0 c0 {1,S}
      5 H u0 p0 c0 {1,S}
      6 H u0 p0 c0 {2,S}
      7 H u0 p0 c0 {2,S}

  - label: propene
    smiles: C=CC

  - label: N2H4
    smiles: NN

if 'Final energy is' in line:
E0 = float(line.split()[3]) * constants.E_h * constants.Na
logging.debug('energy is {}'.format(str(E0)))
if E0 is None:
for line in f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, the reason it fails is because you looping through a file moves a pointer from the beginning to the end. Once you're at the end, looping through again doesn't do anything. Thus, you need to reset the pointer, which can be done using f.seek(0).

However, I think you could probably accomplish this only reading through the file once:

with open (self.path, 'r') as f:
    a = b = 0
    for line in f:
        if 'Final energy is' in line:
            a = float(line.split()[3]) * constants.E_h * constants.Na
        if 'Total energy in the final basis set' in line:
            b = float(line.split()[8]) * constants.E_h * constants.Na
    E0 = a or b
    logging.debug('energy is {}'.format(str(E0)))

Of course the fix you have solves the issue, so it's up to you whether or not to try these other options.

Copy link
Member Author

@alongd alongd Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your implementation. I'll modify it a bit into
e0 = a if a else b
since I think that if the text for b is in the file, then the text for a is there as well. I'll check and will anyway add a test.
Edit: I just realized that e0 = a or b does exactly that. I'll keep it

_, file_extension = os.path.splitext(path)
if file_extension in ['.yml', '.yaml']:
if TS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this line deleted intentionally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR replaces it with is_ts, but probably this commit only captured the deletion. I'll sort the commits out

logging.debug("made object {0}".format(class_name))
data[key].append(obj)
elif isinstance(val, list) and val:
if isinstance(val[0], dict) and 'class' in val[0]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section could be combined with the first if clause by adding an elif at the second level. Let me know if that made sense to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so clear. Combine if isinstance(val[0], dict) and 'class' in val[0] with if isinstance(val, dict) and 'class' in val?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you can ignore this. I missed the lines that were hidden by GitHub and thought that these changes were part of the same function.

chemkin_thermo_string: "N2H3 H 3N 2 G 10.000 3000.000
\ 478.04 1\n 2.42935350E+00 1.15088524E-02-6.01470204E-06 1.59974609E-09-1.70089224E-13
\ 2\n 2.62991058E+04 1.15861374E+01 4.04880122E+00-4.38083729E-03 5.11833994E-05
\ 3\n-8.84029667E-08 5.22512137E-11 2.61709988E+04 5.24801815E+00 4\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any idea why this string is printed differently from the adjlist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because the lines are too long, so YAML breaks them, then makes it a multiline string with "...". Perhaps \ns aren't executed then?

@alongd
Copy link
Member Author

alongd commented Mar 1, 2019

@mliu49 , I addressed the initial comments, squashed and force-pushed (I re-organized the commits)

@mliu49
Copy link
Contributor

mliu49 commented Mar 1, 2019

I tested TS dumping and loading, and it seems to work fine.

I added a new commit with a couple changes to yaml dumping. Most notably, I added a custom string representer to make it dump multiline strings as block literals. Please try it out and let me know what you think.

I also removed the Dumper=Dumper argument since it seemed unnecessary. Do you remember why you had added that?

alongd added 11 commits March 14, 2019 14:45
Doing so interfers with reading the line brakes, so whe reading we get:
1 O u0 p2 c0 {2,S} {9,S} 2 C u0 p0 c0 {1,S} {3,S} {4,S} {5,S} 3 C u0 p0
c0 {2,S} {6,S} {7,S} {8,S} 4 H u0 p0 c0 {2,S} 5 H u0 p0 c0 {2,S} 6 H u0
p0 c0 {3,S} 7 H u0 p0 c0 {3,S} 8 H u0 p0 c0 {3,S} 9 H u0 p0 c0 {1,S}
instead of:
1 O u0 p2 c0 {2,S} {9,S}
2 C u0 p0 c0 {1,S} {3,S} {4,S} {5,S}
3 C u0 p0 c0 {2,S} {6,S} {7,S} {8,S}
4 H u0 p0 c0 {2,S}
5 H u0 p0 c0 {2,S}
6 H u0 p0 c0 {3,S}
7 H u0 p0 c0 {3,S}
8 H u0 p0 c0 {3,S}
9 H u0 p0 c0 {1,S}
For some reason ARC fails without this modification when sp calculations
are done in QChem. It is indeed less efficient, since the entire file is
loaded to memory.
Using the 'Total energy in the final basis set' phrase
Some properties are important to parse, but are not objects and cannot
be updated automatically.
`imaginary_frequency` contains a ScalarQuantity instance, but it is not
an object in itself in RMG/Arkane, just an attribute of TransitionState
(.frequency). It's important to parse it for tunneling.
Structure information is important to parse since YAML file generation
depends on it.
Iff structures of all reactant/s and product/s are given
Also, renames `TS` as `is_ts`
@alongd
Copy link
Member Author

alongd commented Mar 14, 2019

Thanks for this awesome addition, @mliu49! Now it also looks nice :)
No, I don't remember why I had to add Dumper, I think I just followed what I saw in tutorials, apparently as you show it is not necessary. The TS YAML generation worked fine for me as well. I just rebased, should be good to go.

@mliu49
Copy link
Contributor

mliu49 commented Mar 14, 2019

Would you like to update the example yaml files following the change to string representation?

Otherwise I think this is ready to merge.

alongd and others added 4 commits March 14, 2019 19:46
Showing how to use species (and TS) YAML files to run Arkane
Provide file stream directly to yaml.dump
Remove optional Dumper argument and associated imports
Add str representer to dump multiline strings as block literals
@alongd
Copy link
Member Author

alongd commented Mar 14, 2019

Done!

@mliu49 mliu49 merged commit 59aad7a into master Mar 15, 2019
@mliu49 mliu49 deleted the arkane_ts_yaml branch March 15, 2019 16:41
@mliu49 mliu49 mentioned this pull request May 15, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants