Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terachem #1788

Merged
merged 13 commits into from
Nov 22, 2019
Merged

Terachem #1788

merged 13 commits into from
Nov 22, 2019

Conversation

alongd
Copy link
Member

@alongd alongd commented Oct 27, 2019

Motivation or Problem

We'd like to expand the electronic structure software Arkane supports

Description of Changes

Added full support for TeraChem.

I couldn't figure out how to parse the moments of inertia out of a TeraChem log file (I think they might not be reported, although they are certainly calculated and used internally), so some general functionalities for calculating them from xyz were added to common.py. Note that neither RDKit nor OB were used to have the capability to calculate moments of inertia for TSs as well. These functions are used in statmech if no rotational mode exists in conformer.modes. Could be useful in the future for additional ESS. I verified that the results are in agreement with those Gaussian would have calculated for a few species, and added tests.

Sine we have additional future plans to expand ESS support in Arkane (Orca, Psi4), we'll end up with many ESS log parsing and respective tests files under the arkane folder. For this to not get out of hand, a logs subdir was created for all ESS log files and their tests, including the Log parent class.

Testing

Plenty of tests were added.

Reviewer Tips

See that the code makes sense, verify that the tests are comprehensive and run them.

I'm tagging @dranasinghe as the reviewer and @mjohnson541 as the merger.

Copy link
Contributor

@dranasinghe dranasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made some comments to see whether you agree. They are mostly for the terachem Parser. for the moments of inertia, I could n't recreate the Gaussian output for CH4 manually. I will keep looking into that.

if len(coords) == 0 or len(numbers) == 0 or len(masses) == 0 \
or ((len(coords) != num_of_atoms or len(numbers) != num_of_atoms or len(masses) != num_of_atoms)
and num_of_atoms is not None):
raise LogError(f'Unable to read atoms from TeraChem geometry output file {self.path}.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened if someone gives geometry optimization output file as output? I think it better to suggest user give frequency calculation, output.geomatry or .xyz or if parse could n't find a Since there is .xyz, ouput.geomatry and frequency

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent suggestion, added

# Matrix element rows
for j in range(n_rows): # for j in range(i*6, Nrows):
line = f.readline()
while len(line.split()) != 7:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if you have an odd number of atoms eg. 11. you will have a 33*33 matrix and the last block will only have 3 columns, which means while loop will exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add as a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! test added

arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
if 'Scan Cycle' in lines[line_index]:
# example: '-=#=- Scan Cycle 7/36 -=#=-'
v_index += 1
if 'FINAL ENERGY:' in lines[line_index]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to grab "Optimized Energy" value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see Optimized Energy in a scan log file. Let's talk about it

arkane/common.py Show resolved Hide resolved
arkane/common.py Show resolved Hide resolved
Copy link
Member Author

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!! Comments were addressed, and fixup commits were added

arkane/common.py Show resolved Hide resolved
if len(coords) == 0 or len(numbers) == 0 or len(masses) == 0 \
or ((len(coords) != num_of_atoms or len(numbers) != num_of_atoms or len(masses) != num_of_atoms)
and num_of_atoms is not None):
raise LogError(f'Unable to read atoms from TeraChem geometry output file {self.path}.')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent suggestion, added

arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
arkane/logs/terachem.py Show resolved Hide resolved
if 'Scan Cycle' in lines[line_index]:
# example: '-=#=- Scan Cycle 7/36 -=#=-'
v_index += 1
if 'FINAL ENERGY:' in lines[line_index]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see Optimized Energy in a scan log file. Let's talk about it

# Matrix element rows
for j in range(n_rows): # for j in range(i*6, Nrows):
line = f.readline()
while len(line.split()) != 7:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! test added

@alongd alongd force-pushed the terachem branch 2 times, most recently from 958b630 to 059fb6f Compare November 1, 2019 20:31
@alongd
Copy link
Member Author

alongd commented Nov 2, 2019

Remaining things to do:

  • parse E from either freq (first) or opt (last), appropriately
  • in scans, use opt E
  • check number of scan points according to the input
  • check termination
  • test failed job

@alongd alongd force-pushed the terachem branch 6 times, most recently from ff781c0 to 3c2f5dd Compare November 3, 2019 18:31
@alongd
Copy link
Member Author

alongd commented Nov 4, 2019

@dranasinghe, I addressed all the comments we discussed (online and offline), please take a look when you get a chance.

@alongd
Copy link
Member Author

alongd commented Nov 4, 2019

@goldmanm, I implemented in TeraChem the rotor scan checks we discussed, please take a look and let me know what you think. Feel free to replicate for Gaussian and/or make suggestions.

@ReactionMechanismGenerator ReactionMechanismGenerator deleted a comment from codecov bot Nov 4, 2019
Copy link
Contributor

@dranasinghe dranasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for addressing the issues. The code looks good. :)

@goldmanm
Copy link
Contributor

goldmanm commented Nov 5, 2019

The part about rotor scans looks good. I had one comment about it, listed below.

I've implemented the gaussian changes, and I'm just waiting on the more substantial bugfix pr #1810 to go through to reduce merge conflicts.

logging.info(' Assuming {0} is the output from a TeraChem PES scan...'.format(os.path.basename(self.path)))

v_list.append(v_list[0]) # TeraChem does not repeat the first scan point (unlike Gaussian), append it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we need to repeat the data at the end? I think adding data should be avoided if possible. The only reason I can think of is how we define 'angles', will have a point at the end.

A workaround might be to not add the extra datapoint and to replace the angle definition on line 310 so that it does not have the 2$\pi$ angle.

angles = np.linspace(0, 2 * math.pi * (len(v_list) -1) / len(v_list), len(v_list), np.float64)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, if there are 4 data points (90 degree spacing), the angles output will be 0, 0.5$\pi$, $\pi$, and 1.5$\pi$.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way to repeat the last point in TeraChem instead, I'll change these lines. We do it as a quality assurance method of checking that a 360 scan gives almost the same energy as the initial point (checked in ARC)

@codecov
Copy link

codecov bot commented Nov 5, 2019

Codecov Report

Merging #1788 into master will decrease coverage by 1.16%.
The diff coverage is 14.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1788      +/-   ##
==========================================
- Coverage   43.02%   41.85%   -1.17%     
==========================================
  Files          80       82       +2     
  Lines       21099    21574     +475     
  Branches     5516     5653     +137     
==========================================
- Hits         9077     9030      -47     
- Misses      11004    11539     +535     
+ Partials     1018     1005      -13
Impacted Files Coverage Δ
arkane/encorr/corr.py 53.65% <ø> (ø) ⬆️
arkane/logs/log.py 92.3% <ø> (ø)
arkane/logs/molpro.py 9.16% <100%> (ø)
arkane/logs/gaussian.py 81.81% <100%> (ø)
arkane/logs/orca.py 20.19% <100%> (ø)
arkane/logs/terachem.py 12.5% <12.5%> (ø)
arkane/logs/qchem.py 22% <50%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0a78cf...1132931. Read the comment docs.

@alongd
Copy link
Member Author

alongd commented Nov 11, 2019

@mjohnson541, thanks for the comments, I reverted the style changes (rebased and squashed). That commit was not removed, though, it instead applies a consistent style to all of Arkan'es files (now keeping the #### lines). The only exception is common.py where the #### lines are deleted, but added in a subsequent commit. Sorry for that, it just created an annoying conflict otherwise.
I also added a minor commit, correcting some docstring in RMG Molecule, since it's very minor I thought it's OK to just append it here instead of creating a PR for it. No other changes other than that.
I think this PR should be good to go, let me know if you have any additional comments.

@alongd alongd force-pushed the terachem branch 5 times, most recently from 2d3f042 to 0995767 Compare November 19, 2019 18:52
@alongd
Copy link
Member Author

alongd commented Nov 20, 2019

@mjohnson541, I think all comments were addressed. This branch is now rebased. Please feel free to merge unless there are additional concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants