-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Gaussian job and handlers #328
Conversation
…rds and disabling symmetry
* Change better_guess func name in tests * Add type annotations to GaussianErrorHandler * `float` instead of `int | float` for mem
@rashatwi could you have a quick look at the CI logs why the test files can no longer be found? |
@janosh The mentioned files in the error messages are in the correct location (under tests/files/gaussian), I'm not sure why this error is occurring. When I run locally, all the tests succeed. The .out.gz files are getting replaced by .out files (decompressed) before the tests run, which is causing the FileNotFoundError. |
@janosh I fixed the issue with the test files on my forked repo. Should I create a new PR? |
that would be great, thanks! sorry for letting this slip. was hoping to get back to it sooner |
WalkthroughThe latest update enhances the Changes
Poem
Recent Review DetailsConfiguration used: CodeRabbit UI Files ignored due to path filters (21)
Files selected for processing (32)
Files skipped from review due to trivial changes (4)
Additional Context UsedRuff (1)
Additional comments not posted (36)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
""" | ||
This package implements various Gaussian Jobs and Error Handlers. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condense the multi-line docstring into a single line to adhere to Python best practices.
- """
- This package implements various Gaussian Jobs and Error Handlers.
- """
+ """This package implements various Gaussian Jobs and Error Handlers."""
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
""" | |
This package implements various Gaussian Jobs and Error Handlers. | |
""" | |
"""This package implements various Gaussian Jobs and Error Handlers.""" |
def __init__( | ||
self, | ||
input_file: str, | ||
output_file: str, | ||
stderr_file: str = "stderr.txt", | ||
cart_coords: bool = True, | ||
scf_max_cycles: int = 100, | ||
opt_max_cycles: int = 100, | ||
job_type: str = "normal", | ||
lower_functional: str | None = None, | ||
lower_basis_set: str | None = None, | ||
prefix: str = "error", | ||
check_convergence: bool = True, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider setting default values for parameters like lower_functional
and lower_basis_set
to enhance robustness.
@staticmethod | ||
def _recursive_remove_space(obj: dict[str, Any]) -> dict[str, Any]: | ||
""" | ||
Recursively remove leading and trailing whitespace from keys and string values | ||
in a dictionary. | ||
|
||
This method processes each key-value pair in the given dictionary. If a value | ||
is a string, it strips leading and trailing whitespace from it. If a value is | ||
a dictionary, it applies the same stripping process recursively to that | ||
dictionary. The keys of the dictionary are also stripped of leading and trailing | ||
whitespace. Non-string values are included in the output without modification. | ||
|
||
Args: | ||
obj (dict): The dictionary whose keys and string values will have whitespace | ||
removed. It can be nested, with dictionaries as values, which will | ||
also be processed. | ||
|
||
Returns: | ||
dict: | ||
A new dictionary with all keys and string values stripped of leading | ||
and trailing whitespace. The structure of the dictionary is preserved. | ||
""" | ||
return { | ||
key.strip(): GaussianErrorHandler._recursive_remove_space(value) | ||
if isinstance(value, dict) | ||
else value.strip() | ||
if isinstance(value, str) | ||
else value | ||
for key, value in obj.items() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimize the method by reducing unnecessary recursion for non-dictionary types.
@staticmethod | ||
def _update_route_params(route_params: dict, key: str, value: str | dict) -> dict: | ||
""" | ||
Update Gaussian route parameters with new key-value pairs, handling nested | ||
structures. | ||
|
||
Args: | ||
route_params (dict): The dictionary of route parameters to be updated. | ||
key (str): The key in the route parameters to update or add. | ||
value (str | dict): The new value to set or add to the route parameters. | ||
This can be a string or a dictionary. If it is a dictionary, it is | ||
merged with the existing dictionary at `key`. | ||
|
||
Returns: | ||
dict: | ||
The updated route parameters. | ||
""" | ||
obj = route_params.get(key, {}) | ||
if not obj: | ||
route_params[key] = value | ||
elif isinstance(obj, str): | ||
update = {key: {obj: None, **value}} if isinstance(value, dict) else {key: {obj: None, value: None}} | ||
route_params.update(update) | ||
elif isinstance(obj, dict): | ||
route_params[key].update(value if isinstance(value, dict) else {value: None}) | ||
return route_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure type safety and error handling in the _update_route_params method to prevent runtime errors.
def check(self, directory: str = "./") -> bool: | ||
"""Check for errors in the Gaussian output file.""" | ||
# TODO: this backups the original file instead of the actual one | ||
if "linear_bend" in self.errors: | ||
os.rename( | ||
os.path.join(directory, self.input_file + ".prev"), | ||
os.path.join(directory, self.input_file), | ||
) | ||
|
||
self.gin = GaussianInput.from_file(os.path.join(directory, self.input_file)) | ||
self.gin.route_parameters = GaussianErrorHandler._recursive_lowercase(self.gin.route_parameters) | ||
assert isinstance(self.gin.route_parameters, dict) | ||
self.gin.route_parameters = GaussianErrorHandler._recursive_remove_space(self.gin.route_parameters) | ||
self.gout = GaussianOutput(os.path.join(directory, self.output_file)) | ||
self.errors = set() | ||
error_patts = set() | ||
# TODO: move this to pymatgen? | ||
self.conv_data = {"values": {}, "thresh": {}} | ||
with zopen(os.path.join(directory, self.output_file)) as f: | ||
for line in f: | ||
error_match = GaussianErrorHandler.error_patt.search(line) | ||
mem_match = GaussianErrorHandler.recom_mem_patt.search(line) | ||
if error_match: | ||
patt = error_match.group(0) | ||
error_patts.add(patt) | ||
for pattern, error_key in GaussianErrorHandler.error_defs.items(): | ||
if re.match(pattern, patt): | ||
self.errors.add(error_key) | ||
break | ||
# self.errors.add(GaussianErrorHandler.error_defs[patt]) | ||
if mem_match: | ||
mem = mem_match.group(1) | ||
self.recom_mem = GaussianErrorHandler.convert_mem(float(mem), "mw") | ||
|
||
if self.check_convergence and "opt" in self.gin.route_parameters: | ||
for k, v in GaussianErrorHandler.conv_critera.items(): | ||
m = v.search(line) | ||
if m: | ||
if k not in self.conv_data["values"]: | ||
self.conv_data["values"][k] = [m.group(2)] | ||
self.conv_data["thresh"][k] = float(m.group(3)) | ||
else: | ||
self.conv_data["values"][k].append(m.group(2)) | ||
|
||
# TODO: it only plots after the job finishes, modify? | ||
if self.conv_data["values"] and all(len(v) >= 2 for v in self.conv_data["values"].values()): | ||
for k, v in self.conv_data["values"].items(): | ||
# convert strings to float taking into account the | ||
# possibility of having ******** values | ||
self.conv_data["values"][k] = np.genfromtxt(np.array(v)) | ||
GaussianErrorHandler._monitor_convergence(self.conv_data) | ||
for patt in error_patts: | ||
self.logger.error(patt) | ||
return len(self.errors) > 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the TODO about moving functionality to the pymatgen module to maintain separation of concerns.
Would you like me to help refactor this part of the code or open a GitHub issue to track this task?
self.logger.info("Missing blank line at the end of the input file.") | ||
actions.append({"blank_lines": "rewrite_input_file"}) | ||
else: | ||
self.logger.info("Not sure how to fix zmatrix error. Check manually?") | ||
return {"errors": [self.errors], "actions": None} | ||
|
||
elif "coords" in self.errors: | ||
if "connectivity" in self.gin.route_parameters.get("geom"): | ||
self.logger.info("Explicit atom bonding is requested but no such input is provided") | ||
if isinstance(self.gin.route_parameters["geom"], dict) and len(self.gin.route_parameters["geom"]) > 1: | ||
self.gin.route_parameters["geom"].pop("connectivity", None) | ||
else: | ||
del self.gin.route_parameters["geom"] | ||
actions.append({"coords": "remove_connectivity"}) | ||
else: | ||
self.logger.info("Missing connectivity info. Not sure how to fix this error. Exiting!") | ||
return {"errors": [self.errors], "actions": None} | ||
|
||
elif "found_coords" in self.errors: | ||
if self.gin.molecule and any( | ||
key in self.gin.route_parameters.get("geom", {}) for key in ["checkpoint", "check", "allcheck"] | ||
): | ||
# if coords are found in the input and the user chooses to read | ||
# the molecule specification from the checkpoint file, | ||
# remove mol | ||
self.gin._mol = None | ||
actions.append({"mol": "remove_from_input"}) | ||
else: | ||
self.logger.info("Not sure why atom specifications should not be found in the input. Examine manually!") | ||
return {"errors": [self.errors], "actions": None} | ||
|
||
elif "coord_inputs" in self.errors: | ||
if ( | ||
any(key in self.gin.route_parameters.get("opt", {}) for key in ["z-matrix", "zmatrix"]) | ||
and self.cart_coords | ||
): | ||
# if molecule is specified in xyz format, but the user chooses | ||
# to perform the optimization using internal coordinates, | ||
# switch to z-matrix format | ||
self.cart_coords = False | ||
actions.append({"coords": "use_zmatrix_format"}) | ||
else: | ||
# error cannot be fixed automatically. Return None for actions | ||
self.logger.info( | ||
"Not sure how to fix problem with z-matrix " | ||
"optimization if coords are already input in" | ||
"z-matrix format. Examine manually!" | ||
) | ||
return {"errors": [self.errors], "actions": None} | ||
|
||
elif "missing_mol" in self.errors: | ||
if ( | ||
not self.gin.molecule | ||
and "read" in self.gin.route_parameters.get("guess") | ||
and not any( | ||
key in self.gin.route_parameters.get("geom", {}) for key in ["checkpoint", "check", "allcheck"] | ||
) | ||
): | ||
# if molecule is not specified and the user requests that the | ||
# initial guess be read from the checkpoint file but forgot to | ||
# take the geom from the checkpoint file, add geom=check | ||
if not glob.glob("*.[Cc][Hh][Kk]"): | ||
raise FileNotFoundError("This remedy reads geometry from checkpoint file. This file is missing!") | ||
GaussianErrorHandler._update_route_params(self.gin.route_parameters, "geom", "check") | ||
self.gin.route_parameters["geom"] = "check" | ||
actions.append({"mol": "get_from_checkpoint"}) | ||
else: | ||
# error cannot be fixed automatically. Return None for actions | ||
self.logger.info("Molecule is not found in the input file. Fix manually!") | ||
# TODO: check if logger.info is enough here or return is needed | ||
return {"errors": list(self.errors), "actions": None} | ||
|
||
elif any(err in self.errors for err in ["empty_file", "bad_file"]): | ||
self.logger.error("Required checkpoint file is bad. Fix manually!") | ||
return {"errors": list(self.errors), "actions": None} | ||
|
||
elif "missing_file" in self.errors: | ||
self.logger.error("Could not find the required file. Fix manually!") | ||
return {"errors": list(self.errors), "actions": None} | ||
|
||
elif "syntax" in self.errors: | ||
# error cannot be fixed automatically. Return None for actions | ||
self.logger.info("A syntax error was detected in the input file. Fix manually!") | ||
return {"errors": list(self.errors), "actions": None} | ||
|
||
elif "insufficient_mem" in self.errors: | ||
mem_key, dynamic_mem = GaussianErrorHandler._find_dynamic_memory_allocated(self.gin.link0_parameters) | ||
if dynamic_mem and self.recom_mem and dynamic_mem < self.recom_mem: | ||
# this assumes that 1.5*minimum required memory is available | ||
mem = math.ceil(self.recom_mem * 1.5) | ||
self.gin.link0_parameters[mem_key] = f"{mem}MB" | ||
actions.append({"memory": "increase_to_gaussian_recommendation"}) | ||
else: | ||
self.logger.info("Check job memory requirements manually and set as needed.") | ||
return {"errors": list(self.errors), "actions": None} | ||
|
||
else: | ||
self.logger.info("Must have gotten an error that is parsed but not handled yet. Fix manually!") | ||
return {"errors": list(self.errors), "actions": None} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve TODOs in the correct method to ensure all edge cases and potential issues are handled.
Would you like assistance in addressing these TODOs or should I open a GitHub issue for further discussion?
def correct(self, directory: str = "./") -> dict: | ||
"""Perform the corrections.""" | ||
# TODO: when using restart, the rwf file might be in a different dir | ||
backup_files = [self.input_file, self.output_file, self.stderr_file, *BACKUP_FILES.values()] | ||
backup(backup_files, prefix=self.prefix, directory=directory) | ||
if glob.glob(os.path.join(directory, BACKUP_FILES["rwf"])): | ||
rwf = glob.glob(os.path.join(directory, BACKUP_FILES["rwf"]))[0] | ||
gin = GaussianInput.from_file(os.path.join(directory, self.input_file)) | ||
# TODO: check if rwf is already there like RWF or Rwf or ... | ||
# gin.link0_parameters.update({'%rwf': rwf}) | ||
# gin.route_parameters = {'Restart': None} | ||
# os.rename(self.input_file, self.input_file + '.prev') | ||
input_str = [f"%rwf={rwf}"] + [f"{i}={j}" for i, j in gin.link0_parameters.items()] | ||
input_str.append(f"{gin.dieze_tag} Restart\n\n") | ||
with open(os.path.join(directory, self.input_file + ".wt"), "w") as f: | ||
f.write("\n".join(input_str)) | ||
return {"errors": ["wall_time_limit"], "actions": None} | ||
self.logger.info( | ||
"Wall time handler requires a read-write gaussian file to be available. No such file is found." | ||
) | ||
return {"errors": ["Walltime reached but not rwf file found"], "actions": None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the TODO about handling file locations during restarts to ensure robustness in file management.
Would you like me to help refactor this part of the code or open a GitHub issue to track this task?
@@ -0,0 +1,34 @@ | |||
%chk=Optimization.chk | |||
%Mem=1MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory setting of 1MB
is likely insufficient for any Gaussian computation. Consider increasing this to a more realistic value.
%chk=Optimization.chk | ||
%mem=100GB | ||
%NProcShared=28 | ||
#P B3LYP/6-311++G** Opt=(MaxCycless=100) SCF SCRF=(PCM, Solvent=TetraHydroFuran) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the typo in the Gaussian directive from MaxCycless
to MaxCycles
.
- #P B3LYP/6-311++G** Opt=(MaxCycless=100) SCF SCRF=(PCM, Solvent=TetraHydroFuran)
+ #P B3LYP/6-311++G** Opt=(MaxCycles=100) SCF SCRF=(PCM, Solvent=TetraHydroFuran)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
#P B3LYP/6-311++G** Opt=(MaxCycless=100) SCF SCRF=(PCM, Solvent=TetraHydroFuran) | |
#P B3LYP/6-311++G** Opt=(MaxCycles=100) SCF SCRF=(PCM, Solvent=TetraHydroFuran) |
merging working branch
gaussian
intomaster
. follow-up to #325. all credit to @rashatwiSummary by CodeRabbit
New Features
Bug Fixes
Tests
Chores