ASE databases incompatible with current fine-tuning tutorial #629
Comments
I think you are correct that this was an oversight when converting to the new trainer/configs. The new location for dataset format makes more sense but is not backwards compatible. You should be able to get around this error by adding
This PR is intended to address #629
That did the trick regarding that part - thank you!
Are there additional tags I need to supply to the config for it to parse the databases?
Thanks for flagging this. The new trainer has renamed the targets from
Referencing these lines from the new example config
Unfortunately it still throws that error. Just for reference, here is the currently used config.yml:
Nothing holding it up. This should be ready to merge, unless @mshuaibii or @emsunshine have any further suggestions.
Hello. First, I think that line 1018 of While the new branch (#622), which was recommended for using the ASE databases, does enable the first inferencing step, it quickly runs into the second error:
I was able to get the fine-tuning tutorial working with the changes from these two PRs: Open-Catalyst-Project/tutorial#4 and #630. You can try these branches to see if they solve the problem.
This issue has been marked as stale because it has been open for 30 days with no activity.
* minor cleanup of lmbddatabase
* ase dataset compat for unified trainer and cleanup
* typo in docstring
* key_mapping docstring
* add stress to atoms_to_graphs.py and test
* allow adding target properties in atoms.info
* test using generic tensor property in ase_datasets
* minor docstring/comments
* handle stress in voigt notation in metadata guesser
* handle scalar generic values in a2g
* clean up ase dataset unit tests
* allow .aselmdb extensions
* fix minor bugs in lmdb database and update tests
* make connect_db staticmethod
* remove redundant methods and make some private
* allow a list of paths in AseDBdataset
* remove sprinkled print statement
* remove deprecated transform kwarg
* fix doctring typo
* rename keys function
* fix missing comma in tests
* set default r_edges in a2g in AseDatasets to false
* simple unit-test for good measure
* call _get_row directly
* [wip] allow string sids
* raise a helpful error if AseAtomsAdaptor not available
* remove db extension in filepaths
* set logger to info level when trying to read non db files, remove print
* set logging.debug to avoid saturating logs
* Update documentation for dataset config changes. This PR is intended to address #629
* Update atoms_to_graphs.py
* Update test_ase_datasets.py
* Update test_ase_datasets.py
* Update test_atoms_to_graphs.py
* Update test_atoms_to_graphs.py
* case for explicit a2g_args None values
* Update update_config()
* Update utils.py
* Update utils.py
* Update ocp_trainer.py: More helpful warning for debug mode
* Update ocp_trainer.py
* Update ocp_trainer.py
* Update TRAIN.md
* fix concatenating predictions
* check if keys exist in atoms.info
* Update test_ase_datasets.py
* use list() to cast all batch.sid/fid
* correctly stack predictions
* raise error on empty datasets
* raise ValueError instead of exception
* code cleanup
* rename get_atoms object -> get_atoms for brevity
* revert to raise keyerror when data_keys are missing
* cast tensors to list using tolist and vstack relaxation pos
* remove r_energy, r_forces, r_stress and r_data_keys from test_dataset w use_train_settings
* fix test_dataset key
* fix test_dataset key!
* revert to not setting a2g_args dataset keys
* fix debug predict logic
* support numpy 1.26
* fix numpy version
* revert write_pos
* no list casting on batch lists
* pretty logging
---------
Co-authored-by: Ethan Sunshine <93541000+emsunshine@users.noreply.github.com>
Co-authored-by: Muhammed Shuaibi <mushuaibi@gmail.com>
This should be fixed in #622. Closing.
I’ve been using the OCP framework recently to perform ethylene adsorption calculations and was able to get GemNet OC20+22 to predict the adsorption energies within a reasonable margin of error. I would now like to fine-tune the model for these systems. I followed the oxides fine-tuning tutorial in the OCP tutorials repo to make sure everything was in working order before continuing, but got the following error in the training output once training began using main.py:
I followed the traceback and think I’ve narrowed it down to the ocpmodels\trainers\base_trainer.py file, which does not seem to recognize ASE databases and instead defaults to LMDB databases [Lines 279-280: self.train_dataset = registry.get_dataset_class(self.config["dataset"].get("format", "lmdb"))]. Could this be an artifact of the “new” config.yml format that I occasionally see log messages about converting to?
Are you aware of a current version of the fine-tuning example code that works with ASE databases, or alternatively, is there example code for performing the fine-tuning with LMDB databases? All the LMDB code I could find revolved around starting from scratch with OC’s datasets rather than continuing from a pre-trained checkpoint.
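To illustrate the fallback behaviour in the lookup quoted above, here is a minimal sketch of what happens when the "format" key is missing from the dataset section of the config. The registry dictionary, the "ase_db" key, the class names, and the "src" paths are all assumptions made for illustration; this is not the actual ocpmodels registry.

```python
from typing import Any, Dict

# Hypothetical stand-in for the real dataset registry. The keys and class
# names are assumptions for illustration only.
DATASET_REGISTRY: Dict[str, str] = {
    "lmdb": "LmdbDataset",
    "ase_db": "AseDBDataset",
}


def resolve_dataset_class(config: Dict[str, Any]) -> str:
    """Mimic the quoted lookup: use "lmdb" when no "format" key is present."""
    fmt = config["dataset"].get("format", "lmdb")
    if fmt not in DATASET_REGISTRY:
        raise KeyError(f"Unknown dataset format: {fmt!r}")
    return DATASET_REGISTRY[fmt]


# A dataset section without "format" silently resolves to the LMDB class,
# even though "src" (a made-up path) points at an ASE database file.
without_format = {"dataset": {"src": "train.db"}}
with_format = {"dataset": {"src": "train.db", "format": "ase_db"}}

print(resolve_dataset_class(without_format))  # LmdbDataset
print(resolve_dataset_class(with_format))     # AseDBDataset
```

If the real trainer behaves like this sketch, an ASE database passed without an explicit format would be opened by the LMDB reader, which would be consistent with the error described here.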
For reference, the following packages are currently being used: