Also add QM energies with 'default' OpenFF compute spec? #39
Comments
It's fine if you want to compute the same conformations at a lower level of theory. But let's be careful about calling the result "SPICE". We don't want to do anything that might create confusion, or lead someone to get the low accuracy results thinking they're getting the high accuracy ones. At some point you might want to consider updating to a better level of theory for OpenFF. B3LYP is pretty dated at this point. There are newer functionals that provide better accuracy at the same cost.
There is probably a good naming convention one could use for SPICE configurations but at a different level of theory (maybe OpenFF already has adopted a particular one) e.g. SPICE(B3LYP-D3BJ/DZVP) or SPICE@B3LYP-D3BJ/DZVP etc. where SPICE would refer to the current level of theory and the ones with brackets or @ would denote the same configurations computed a different way.
The risk with that is that someone would see a reference to it somewhere and come away thinking, "SPICE uses a cheap, inaccurate level of theory." It would have a high risk of causing confusion.
I definitely agree we want to avoid confusion! We can give the other levels of theory a less prominent role in the manuscript (or even name them SPICE-lite, etc.), and control what we put in the HDF5 files we make available for download and how we name them, which will be the primary way people interact with the dataset. If they access it through the QCPortal, they will see there are multiple levels of theory attached---it would be impossible for them to conclude there is only one low level of theory present. Practically, it would also be a huge pain, a significant waste of space, and rather awkward to try to correlate data between datasets if other levels of theory were generated as entirely separate groups of datasets in QCArchive. Does this make sense? Or am I missing some other failure mode of concern?
I'm not familiar with how QCArchive handles this sort of thing. If it allows a single dataset to provide multiple levels of theory for each sample, and for all of them to be enumerated through the API, that seems reasonable. As long as we can make sure the higher accuracy one is what people get by default if they don't explicitly specify a level of theory. In practice I expect very few people to access it directly through the API.
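To make the requirement above concrete, here is a minimal sketch in plain Python of a single dataset that stores several levels of theory per sample, lets them be enumerated, and returns the high-accuracy one unless a caller explicitly asks for another. This is not the QCArchive or QCPortal API: `Sample`, `MultiSpecDataset`, `list_specs`, and `get_energies` are hypothetical names invented for illustration, and the energy values are placeholders, not real results. Only the two level-of-theory strings come from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The two levels of theory named in this thread.
SPICE_SPEC = "ωB97M-D3BJ/def2-TZVPPD"
OPENFF_DEFAULT_SPEC = "B3LYP-D3BJ/DZVP"


@dataclass
class Sample:
    """One conformation, with energies keyed by level of theory (hypothetical)."""
    name: str
    energies: Dict[str, float] = field(default_factory=dict)


@dataclass
class MultiSpecDataset:
    """Toy stand-in for a dataset holding several levels of theory (hypothetical)."""
    default_spec: str
    samples: List[Sample] = field(default_factory=list)

    def list_specs(self) -> List[str]:
        """Enumerate every level of theory present on at least one sample."""
        return sorted({spec for s in self.samples for spec in s.energies})

    def get_energies(self, spec: Optional[str] = None) -> Dict[str, float]:
        """Return energies at `spec`; fall back to the default when none is given."""
        chosen = self.default_spec if spec is None else spec
        if chosen not in self.list_specs():
            raise KeyError(f"unknown level of theory: {chosen}")
        # Samples lacking the chosen spec (e.g. errored calculations) are skipped.
        return {s.name: s.energies[chosen] for s in self.samples if chosen in s.energies}


# Placeholder numbers only; they are not computed energies.
ds = MultiSpecDataset(
    default_spec=SPICE_SPEC,
    samples=[
        Sample("conf-0", {SPICE_SPEC: -1.0, OPENFF_DEFAULT_SPEC: -0.9}),
        Sample("conf-1", {SPICE_SPEC: -2.0}),  # lower-level result missing
    ],
)

default_energies = ds.get_energies()                    # high-accuracy spec
openff_energies = ds.get_energies(OPENFF_DEFAULT_SPEC)  # explicit opt-in
```

The design choice in this sketch is that the lower level of theory is reachable only by naming it explicitly, so a caller who specifies nothing cannot receive the low-accuracy results by accident.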
Yeah, access is through explicit specification of the theory level, as in the line here in the downloader script. We can completely avoid mentioning other QC specs if we choose to, and whoever wants to work with the other spec can download it of their own volition. If models from second spec are much closer in accuracy to
I agree.
That sounds like a good plan.
I think our primary user group will be downloading the HDF5 files we control, or via the downloader we provide. But QCArchive has a great QCPortal API that is improving its support for bulk downloads. Currently, it's still a great way to explore datasets. Check out this example, which shows how to access a reaction dataset and browse which levels of theory and molecules are available.
@pavankum is running this for us now! It looks like essentially everything is complete (except for some errored calculations).
Let me emphasize once again: SPICE is computed at ωB97M-D3BJ/def2-TZVPPD. Any computations performed at any other level of theory are not SPICE. They are a different dataset that needs to have a different name and must never be referred to as "SPICE", "SPICE-lite", or anything similar. Anything else will create confusion. If the current organization of the data on QCArchive creates confusion, then the data organization needs to be fixed.
There is a similar situation with MD17. It has been computed at two
different levels of theory and it is often confusing in papers at what
level a benchmark is done.
…On Tue, Oct 11, 2022 at 5:42 PM Peter Eastman ***@***.***> wrote:
Let me emphasize once again: SPICE is computed at ωB97M-D3BJ/def2-TZVPPD.
Any computations performed at any other level of theory *are not SPICE*.
They are a different dataset that needs to have a different name and must
never be referred to as "SPICE", "SPICE-lite", or anything similar.
Anything else will create confusion. If the current organization of the
data on QCArchive creates confusion, then the data organization needs to be
fixed.
@peastman: I just realized that the dataset we generated only used the QM level of theory used for OpenMM SPICE, which would mean the data is not useful to the OpenFF folks because it is not compatible with the default OpenFF compute spec (B3LYP-D3BJ/DZVP). We included both levels of theory for this recent RNA dataset so the dataset would be compatible with both OpenMM SPICE and OpenFF datasets, and it looks like the OpenFF level of theory is much less expensive. Would it be OK to have @pavankum add the OpenFF compute spec to the SPICE QCArchive dataset so we end up with both sets of QM data on QCArchive? We can still primarily distribute the more expensive QM data in our HDF5 distributions, but having both would enable multiple applications: