
downloader.py sets method and basis non-deterministically #44

Closed
leifjacobson opened this issue Sep 28, 2022 · 10 comments · Fixed by #47

Comments

@leifjacobson

I was trying to reproduce the energies in a few examples of your SPICE dataset as downloaded with downloader.py. I was unable to reproduce them and, after some debugging, discovered that spec['method'] and spec['basis'] were set to b3lyp and tzvp respectively. I further found that repeated invocations of downloader.py can set these values non-deterministically. Since these values are retrieved inside a loop over subsets, I suspect this could also result in a mixed dataset.

I'm downloading again after hardcoding the appropriate values and will try to reproduce the energies again. Regardless of outcome I suggest this is investigated and fixed.
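
The mixed-dataset concern above could be guarded against with a check along these lines. This is only a minimal sketch: `check_consistent_specs` and the subset names are hypothetical and are not part of downloader.py; the spec dictionaries mimic the `method` and `basis` keys that appear in the output posted later in this thread.

```python
def check_consistent_specs(specs_by_subset):
    """Return the single (method, basis) pair shared by all subsets.

    Raises ValueError if no specs are given or if the subsets disagree.
    Hypothetical helper for illustration; not part of downloader.py.
    """
    seen = {}
    for subset, spec in specs_by_subset.items():
        key = (spec["method"], spec["basis"])
        seen.setdefault(key, []).append(subset)
    if not seen:
        raise ValueError("no specs to check")
    if len(seen) > 1:
        raise ValueError(f"mixed levels of theory: {seen}")
    return next(iter(seen))


# Made-up subset names; the spec values are copied from the thread output.
specs = {
    "subset A": {"method": "wb97m-d3bj", "basis": "def2-tzvppd"},
    "subset B": {"method": "b3lyp", "basis": "dzvp"},
}

try:
    check_consistent_specs(specs)
except ValueError as err:
    print(err)
```

Running such a check once after the loop over subsets would turn a silently mixed download into an explicit failure.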

@peastman
Member

@jchodera @pavankum is this caused by #39? If so it needs to be fixed urgently. QCPortal should always give the high quality results by default unless someone specifically requests the low quality ones.

@pavankum
Collaborator

I couldn't reproduce the error; the zeroth element is always spice-default for me. We can change this line, though, https://github.com/openmm/spice-dataset/blob/main/downloader/downloader.py#L81, to be more specific and avoid any such errors.

I ran

from qcportal import FractalClient

# Connect with the default client settings, as in the test script later in this thread.
client = FractalClient()
datasets = ['SPICE Solvated Amino Acids Single Points Dataset v1.1', 'SPICE Dipeptides Single Points Dataset v1.2', 'SPICE DES Monomers Single Points Dataset v1.1', 'SPICE DES370K Single Points Dataset v1.0', 'SPICE DES370K Single Points Dataset Supplement v1.0', 'SPICE PubChem Set 1 Single Points Dataset v1.2', 'SPICE PubChem Set 2 Single Points Dataset v1.2', 'SPICE PubChem Set 3 Single Points Dataset v1.2', 'SPICE PubChem Set 4 Single Points Dataset v1.2', 'SPICE PubChem Set 5 Single Points Dataset v1.2', 'SPICE PubChem Set 6 Single Points Dataset v1.2', 'SPICE Ion Pairs Single Points Dataset v1.1']
for dataset in datasets:
    ds = client.get_collection("Dataset", dataset)
    # Take whichever record spec happens to be listed first.
    spec = ds.list_records().iloc[0].to_dict()
    print(dataset)
    print(spec)

and the output shows:

SPICE Solvated Amino Acids Single Points Dataset v1.1
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE Dipeptides Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE DES Monomers Single Points Dataset v1.1
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE DES370K Single Points Dataset v1.0
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE DES370K Single Points Dataset Supplement v1.0
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default_no_mbis', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default_no_mbis'}
SPICE PubChem Set 1 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE PubChem Set 2 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE PubChem Set 3 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE PubChem Set 4 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE PubChem Set 5 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE PubChem Set 6 Single Points Dataset v1.2
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
SPICE Ion Pairs Single Points Dataset v1.1
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}

@leifjacobson Can you please post the version of qcportal you are using?

cc: @bennybp

@leifjacobson
Author

>>> import qcportal
>>> qcportal.__version__
'v0.15.8'

@peastman
Member

> We can change this line though

The downloader script is versioned as part of the repository. Even if we change it for future versions, someone who wants v1.1 of the dataset is going to download or check out that version of the repository and run the corresponding version of the script. So we need to make sure it works correctly with the existing script.

@leifjacobson
Author

leifjacobson commented Sep 28, 2022

Seems consistent within invocations but can change between them?

(schrodinger_venv) -bash-4.2$ cat test.py
from qcportal import FractalClient
dataset = 'SPICE Solvated Amino Acids Single Points Dataset v1.1'
for i in range(10):
    client = FractalClient()
    ds = client.get_collection("Dataset", dataset)
    spec = ds.list_records().iloc[0].to_dict()
    print(spec)
(schrodinger_venv) -bash-4.2$ python test.py 
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
(schrodinger_venv) -bash-4.2$ python test.py 
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
(schrodinger_venv) -bash-4.2$ python test.py 
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
(schrodinger_venv) -bash-4.2$ python test.py 
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'wb97m-d3bj', 'basis': 'def2-tzvppd', 'keywords': 'spice_default', 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}
(schrodinger_venv) -bash-4.2$ python test.py 
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
{'driver': 'gradient', 'program': 'psi4', 'method': 'b3lyp', 'basis': 'dzvp', 'keywords': 'openff-default', 'name': 'B3LYP/dzvp-openff-default'}
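
One reading of the output above is that the same records come back in a different order on different invocations, so taking the first row picks a different spec each time. The sketch below imitates that with plain lists of dictionaries standing in for the table returned by `ds.list_records()`. The row order is an assumption made for illustration, and `first_record` is a hypothetical stand-in for `.iloc[0].to_dict()`.

```python
# Two specs, with values copied from the output above.
WB97M = {"method": "wb97m-d3bj", "basis": "def2-tzvppd", "keywords": "spice_default"}
B3LYP = {"method": "b3lyp", "basis": "dzvp", "keywords": "openff-default"}


def first_record(records):
    """Stand-in for ds.list_records().iloc[0].to_dict(): depends on row order."""
    return records[0]


# The same two records, listed in two different orders.
order_a = [WB97M, B3LYP]
order_b = [B3LYP, WB97M]

print(first_record(order_a)["method"])  # wb97m-d3bj
print(first_record(order_b)["method"])  # b3lyp
```

The contents of the two lists are identical; only the position of each record changes, which is enough to change what a positional lookup returns.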

@pavankum
Collaborator

Thank you for the report @leifjacobson. This is in @bennybp's territory; I will let him know on Slack.

> Even if we change it for future versions, someone who wants v1.1 of the dataset is going to download or check out that version of the repository and run the corresponding version of the script.

We should make that version inaccessible and then make a minor release (v1.1.1). Unfortunately, I don't see any other way. @dotsdl or @jthorton, can you take a look and confirm when you can?

@jchodera
Member

@peastman: If the download script does not hard-code the spec, @pavankum's suggestion of hard-coding this to always get the desired version is the right way to go.

> The downloader script is versioned as part of the repository. Even if we change it for future versions, someone who wants v1.1 of the dataset is going to download or check out that version of the repository and run the corresponding version of the script. So we need to make sure it works correctly with the existing script.

You can erase the tag and re-release it, but I'd suggest that a simple 1.1.1 bugfix release, with a note in the release notes that it eliminates the non-determinism error, would be sufficient.
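
For illustration only, hard-coding the spec could look something like the sketch below. This is not the fix that was actually merged, which is not shown in this thread. `EXPECTED` and `select_spec` are hypothetical names, and the records are plain dictionaries standing in for the rows of `ds.list_records()`.

```python
# Level of theory to pin, with values copied from the output earlier in the thread.
EXPECTED = {"method": "wb97m-d3bj", "basis": "def2-tzvppd"}


def select_spec(records, expected=EXPECTED):
    """Return the one record matching every key in `expected`.

    Raises LookupError if zero or several records match, so a change on
    the server side fails loudly instead of silently picking another spec.
    """
    matches = [
        r for r in records
        if all(r.get(key) == value for key, value in expected.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


records = [
    {"method": "b3lyp", "basis": "dzvp", "keywords": "openff-default"},
    {"method": "wb97m-d3bj", "basis": "def2-tzvppd", "keywords": "spice_default"},
]
print(select_spec(records)["keywords"])  # spice_default
```

Only `method` and `basis` are pinned here because the output earlier in the thread shows one subset using a different `keywords` value (`spice_default_no_mbis`). With a real pandas table, the rows could be converted first with `to_dict("records")`.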

@peastman
Member

Can you make a PR with the fix? I'm not sure exactly what it is.

@peastman
Member

@pavankum or @jchodera could one of you create a PR with the fix ASAP? This is extremely urgent, since at this moment there could easily be people running the downloader and getting corrupted versions of the dataset.

@peastman
Member

The fix is merged and the 1.1.1 release is up. Thanks @pavankum for the very fast response.
