-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameter assignment regression test for FreeSolv fails due bug in toolkit version 0.0.4 #175
Comments
Actually, I'm not sure that's the problem because the error I get is:
and the SMIRNOFF 0.0.3 parameters are different from the file I linked above. I'm assuming 0.03 files was used according to this README.md, but I'll check whether there's no error in reading the topology or similar. |
Neither of those could possibly be right! 0.0.3 has zero epsilon and sigma, which we shouldn't allow: <Atom smirks="[#1:1]-[#8]" epsilon="0.0000" id="n13" rmin_half="0.0000"/> and 0.1 has an <Atom smirks="[#1:1]-[#8]" epsilon="0.3000" id="n12" rmin_half="5.27e-05"/> |
After a little, I've noticed that
so I think the repeated hydration free energy calculations were set up using 0.0.4, not 0.0.3, which has the same parameters for n12 as 0.1, but somehow the code inverted the two values when assigning the I'll run the same check tomorrow on the full freesolv set to check the extent of the problem. |
I've run the parameter assignment test on all the FreeSolv set with the OpenMM XML files generated for the SMIRNOFF 0.1 hydration free energy calculations, and it failed for 126/517 molecules. I've opened a temporary branch
Errors descriptionHere is a summary (freesolv molecule ID -> failure description):
On top of these errors, I had 4 molecules that gave some problems to the OpenEye toolkit wrapper.
What to do now?By manual inspection, it seems to me that the current version of the toolkit is doing the right thing in assigning the parameters (although a second pair of eyes would be helpful). It seems to me, the old toolkit had problems with bond orders and/or ring matches. If this is true, we need to decide
For 1., we could ignore these molecules and maybe regenerate a representative subset of them with the current version of the toolkit for future regression tests. I'm not sure it's worth spending some time debugging the old version of the toolkit at this point, although this may affect the current stable version (i.e. the one in |
I think it is worth figuring out what went wrong here in terms of whether the failure could affect the current version of the toolkit. In particular, is this a problem with
I don't think we need to fix the old toolkit---which would actually change the behavior we would need to reproduce the old data, and wouldn't be very useful to do---but it would be good to identify the root cause(s). |
I manually checked the molecule topologies for the examples above and it looked to me that the current toolkit (in I'll work to make sure that all the other failures are caused by the same problems (and don't just result in the same error). |
@jchodera @andrrizzi relating to the specific rmin_half/epsilon for polar hydrogens noted here:
Cross-posting from Slack:
Andrea thinks this is a swap of sigma and epsilon. I'll have to check; that doesn't sound right ot me. |
@andrrizzi - I think one of your tests catches an actual bug in the forcefield (fixed in #179 ). The others I'm less sure about and will require @bannanc . We DID catch a couple of parameter issues that we concluded needed fixing since the paper was put out, so it could be that the test SHOULD fail for these specific cases. @bannanc will need to investigate/weigh in; I'll send this to her on Slack also. |
@andrrizzi also check out openforcefield/smirnoff99Frosst#85 which may be one of the changes you're running into. |
See also openforcefield/smirnoff99Frosst#81 which is another. I think these are the only two changes. I'm not seeing a change to torsions. |
@andrrizzi I'm trying to understand the problems you posted so I'm going to comment on them one at a time:
Are you using the same input files and version of openeye as was used before? It seems weird for these to have not been a problem before, but are a problem now.
I think we've handled this case as an intentional change in the parameters, as discussed above, it is a typo from what we actually wanted, but was an officially released version of the force field.
I'm having a hard time understanding the direction of this, but in version 1.0.6 we added cyclobutyl parameters that were not there before. So if you're saying this
OK, I think this is the same change in parameters as above, but it seems like you should always get an intracyclic and extracyclic parameter change so I don't understand how you're not seeing a7 here. It would be really helpful if you could include SMILES or pictures of the molecules when you highlight issues so I can understand what chemistry is there.
This one makes me worry you have a different tautomer or something with your molecule because the first SMIRKS should only match a sulfur next two a monovalent oxygen (
OK, I'm still not clear on how you're identifying which parameter is being assigned incorrectly... I don't remember us changing this torsion, so we'd need to dig a little deeper. However, you should consider the chemistry of the center bond, t50 cannot match the same bond as t51, note the change in the connectivity on the nitrogen. However, t56 is a more subset of the chemistry matched in t51, same connectivity on the carbon and nitrogen, but now the carbon is in a 3-membered ring. |
If the goal is to check that the parameters are assigned in the same way, I think it makes more sense to look at which SMIRKS match the molecule, rather than trying to understand where the energy difference tracks to a parameter. I did this when we switched from the |
Thank you both for the pointers, @bannanc, @davidlmobley! I think that clears up the issue with the nonbonded and angle parameters. I'll update my summary post above accordingly (and my testing code).
I think I'm using the same input files, but a more recent version of OpenEye. I'm not sure about the charge error, but I think the
Great! That's exactly what is happening.
Sorry about that! I thought the mol2 files I uploaded were sufficient but I now realize it's not the most convenient way of sharing. I'll update my post above with the SMILES!
In short, I'm taking the OpenMM XML System input files we generated for the hydration free energies (for example, see here: https://github.com/MobleyLab/SMIRNOFF_paper_code/tree/master/FreeSolv_repeat_subset), then reparameterizing those systems starting from the same input mol2 files with the latest openforcefield toolkit and check that it assigns the same parameters. To guess the possible match that the old toolkit found, I was just looking at the old
The goal here was to add a generic Travis test to make sure we didn't introduce any major regression in the parameter assignment in the |
I'm probably missing something, but I'm not sure this change made it into the |
@andrrizzi Are you using the the smirnoff99Frosst.offxml in the master branch? We've made changes to smirnoff99Frosst since the topology branch was created? Those parameters are changed in the openforcefield master |
Ah! That's it, thank you! I'm planning to merge |
Thanks, @bannanc and @andrrizzi ! I've also adjusted my github notification preferences so hopefully I will be more responsive when tagged now. (Before I was getting deluged with so many notifications I was missing the ones where I was tagged.) |
I think I've tracked down the origin of the failures for the bond and the proper torsions. The problem affects 9 molecules of the FreeSolv set used for the first batch of the hydration free energy calculations (not the repeat with the update polar hydrogen parameters), which were parametrized with The different behavior in 0.0.2 is triggered by incorrect information in the mol2 file about the bond order and/or this line, which calls Here are two examples with more details: 3323117 The S=O bond in the figure above should match 63712 The torsion involving the tetrahedral nitrogen above should match Here is the complete list of molecules affected by the problem
3323117 is the only one with the bond problem, all the others have the same torsion issue involving a tetrahedral nitrogen. The call to I am going to investigate the charged and undefined stereochemistry errors next. |
Wow---great sleuthing, @andrrizzi! That was indeed an important bugfix. |
I believe the remaining problems are not worrisome. The I think the 2501588: N-(cyclopropylmethyl)-2,6-dinitro-N-propyl-4-(trifluoromethyl)aniline 3266352: N-methyl-N-(2,2,2-trifluoroethyl)aniline 7829570: N-butyl-N-ethyl-2,6-dinitro-4-(trifluoromethyl)aniline I believe in all cases the lone pair is delocalized and the nitrogen is not chiral, but it would be helpful if somebody else could double-check these. |
This may be relevant to @davidlmobley's work on tertiary amines and @ChayaSt's findings on OpenEye perception of these atoms as chiral. |
Thanks, @andrrizzi -- I agree with your points and what you've done here. Excellent. And this is not relevant immediately to our work on tertiary amines, at least in the sense that we haven't been doing anything with these molecules. |
The issue here seems to be that:
So, beyond this specific case, what is the desired behavior when a user gives a molecule where geometry and bond order disagree`? |
I believe we should only be using the geometry information for assigning partial charges right now. However, this should later be corrected, since it would be ideal if we had a reproducible charging scheme that produced charges that were entirely independent of conformation. For typing, all information should come from the chemical information in the Ultimately, we should aim to produce force field parameters that only utilize the chemical information in the While it may be useful to have some utilities that check for consistency between geometry-derived information (bond orders, chirality, etc) and the direct chemical information specified in the |
I agree with John entirely: The semiempirical QM calculation should use the geometry; the typing/chemical perception should use the molecule. If these are inconsistent, we might eventually want something on the front end which will warn people about this (there is an issue discussing this which is open somewhere). To rehash what's in that issue, it's likely that we'll have two main categories of users: This means basically we want to warn everyone when they do something which might be questionable/odd, and allow people in category (b) to easily take a chemistry "clean up" pass before bringing in their molecules. But people in (a) won't want that. |
We currently use geometry to get stereochemistry when loading molecules from 3D representations. We need to perceive stereochemistry when making OFFMols if we want to keep the door open for stereo-specific SMARTS. We might want to punt on this. However, it's a big decision. If we don't build infrastructure for strict stereochemistry checking now, the day that we update the toolkit to make those checks (maybe we release a FF version with stereo-specific SMARTS), it will start complaining about loading molecules that it was previously just fine with. |
This issue was resolved in #183 . |
I've attempted to test whether the current toolkit can recreate the
System
s we used for the hydration free energy calculations in the SMIRNOFF paper. Currently, the test fails for molecule1019269
(1-butanol). Because of a different Lennard-Jones parameter that changed from version 0.0.3 (the toolkit version used for the calculations) to the currentsmirnoff99Frosst.offxml
. Here are the parameters:Is this expected? If it is, should I just ignore the test case?
The text was updated successfully, but these errors were encountered: