-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geometry engine platform usage cleanup #1092
Conversation
…r utils stays as fastest one.
Thanks! This looks like a significant improvement! |
@ijpulidos : Everything looks good except I actually think we should keep
Is it considered a software best practice to only keep functions used solely in tests in test/utils.py? If a function is used in both a test and in the main API, I would assume its ok to keep it in test/utils.py I also don't like that in the current PR, |
@zhang-ivy While this is mostly a convention and not a well-established "best practice", the rationale here is that we don't want production code to rely on tests. In terms of organization, I think it's just better if we agree that the Just to be clear, I only moved out of I also think that separating the |
Yes I understand all of your rationale for why you made the changes, but I still don't 100% agree with it, because I don't think we should be assuming that changing things in the test code won't affect production code and vice versa. Even if we do merge this PR as is, any time we change a function, we should search the repo for all occurrences to check how the change will affect subsequent calls to the function. Happy to be overruled here, though, as long as we make sure to search for the repo for all occurrences of the changed function in the future.
|
Also, if we are going to keep the PR as is, I think we do need a changelog entry for this PR -- something that mentions that we are fixing a problem that was introduced in #1091 and that we moved a bunch of energy validation related functions used in both tests and production from tests/utils.py to dispersed/utils.py |
Yes of course, we always need to check for all the occurrences, that was never in discussion and that's always been done. Sorry if maybe my wording implied otherwise, but that wasn't the intention. I'm just saying it is a fact that these things happen regardless if we check or not, because there are many unexpected consequences. And once this happens it is better to see if it happened inside
The point here is that
Yes, we can do this, even though our rationale about the changelog is in terms of what affect users, as far as I can see. So, it is more about the "what" than the "how", as in, users don't really care that these functions now live somewhere else if they can still import them from |
I think this should be a safe assumption, at least in the direction that changing testing code shouldn't break our production code. Few notes, first, not every package even ships the testing code since it tends to be a waste of space since most end users don't run-or even know how to run-the tests that ship with the package. So when packages are arranged like:
The testing code doesn't get packaged for end users, so you can't rely on code in Second, with well written pytest code, it can be hard to use the testing code anyway, since functions may be using fixtures which won't work. For example, this bit of testing code https://github.com/OpenFreeEnergy/gufe/blob/17357f9c23b6f6e242f7d4be667705315dd85669/gufe/tests/test_protein_molecule.py#L24-L29 has an argument Third, it is confusing to end users and developers to be importing something from |
I vote we break things. Do user scripts use these functions, or are they only used internally. If it is only internal, then we should break backwards compatibility. If users are using them, it is probably only @dominicrufa and @zhang-ivy (pharma friends are using the cli) so I rather we make breaking changes that are easy to fix (just different namespace) then having to keep around crud. |
Maybe I missed something, but I assume this wasn't done with PR #1091 -- this is how I found the problem in the first place. I saw that
Yes that's correct. It's only used in testing. I thought you had been saying it isn't needed at all, which is why I was explaining why it was needed. If we are sticking with Iván proposed structure, we should just leave
Ok, that's fine with me.
Yes, that's true, but for perses, the test code does get shipped with the rest of the code, so that's not a problem here.
I think this comment highlights the reason why we are disagreeing. I would prefer to see the fewest changes to the code as possible, so that my current workflow/scripts are not disrupted. From a scientist's point of view, moving the functions from tests/utils.py to dispersed/utils.py doesn't change anything besides break my workflows, so I was trying to avoid the inconvenience, though the inconvenience is minor. From a software perspective, moving the functions has some benefits (which Iván and Mike discuss at length above), but I wouldn't say that the code is "crud" -- it works fine the way it is, but maybe is not completely ideal. Since the fix for workflow disruptions is very simple (changing the import and remembering that |
I guess a way to move forward here is to keep the backward compatibility at least for the 0.10.x branch, such that users can still import from |
Sorry! I really didn't mean to call the code "crud" I meant to say that I consider keeping something to maintain backwards compatibility around "crud' since it is easy to loose track of things like that and makes the code harder to maintain. I'm fine for keeping it backwards compatible in 10.x and removing the backwards compatibility in 11.x
I'm not sure how I would order these priorities, but I would consider these to be the main priorities/areas of concern in scientific software development to be:
Moving the function breaks user stuff (boo) but makes the code easier to maintain (yay). I think we are in a good spot with moving the function, but doing it in a backwards compatible way. I didn't do a good enough job reviewing #1091 as I took for granted that the location of the function was meaningful. A quick grep in the whole code base would have highlighted this issue, but instead I searched for where the |
I think the order of the bottom 2 bullets are where things get a bit grey depending on the case! But otherwise I agree |
@@ -438,7 +438,7 @@ def __init__(self, | |||
assert not flatten_torsions and not flatten_exceptions, "Cannot conduct endstate validation if flatten_torsions or flatten_exceptions is True" | |||
|
|||
if generate_unmodified_hybrid_topology_factory: | |||
from perses.tests.utils import validate_endstate_energies | |||
from perses.dispersed.utils import validate_endstate_energies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move these to the top for import
@@ -371,7 +371,7 @@ def _logp_propose(self, top_proposal, old_positions, beta, new_positions=None, d | |||
""" | |||
_logger.info("Conducting forward proposal...") | |||
import copy | |||
from perses.tests.utils import compute_potential_components | |||
from perses.dispersed.utils import compute_potential_components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to the top, I know sometimes it is nice to keep imports local (optional dep, import side effects, slow import) but in this case I don't think it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it that you think so? dispersed/utils.py
does import a LOT of things (even scipy), I think it is worth it to keep this in the lower level. Maybe what we can do to make the readability a bit better is to have them in the top of the method (_logp_propose
in this case). Does that sound okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we make a new utils file here:
https://github.com/choderalab/perses/tree/main/perses/utils
that is like hybrid_topology_utils or something
I figure if we are going to move this, lets move it to the right spot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree dispersed/utils.py
might not be the best place. I'm not actually sure what functionality dispersed/
is meant to contain.
I agree that perses/utils/
is a better place, maybe call it energy_validation.py
instead of hybrid_topology.py
, as the functions are used for generating htfs as well as geometry proposals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this sound reasonable? #1092 (comment)
@@ -517,7 +517,8 @@ def HybridTopologyFactory_energies(current_mol = 'toluene', proposed_mol = '1,2- | |||
""" | |||
Test whether the difference in the nonalchemical zero and alchemical zero states is the forward valence energy. Also test for the one states. | |||
""" | |||
from perses.tests.utils import generate_solvated_hybrid_test_topology, generate_endpoint_thermodynamic_states | |||
from perses.dispersed.utils import generate_endpoint_thermodynamic_states |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these up to the top as well
@@ -302,11 +300,11 @@ def generate_dipeptide_top_pos_sys(topology, | |||
assert not flatten_torsions and not flatten_exceptions, "Cannot conduct endstate validation if flatten_torsions or flatten_exceptions is True" | |||
|
|||
if generate_rest_capable_hybrid_topology_factory: | |||
from perses.tests.utils import validate_endstate_energies_point | |||
from perses.dispersed.utils import validate_endstate_energies_point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to top
Ok, from internal discussions with @mikemhenry and @zhang-ivy, separately, I think we might have reached an agreement for this PR and future changes. 0.11 release is going to be an API changing and breaking release. Judging from how these utility functions are being used, we probably want some of these (especially the For the other functions (namely Does that sound reasonable as a way to move forward with this PR? Thank you all for the feedback and comments, it's been really helpful. |
Description
These changes make the
CPU
platform to be the default platform for the geometry engine.While working on these changes, it was noted that some of the utility functions lived in the wrong module under
perses.tests.utils
, they were moved to a "better" module inperses.dispersed.utils
, since we should reserve theperses.tests.utils
module for things that are only required in our tests.Backward compatibility is achieved by having variables in
perses.tests.utils
to point to the newperses.dispersed.utils
functions as needed. This is to avoid breaking workflows of users calling these functions directly fromperses.tests.utils
,Motivation and context
It was noted in #1091 (comment) that we didn't want to change the default platform in general for that module, but just to change the platform used in the geometry engine.
How has this been tested?
Tested locally, verifying that the
CPU
platform is exclusively used when generating HTFs. This also gives better performance than usingCUDA
platform for "big" and "small" systems.Change log
Should be the same as in #1091 adding the following