-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add LRU cache to structure matcher #4036
add LRU cache to structure matcher #4036
Conversation
Beautiful! That's an impressive performance gain! I guess we should also be aware of the potential memory leakage issue for
Also here is a more detailed explanation from Anthony Sottile . I guess we could just make it a |
Structure hash fixes
I would move the method out of the class and just make it a method. This will resolve the memory leak issue (though I don't think it is a significant problem in our case). |
I would be very wary of the current hash function. That looks like an extremely expensive hash function. Furthermore, depending on transformations, this hash violates some basic rules on hashes. I would argue it is better to use the formula as a hash. |
@shyuep, it could be expensive but my concern about using the formula is uniqueness - thinking about polymorphs, displacements, different magnetic configs For transformations, shouldn't the hash change to reflect that the underlying object has changed? Maybe this is my misunderstanding of how hashes should apply to mutables My thinking is: if I run a transformation on a structure to displace a site, the original hash should no longer apply to the transformed structure, right? |
@esoteric-ephemera The rule for hashes is that they do not have to be unique. Ultimately, the See https://www.geeksforgeeks.org/what-are-hash-functions-and-how-to-choose-a-good-hash-function Efficiency and minimizing collisions is the key. But collisions need not be 0 (that is what |
@shyuep @esoteric-ephemera An alternative to adding a hash function to Structure that would work for this use case is to create an IStructure copy in the class method and call the new cached function with that copy. It would be something like this: changing the class method to @classmethod
def _get_reduced_structure(cls, struct: Structure, primitive_cell: bool = True, niggli: bool = True) -> Structure:
immutable_struct = IStructure.from_dict(struct.as_dict())
return _get_reduced_structure(immutable_struct, primitive_cell, niggli) and the new function to @lru_cache(maxsize=LRU_CACHE_SIZE)
def _get_reduced_structure(
struct: IStructure, primitive_cell: bool, niggli: bool
) -> Structure:
"""Helper method to find a reduced structure."""
if niggli:
struct = struct.get_reduced_structure(reduction_algo="niggli")
if primitive_cell:
struct = struct.get_primitive_structure()
return Structure.from_dict(struct.as_dict()) |
…ination and missing pytest approx in TestBSPlot)
Fix garbage collection and cache sharing
I think we have a good solution now that doesn't require hashing |
Merged. Thanks. |
Could we push a release relatively soon (or a release candidate)? Would help with builder development on our side |
Summary
This adds an LRU cache to the structure matcher, specifically to the StructureMatcher._get_reduced_structure function.
This suggested improvement is motivated by the performance of the AflowPrototypeMatcher._match_prototype function (and in turn the robocrys builder that uses it). We believe this will speed up that calculation by ~288x since the sometimes expensive calculation in StructureMatcher._get_reduced_structure will only need to be performed once per structure rather than the 288 times it is performed now.