add LRU cache to structure matcher #4036

kbuma · 2024-09-03T21:48:13Z

Summary

This adds an LRU cache to the structure matcher, specifically to the StructureMatcher._get_reduced_structure function.

This suggested improvement is motivated by the performance of the AflowPrototypeMatcher._match_prototype function (and in turn the robocrys builder that uses it). We believe this will speed up that calculation by ~288x since the sometimes expensive calculation in StructureMatcher._get_reduced_structure will only need to be performed once per structure rather than the 288 times it is performed now.

make structure hashable

… hashability

DanielYang59 · 2024-09-04T02:53:42Z

Beautiful! That's an impressive performance gain!

I guess we should also be aware of the potential memory leakage issue for lru_cache when applied on methods (I believe a reference to the cls would also be held in the case of class methods, but I could be wrong here):

Using the functools.lru_cache and functools.cache decorators on methods can lead to memory leaks, as the global cache will retain a reference to the instance, preventing it from being garbage collected.

Also here is a more detailed explanation from Anthony Sottile .

I guess we could just make it a staticmethod to avoid this unless there's some reason to make it classmethod that I'm not aware of?

Structure hash fixes

shyuep · 2024-09-04T16:59:40Z

I would move the method out of the class and just make it a method. This will resolve the memory leak issue (though I don't think it is a significant problem in our case).

shyuep · 2024-09-04T17:03:48Z

I would be very wary of the current hash function. That looks like an extremely expensive hash function. Furthermore, depending on transformations, this hash violates some basic rules on hashes. I would argue it is better to use the formula as a hash.

… leakage issue

esoteric-ephemera · 2024-09-04T17:08:21Z

@shyuep, it could be expensive but my concern about using the formula is uniqueness - thinking about polymorphs, displacements, different magnetic configs

For transformations, shouldn't the hash change to reflect that the underlying object has changed? Maybe this is my misunderstanding of how hashes should apply to mutables

My thinking is: if I run a transformation on a structure to displace a site, the original hash should no longer apply to the transformed structure, right?

shyuep · 2024-09-04T17:13:40Z

@esoteric-ephemera The rule for hashes is that they do not have to be unique. Ultimately, the __eq__ function is what must be unique for different objects. A valid but bad hash function would be return 42. This is a common implementation in simple "Hello World" type tutorials for many languages.

See https://www.geeksforgeeks.org/what-are-hash-functions-and-how-to-choose-a-good-hash-function

Efficiency and minimizing collisions is the key. But collisions need not be 0 (that is what __eq__ is for). For a structure, the formula is a pretty good hash function - equal structures definitely have the same formula, and formula is cheap to compute. There will be collisions but they should be relatively minimal (polymorphs basically).

kbuma · 2024-09-04T17:24:23Z

@shyuep @esoteric-ephemera An alternative to adding a hash function to Structure that would work for this use case is to create an IStructure copy in the class method and call the new cached function with that copy. It would be something like this:

changing the class method to

@classmethod
def _get_reduced_structure(cls, struct: Structure, primitive_cell: bool = True, niggli: bool = True) -> Structure:
    immutable_struct = IStructure.from_dict(struct.as_dict())
    return _get_reduced_structure(immutable_struct, primitive_cell, niggli)

and the new function to

@lru_cache(maxsize=LRU_CACHE_SIZE)
def _get_reduced_structure(
  struct: IStructure, primitive_cell: bool, niggli: bool
) -> Structure:
  """Helper method to find a reduced structure."""
  if niggli:
      struct = struct.get_reduced_structure(reduction_algo="niggli")
  if primitive_cell:
      struct = struct.get_primitive_structure()
  return Structure.from_dict(struct.as_dict())

…ination and missing pytest approx in TestBSPlot)

Fix garbage collection and cache sharing

esoteric-ephemera · 2024-09-06T15:53:49Z

I think we have a good solution now that doesn't require hashing Structure, and the failing ruff stuff is unrelated to this PR -should be ready for review

shyuep · 2024-09-06T15:56:59Z

Merged. Thanks.

esoteric-ephemera · 2024-09-06T17:39:08Z

Could we push a release relatively soon (or a release candidate)? Would help with builder development on our side

add LRU cache to _get_reduced_structure computations

496e77a

kbuma requested review from shyuep, janosh and mkhorton as code owners September 3, 2024 21:48

pre-commit-ci bot and others added 5 commits September 3, 2024 21:49

pre-commit auto-fixes

db10aba

make structure hashable via as_dict

b49c7e5

Merge pull request #1 from esoteric-ephemera/hash_struct_lru

3b59541

make structure hashable

make structure hash recursive as_dict, change structure test to check…

a83eea4

… hashability

precommit

c55077d

Merge pull request #2 from esoteric-ephemera/hash_struct_lru

72a7df5

Structure hash fixes

kbuma and others added 2 commits September 4, 2024 10:05

moved computation using lru_cache out of class method to avoid memory…

fec1f19

… leakage issue

pre-commit auto-fixes

db984f2

esoteric-ephemera and others added 8 commits September 5, 2024 12:12

fix structure matcher caching, fix a few tests (mcsqs wrong file dest…

a9d5df7

…ination and missing pytest approx in TestBSPlot)

precommit

fd01611

add suggested SiteOrderedIStructure from @kbuma

35d7f92

Merge pull request #3 from esoteric-ephemera/hash_struct_lru

6cc945c

Fix garbage collection and cache sharing

Merge branch 'master' into cache-structure-matcher

e37b961

pre-commit auto-fixes

32c14b1

add cast in eq for SiteOrderedIStructure to make mypy happy

d13f798

pre-commit auto-fixes

d8b2a21

shyuep merged commit 51ea7de into materialsproject:master Sep 6, 2024
42 of 43 checks passed

janosh added performance Some functionality is too slow or regressed analysis Concerning pymatgen.analysis labels Sep 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add LRU cache to structure matcher #4036

add LRU cache to structure matcher #4036

kbuma commented Sep 3, 2024

DanielYang59 commented Sep 4, 2024 •

edited

Loading

shyuep commented Sep 4, 2024

shyuep commented Sep 4, 2024

esoteric-ephemera commented Sep 4, 2024

shyuep commented Sep 4, 2024 •

edited

Loading

kbuma commented Sep 4, 2024 •

edited by janosh

Loading

esoteric-ephemera commented Sep 6, 2024

shyuep commented Sep 6, 2024

esoteric-ephemera commented Sep 6, 2024

add LRU cache to structure matcher #4036

add LRU cache to structure matcher #4036

Conversation

kbuma commented Sep 3, 2024

Summary

DanielYang59 commented Sep 4, 2024 • edited Loading

shyuep commented Sep 4, 2024

shyuep commented Sep 4, 2024

esoteric-ephemera commented Sep 4, 2024

shyuep commented Sep 4, 2024 • edited Loading

kbuma commented Sep 4, 2024 • edited by janosh Loading

esoteric-ephemera commented Sep 6, 2024

shyuep commented Sep 6, 2024

esoteric-ephemera commented Sep 6, 2024

DanielYang59 commented Sep 4, 2024 •

edited

Loading

shyuep commented Sep 4, 2024 •

edited

Loading

kbuma commented Sep 4, 2024 •

edited by janosh

Loading