Cache entry point lookups #6124
Conversation
Thanks a lot @danielhollas! Introducing a cache always increases complexity, but in this particular case I can't really come up with a scenario where this would hurt us (`eps()` is already being cached, so this should just be saving computational time), so to me this looks fine.
```python
def eps():
    return _eps()
```

```python
@functools.lru_cache(maxsize=100)
```
Interesting. So do I understand correctly that `eps()` is cached, nothing is being read from disk here, and processing the few entry points in memory is still taking 25 ms?

Can you please document this function to explain the quadruple-loop problem here, and why this additional cache is necessary?
Yes, I'll write a comment. You can look at the cProfile info on the forum: https://aiida.discourse.group/t/why-is-aiidalab-base-widget-import-so-slow/32/16?u=danielhollas

Together with the fact that we were calling this during each Node class build (in a metaclass), this completely explained why the `aiida.orm` import was so slow. See #6091.
Like @ltalirz, I am a bit hesitant about whether this could introduce some subtle bugs related to cache invalidation, but I couldn't come up with one for now, so I think we can add this. I just have one comment.
Are we worried about the size of the cache? I think the number of different calls to `eps_select` should be reasonable, not exceeding the order of 100. So wouldn't we be better off with `lru_cache(maxsize=None)`, i.e. `functools.cache`, which will be faster? I'm not sure how much faster it will be compared to an LRU cache of max 100 items; it might be negligible.
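For illustration, the difference boils down to this (a toy sketch, not aiida code): `functools.cache` is exactly `lru_cache(maxsize=None)`, and the unbounded variant skips the recency bookkeeping that a bounded LRU cache performs on every hit:

```python
import functools


@functools.lru_cache(maxsize=100)
def bounded(x):
    return x * x  # placeholder for an expensive lookup


@functools.cache  # equivalent to lru_cache(maxsize=None), Python >= 3.9
def unbounded(x):
    return x * x


bounded(3); bounded(3)
unbounded(3); unbounded(3)

# Both report one hit, but only the bounded cache tracks recency/evictions.
assert bounded.cache_info().hits == 1 and bounded.cache_info().maxsize == 100
assert unbounded.cache_info().hits == 1 and unbounded.cache_info().maxsize is None
```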
I am actually worried. I wouldn't be surprised if the number of calls was bigger than 100, especially if plugins are installed, since in some functions we're essentially iterating over all existing entry points when looking for the entry point for a given class. I'll take a closer look and do some more benchmarking.
Are we doing this iteration ourselves, or is it importlib doing it internally? Are we trying to find an entry point for a class without knowing the name of the entry point?
> since in some functions we're essentially iterating over all existing entry points when looking for the entry point for a given class. I'll take a closer look and do some more benchmarking.

These don't call this `eps_select` function though, do they? The cache here simply applies to the number of combinations of arguments with which it is called. Since it just has `group` and `name`, it should just be the list of all `(group, name)` tuples with which the function is called. This should, reasonably, not be much larger than the number of entry points that exist.
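This bound can be checked directly: the number of possible cache keys is the number of distinct `(group, name)` pairs. A quick sketch against the stdlib `importlib.metadata` (walking per-distribution entry points for portability across Python versions), counting whatever happens to be installed:

```python
from importlib.metadata import distributions

# Each memoized call is keyed by its (group, name) arguments, so the cache
# can never hold more entries than there are distinct pairs to ask about.
pairs = {(ep.group, ep.name) for dist in distributions() for ep in dist.entry_points}
print(f'{len(pairs)} distinct (group, name) pairs in this environment')
```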
I looked a bit more carefully at the implementation of the …. Instead, we can simply iterate over all the entry points directly; see the last commit. Here are some timings, using a non-existing class to measure the worst-case scenario:

**Main branch**

```python
In [1]: from aiida.plugins.entry_point import get_entry_point_from_class, eps

In [2]: p = eps()  # warmup cache

In [3]: %timeit get_entry_point_from_class('invalid', 'invalid')
21.4 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

**Caching eps.select() on this branch**

```python
In [1]: from aiida.plugins.entry_point import get_entry_point_from_class, eps

In [2]: p = eps(); get_entry_point_from_class('invalid', 'invalid')  # warmup caches

In [12]: %timeit get_entry_point_from_class('invalid', 'invalid')
329 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

**Iterating over all entry points directly**

```python
In [1]: from aiida.plugins.entry_point import get_entry_point_from_class, eps

In [2]: p = eps()

In [5]: %timeit get_entry_point_from_class('invalid', 'invalid')
288 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

So the last approach clearly wins, and perhaps we could even remove the cache on top of this function. I'll do a bit more investigation to see if it still makes sense to cache ….

EDIT: The last commit does not work, need to investigate more...
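The "iterate directly" variant can be sketched roughly like this (a hypothetical helper using the stdlib `importlib.metadata`; aiida's real implementation differs): one linear pass over all installed entry points, instead of one `select()` per group, each of which would internally re-scan the whole collection:

```python
from importlib.metadata import distributions


def find_entry_point_for_class(module_name, class_name):
    """Single linear scan over all installed entry points (hypothetical helper).

    An entry point's ``value`` is the ``'module:attr'`` string it points at,
    so a class can be located by matching that string directly.
    """
    target = f'{module_name}:{class_name}'
    for dist in distributions():
        for ep in dist.entry_points:
            if ep.value == target:
                return ep
    return None


# Worst case: a non-existing class forces a full scan and returns None.
assert find_entry_point_for_class('invalid', 'invalid') is None
```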
The problem is that ….
@sphuber yes. The reason why this worked for me locally is that I had ….

EDIT: To clarify, in 5.2.0 ….

EDIT: Updating ….
Branch updated: a47009a → 0705701 (Compare)
This should be straightforward. The ….
@sphuber So I should have clarified my hesitation here: I was not sure if you'd be okay with upgrading ….
Indeed, but I stumbled at first because ….

EDIT: I've checked ….
No, I think it is fine for us to update the minimum requirement of ….
Thanks @danielhollas. Implementation looks good; just some minor docstring requests and then we can merge.
Just checking whether this is then indeed still necessary, or whether we can reduce the caching.
So some timings. tl;dr: `entry_points.select()` is a fairly expensive operation and we should probably cache it, since aiida uses entry points extensively:

```python
# Without cache
In [7]: %timeit eps.select(group='invalid', name='invalid')
363 µs ± 4.13 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [8]: %timeit eps.select(group='console_scripts', name='verdi')
376 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# With cache
%timeit eps_select(group='console_scripts', name='verdi')
201 ns ± 0.886 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

The …:

```python
# Best case scenario, first item in sorted `EntryPoints`
In [64]: %timeit get_entry_point_from_class_uncached('aiida.calculations.arithmetic.add', 'ArithmeticAddCalculation')
2.4 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Last item in aiida groups in sorted `EntryPoints`
In [65]: %timeit get_entry_point_from_class_uncached('aiida.workflows.arithmetic.add_multiply', 'add_multiply')
83.4 µs ± 362 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Iterating through all entry points (243 in this case)
%timeit get_entry_point_from_class_uncached('invalid', 'invalid')
231 µs ± 416 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

The pre-sorting itself takes ~30 µs, so it's worth it in most cases:

```python
%timeit EntryPoints(sorted(eps_unsorted, key=lambda x: x.group))
37.2 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
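One way a collection pre-sorted by group can pay off (a hypothetical sketch with the stdlib `importlib.metadata`, not aiida's implementation): after sorting once, each group occupies a contiguous slice, so a group lookup becomes a binary search instead of a full scan per `select(group=...)` call:

```python
from bisect import bisect_left, bisect_right
from importlib.metadata import distributions

# Sort all installed entry points once, keyed by group.
eps_sorted = sorted(
    (ep for dist in distributions() for ep in dist.entry_points),
    key=lambda ep: ep.group,
)
group_keys = [ep.group for ep in eps_sorted]


def select_group(group):
    """Return the contiguous slice of entry points for one group via bisect."""
    lo, hi = bisect_left(group_keys, group), bisect_right(group_keys, group)
    return eps_sorted[lo:hi]


# A missing group yields an empty slice; a real group yields only its members.
assert select_group('no_such_group') == []
assert all(ep.group == 'console_scripts' for ep in select_group('console_scripts'))
```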
@sphuber thank you for the review. I've modified the docstrings and added the `EntryPoints` pre-sorting, as described above.
Thanks @danielhollas!
In #6091 we sped up the `aiida.orm` import significantly by delaying the call to `get_type_string_from_class`. However, we did not solve the underlying issue: the reason why this call is so expensive is that it essentially involves a quadruply-nested for loop.

Here I cache the results of `importlib_metadata.entry_points.select()` calls, which should get rid of the two inner loops coming from `importlib_metadata` code. This change shaves off around 25 ms from `verdi` invocation, but should help throughout the code base. I am happy to do further benchmarking if needed (what should I benchmark?)

cc @sphuber