Cache Factory._find_provider_class module look-ups (#2112)
This caches the look-ups of the provider class based on the provider
name and locale (and specific subclass of Factory, if any), making
construction of multiple Faker instances ~20✕ faster.

This shouldn't change external behaviour unless someone is doing
things that seem surprising:

- using the same `provider_path` to refer to different modules, via
  some sort of dynamic module magic

- a provider that is highly dynamic somehow, e.g. its `default_locale`
  attribute changes
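
The second case can be illustrated with a small stand-in. This is only a sketch, not Faker's code: `find_default_locale` and the module-level `module` object are hypothetical names, and the cached function simply reads an attribute that later changes.

```python
import functools


class DummyProviderModule:
    # Hypothetical stand-in for a provider module with a mutable attribute.
    default_locale = "en_US"


module = DummyProviderModule()


@functools.cache
def find_default_locale(provider_path):
    # Hypothetical look-up: the result depends on state outside the arguments.
    return module.default_locale


first = find_default_locale("dummy")
module.default_locale = "ar_EG"        # the provider changes behaviour
second = find_default_locale("dummy")  # cached: still the old value

find_default_locale.cache_clear()      # dropping the cache picks up the change
third = find_default_locale("dummy")

print(first, second, third)
```

Because the cache key is only the arguments, a change that is not reflected in them goes unnoticed until the cache is cleared or a different `provider_path` is used.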

Doing the provider class look-up can be quite slow, seemingly because of the
`list_module` traversals, resulting in this appearing very high in the
profiles of some test suites in my work repo (which create many
independent faker instances, separately seeded).

For instance, running profiling in IPython with Faker v30.1.0 via:

```python
%prun -l 10 -s cumtime [faker.Faker() for _ in range(100)]
```

takes 1.86 seconds, and these are the top 10 calls by cumulative
time:

```
  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.862    1.862 {built-in method builtins.exec}
        1    0.000    0.000    1.862    1.862 <string>:1(<module>)
        1    0.000    0.000    1.862    1.862 <string>:1(<listcomp>)
      100    0.001    0.000    1.861    0.019 proxy.py:31(__init__)
      100    0.005    0.000    1.860    0.019 factory.py:23(create)
     2500    0.006    0.000    1.726    0.001 factory.py:66(_find_provider_class)
     1900    0.002    0.000    1.650    0.001 loading.py:31(list_module)
     1900    0.013    0.000    1.616    0.001 loading.py:38(<listcomp>)
    61700    0.032    0.000    1.603    0.000 pkgutil.py:110(iter_modules)
    61700    0.106    0.000    1.551    0.000 pkgutil.py:144(_iter_file_finder_modules)
```

By putting `@functools.cache` on `Factory._find_provider_class`, that
function only runs once for each combination of `provider_path`, `locale`
and `cls` (the `Factory` subclass). This potentially increases memory usage
slightly, but in all but extreme cases, each of those arguments should only
take a limited number of values.
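
As a rough sketch of how the cache key behaves when `@functools.cache` sits under `@classmethod` (the class bodies, the `calls` counter and `CustomFactory` are illustrative assumptions, not Faker's implementation):

```python
import functools


class Factory:
    calls = 0  # counts uncached look-ups; for illustration only

    @classmethod
    @functools.cache
    def _find_provider_class(cls, provider_path, locale=None):
        # Stand-in body: the real method imports and inspects modules.
        Factory.calls += 1
        return f"{cls.__name__}:{provider_path}:{locale}"


class CustomFactory(Factory):
    # Hypothetical subclass, to show that `cls` is part of the cache key.
    pass


Factory._find_provider_class("faker.providers.address", "en_US")
Factory._find_provider_class("faker.providers.address", "en_US")        # cache hit
CustomFactory._find_provider_class("faker.providers.address", "en_US")  # new key: different cls

print(Factory.calls)  # prints 2
```

Note that `functools.cache` was added in Python 3.9; on earlier versions `functools.lru_cache(maxsize=None)` is the equivalent.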

Benchmarks:

- Running `%timeit faker.Faker()` in IPython:

  - Before: `12.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)`
  - After: `555 µs ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)`

- Faker's test suite: running Faker's own test suite (specifically the
  number reported in pytest's footer after running the 'main' test
  suite, not tests/pytest/session_overrides, and not including any of the
  other commands tox runs) shows approximately this behaviour: ~90s -> ~60s.

- With a similar change hacked into my real work repo, time to run a
  particular test suite that creates a lot of Fakers goes from ~35s ->
  ~15s.

(NB. the last two "macro" benchmarks are very noisy.)

Running the same profiling command now takes 0.135s and shows these
top 10 calls:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.135    0.135 {built-in method builtins.exec}
        1    0.000    0.000    0.135    0.135 <string>:1(<module>)
        1    0.000    0.000    0.135    0.135 <string>:1(<listcomp>)
      100    0.000    0.000    0.135    0.001 proxy.py:31(__init__)
      100    0.002    0.000    0.134    0.001 factory.py:24(create)
     2500    0.052    0.000    0.131    0.000 generator.py:32(add_provider)
     2500    0.032    0.000    0.032    0.000 {built-in method builtins.dir}
   176400    0.016    0.000    0.016    0.000 {method 'startswith' of 'str' objects}
    80400    0.009    0.000    0.016    0.000 generator.py:100(set_formatter)
    98500    0.011    0.000    0.011    0.000 {built-in method builtins.getattr}
```
huonw authored Oct 4, 2024
1 parent b25d2e8 commit b605d21
Showing 2 changed files with 8 additions and 2 deletions.
2 changes: 2 additions & 0 deletions faker/factory.py
@@ -1,3 +1,4 @@
+import functools
import locale as pylocale
import logging
import sys
@@ -64,6 +65,7 @@ def create(
return faker

@classmethod
+@functools.cache

Check failure on line 68 in faker/factory.py (GitHub Actions / typing (3.8)): Module has no attribute "cache"

def _find_provider_class(
cls,
provider_path: str,
8 changes: 6 additions & 2 deletions tests/test_factory.py
@@ -151,6 +151,10 @@ class Provider:
def __init__(self, *args, **kwargs):
pass

+# There's a cache based on the provider name, so when the provider changes behaviour we need
+# a new name:
+provider_path = f"test_lang_localized_provider_{with_default}"

with patch.multiple(
"faker.factory",
import_module=MagicMock(return_value=DummyProviderModule()),
@@ -167,8 +171,8 @@ def __init__(self, *args, **kwargs):
("ar_EG", with_default), # True if module defines a default locale
]
for locale, expected_used in test_cases:
-factory = Factory.create(providers=["dummy"], locale=locale)
-assert factory.providers[0].__provider__ == "dummy"
+factory = Factory.create(providers=[provider_path], locale=locale)
+assert factory.providers[0].__provider__ == provider_path
from faker.config import DEFAULT_LOCALE

print(f"requested locale = {locale} , DEFAULT LOCALE {DEFAULT_LOCALE}")