Cache Factory._find_provider_class module look-ups (#2112)
This caches the look-ups of the provider class based on the provider name and locale (and the specific subclass of `Factory`, if any), making construction of multiple `Faker` instances ~20✕ faster. This shouldn't change external behaviour unless someone is doing things that seem surprising:

- using the same `provider_path` to refer to different modules, via some sort of dynamic module magic
- a provider that is highly dynamic somehow, e.g. its `default_locale` attribute changes

Doing the provider class look-up can be quite slow, seemingly because of the `list_module` traversals, resulting in this appearing very high in the profiles of some test suites in my work repo (which create many independent faker instances, separately seeded). For instance, running profiling in IPython with Faker v30.1.0 via:

```python
%prun -l 10 -s cumtime [faker.Faker() for _ in range(100)]
```

takes 1.86 seconds and has this as the top 10 (cumulatively) slowest calls:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.862    1.862 {built-in method builtins.exec}
        1    0.000    0.000    1.862    1.862 <string>:1(<module>)
        1    0.000    0.000    1.862    1.862 <string>:1(<listcomp>)
      100    0.001    0.000    1.861    0.019 proxy.py:31(__init__)
      100    0.005    0.000    1.860    0.019 factory.py:23(create)
     2500    0.006    0.000    1.726    0.001 factory.py:66(_find_provider_class)
     1900    0.002    0.000    1.650    0.001 loading.py:31(list_module)
     1900    0.013    0.000    1.616    0.001 loading.py:38(<listcomp>)
    61700    0.032    0.000    1.603    0.000 pkgutil.py:110(iter_modules)
    61700    0.106    0.000    1.551    0.000 pkgutil.py:144(_iter_file_finder_modules)
```

By putting `@functools.cache` on `Factory._find_provider_class`, that function only runs once for each combination of `provider_path`, `locale`, and `cls` (the `Factory` subclass). This potentially increases memory usage slightly, but in all but extreme cases, each of those args should only be used with a limited number of values.
Benchmarks:

- Running `%timeit faker.Faker()` in IPython:
  - Before: `12.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)`
  - After: `555 µs ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)`
- Faker's own test suite (specifically the number reported in pytest's footer after running the 'main' test suite, not tests/pytest/session_overides, and not including any of the other commands tox runs): ~90s -> ~60s.
- With a similar change hacked into my real work repo, time to run a particular test suite that creates a lot of Fakers goes from ~35s -> ~15s.

(NB. the latter two "macro" benchmarks are very noisy.)

Running the same profiling command now takes 0.135s and shows these top 10 calls:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.135    0.135 {built-in method builtins.exec}
        1    0.000    0.000    0.135    0.135 <string>:1(<module>)
        1    0.000    0.000    0.135    0.135 <string>:1(<listcomp>)
      100    0.000    0.000    0.135    0.001 proxy.py:31(__init__)
      100    0.002    0.000    0.134    0.001 factory.py:24(create)
     2500    0.052    0.000    0.131    0.000 generator.py:32(add_provider)
     2500    0.032    0.000    0.032    0.000 {built-in method builtins.dir}
   176400    0.016    0.000    0.016    0.000 {method 'startswith' of 'str' objects}
    80400    0.009    0.000    0.016    0.000 generator.py:100(set_formatter)
    98500    0.011    0.000    0.011    0.000 {built-in method builtins.getattr}
```
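For anyone wanting to reproduce these numbers outside IPython, the `%prun -l 10 -s cumtime` command above maps onto the stdlib `cProfile`/`pstats` modules. The profiled expression below is a trivial stand-in so the sketch doesn't assume faker is installed; replace it with `[faker.Faker() for _ in range(100)]` to reproduce the measurement:

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
sum(i * i for i in range(10_000))  # stand-in for [faker.Faker() for _ in range(100)]
profiler.disable()

# Sort by cumulative time and print the 10 slowest calls, like %prun -l 10 -s cumtime.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumtime").print_stats(10)
print(stream.getvalue())
```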