Cache Factory._find_provider_class module look-ups (#2112)
This caches the look-ups of the provider class based on the provider name and locale (and the specific subclass of `Factory`, if any), making construction of multiple `Faker` instances ~20✕ faster. This shouldn't change external behaviour unless someone is doing things that seem surprising:

- using the same `provider_path` to refer to different modules, via some sort of dynamic module magic
- a provider that is highly dynamic somehow, e.g. its `default_locale` attribute changes

Doing the provider class look-up can be quite slow, seemingly because of the `list_module` traversals, resulting in this appearing very high in the profiles of some test suites in my work repo (which create many independent faker instances, separately seeded). For instance, running profiling in IPython with Faker v30.1.0 via:

```python
%prun -l 10 -s cumtime [faker.Faker() for _ in range(100)]
```

takes 1.86 seconds and has this as the top 10 (cumulatively) slowest calls:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.862    1.862 {built-in method builtins.exec}
        1    0.000    0.000    1.862    1.862 <string>:1(<module>)
        1    0.000    0.000    1.862    1.862 <string>:1(<listcomp>)
      100    0.001    0.000    1.861    0.019 proxy.py:31(__init__)
      100    0.005    0.000    1.860    0.019 factory.py:23(create)
     2500    0.006    0.000    1.726    0.001 factory.py:66(_find_provider_class)
     1900    0.002    0.000    1.650    0.001 loading.py:31(list_module)
     1900    0.013    0.000    1.616    0.001 loading.py:38(<listcomp>)
    61700    0.032    0.000    1.603    0.000 pkgutil.py:110(iter_modules)
    61700    0.106    0.000    1.551    0.000 pkgutil.py:144(_iter_file_finder_modules)
```

By putting `@functools.cache` on `Factory._find_provider_class`, that function only runs once for each combination of `provider_path`, `locale`, and `cls` (the `Factory` subclass). This potentially increases memory usage slightly, but in all but extreme cases, each of those args should only be used with a limited number of values.
Benchmarks:

- Running `%timeit faker.Faker()` in IPython:
  - Before: `12.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)`
  - After: `555 µs ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)`
- Faker's own test suite (specifically the number reported in pytest's footer after running the 'main' test suite, not tests/pytest/session_overides, and not including any of the other commands tox runs): ~90s -> ~60s.
- With a similar change hacked into my real work repo, time to run a particular test suite that creates a lot of Fakers goes from ~35s -> ~15s.

(NB. the latter two "macro" benchmarks are very noisy.)

Running the same profiling command now takes 0.135s and shows these top 10 calls:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.135    0.135 {built-in method builtins.exec}
        1    0.000    0.000    0.135    0.135 <string>:1(<module>)
        1    0.000    0.000    0.135    0.135 <string>:1(<listcomp>)
      100    0.000    0.000    0.135    0.001 proxy.py:31(__init__)
      100    0.002    0.000    0.134    0.001 factory.py:24(create)
     2500    0.052    0.000    0.131    0.000 generator.py:32(add_provider)
     2500    0.032    0.000    0.032    0.000 {built-in method builtins.dir}
   176400    0.016    0.000    0.016    0.000 {method 'startswith' of 'str' objects}
    80400    0.009    0.000    0.016    0.000 generator.py:100(set_formatter)
    98500    0.011    0.000    0.011    0.000 {built-in method builtins.getattr}
```
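For anyone wanting to reproduce these numbers outside IPython, the `%prun -l 10 -s cumtime` command above maps onto the stdlib `cProfile`/`pstats` modules. The profiled expression below is a trivial stand-in so the sketch doesn't assume faker is installed; replace it with `[faker.Faker() for _ in range(100)]` to reproduce the measurement:

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
sum(i * i for i in range(10_000))  # stand-in for [faker.Faker() for _ in range(100)]
profiler.disable()

# Sort by cumulative time and print the 10 slowest calls, like %prun -l 10 -s cumtime.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumtime").print_stats(10)
print(stream.getvalue())
```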