Improve import time of various stdlib modules #118761

layday · 2024-05-08T13:13:05Z

Feature or enhancement

Proposal:

Following on from #109653, further improvements can be made to import times.

Links to previous discussion of this feature:

https://discuss.python.org/t/deferred-computation-evalution-for-toplevels-imports-and-dataclasses/34173

For example:

importlib.metadata is often used for tasks that need to happen at import, e.g. to enumerate/load entry point plug-ins, so it might be worth seeing if we can cut down its own import time a bit more.

importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases. It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method - I assume redefining the method in importlib.metadata would be harmless. Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.
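The "tuck the import away in the calling function" idea can be sketched as follows. This is a minimal illustration, not the actual importlib.metadata code; `parse_payload` is a hypothetical helper invented for the example.

```python
def parse_payload(text):
    """Parse a JSON document (hypothetical helper, for illustration only)."""
    # Deferred import: json is loaded the first time this function is called,
    # not when the enclosing module is imported. Later calls only pay for a
    # cheap sys.modules lookup.
    import json

    return json.loads(text)


if __name__ == "__main__":
    print(parse_payload('{"name": "example", "version": "1.0"}'))
```

The trade-off is that the first call pays the import cost instead of module import, which is a good deal only when the function is rarely called.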

Linked PRs

hugovk · 2024-08-05T14:31:44Z

@layday Is it okay if I repurpose this issue as an "Improve import time of various stdlib modules" umbrella issue, like #109653 but for 3.14?

I've got some pprint improvements, and if we have importlib.metadata and some others, we can group them under the same umbrella issue like last time.

layday · 2024-08-05T16:20:48Z

Sure!

danielhollas · 2024-08-06T02:30:31Z

I've opened a PR over at the importlib_metadata repo that avoids importing inspect. python/importlib_metadata#499

> importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases.
> Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.

@layday were you planning on tackling these?

> It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method

This seems to be solved on main, importlib.abc no longer imports importlib.resources.

danielhollas · 2024-08-27T15:21:50Z

> I've opened a PR over at the importlib_metadata repo that avoids importing inspect. python/importlib_metadata#499

This has been merged and released in version 8.4 of importlib_metadata 🎉

> importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases. It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method - I assume redefining the method in importlib.metadata would be harmless. Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.

I've submitted python/importlib_metadata#502 that defers zip import, and python/importlib_metadata#503 which defers json and platform.

picnixz · 2024-08-31T08:11:49Z

(removing the 3.14 label since features always target the main branch)

hugovk · 2025-01-07T11:55:55Z

Note for when this is documented in What's New in Python 3.14: #128559 / #128560 can also be included.

Importing `pickle` is now roughly 25% faster. Importing the `re` module is no longer needed, so `re` is no longer implicitly exposed as `pickle.re`.

vstinner · 2025-01-14T12:50:17Z

@picnixz: I suggest closing this issue.

picnixz · 2025-01-14T12:52:00Z

There is also pickletools that I want to improve (because we can remove the dependency on re as well) and base64 that remains to be merged.

picnixz · 2025-01-15T08:42:58Z

I can simplify pickletools as follows:

diff --git a/Lib/pickletools.py b/Lib/pickletools.py
index d9c4fb1e63e..f39819b217a 100644
--- a/Lib/pickletools.py
+++ b/Lib/pickletools.py
@@ -13,7 +13,6 @@
 import codecs
 import io
 import pickle
-import re
 import sys

 __all__ = ['dis', 'genops', 'optimize']
@@ -2225,7 +2224,7 @@ def assure_pickle_consistency(verbose=False):

     copy = code2op.copy()
     for name in pickle.__all__:
-        if not re.match("[A-Z][A-Z0-9_]+$", name):
+        if not name.isidentifier() or not name.isupper() or name.startswith('_'):
             if verbose:
                 print("skipping %r: it doesn't look like an opcode name" % name)
             continue

We would gain roughly 2 ms according to hyperfine (from 10.3 ms to 8.2 ms) for ./python -c 'import pickletools'. The -X importtime benchmarks are not really interesting here because assure_pickle_consistency is always executed when importing pickletools. So, I don't think we need to improve pickletools.
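For reference, the two checks in the pickletools diff above can be compared side by side. This is only a sketch; the function names and the sample list are made up for the comparison and are not part of pickletools.

```python
import re


def looks_like_opcode_re(name):
    # Original check: an ASCII uppercase letter followed by at least one
    # uppercase letter, digit, or underscore.
    return re.match("[A-Z][A-Z0-9_]+$", name) is not None


def looks_like_opcode_str(name):
    # Proposed check: the negation of the condition used in the diff.
    return name.isidentifier() and name.isupper() and not name.startswith('_')


if __name__ == "__main__":
    # Hypothetical sample names; the real loop iterates over pickle.__all__.
    for name in ["MARK", "LONG_BINGET", "dumps", "PickleError", "_PRIVATE", "X"]:
        print(f"{name:12} re={looks_like_opcode_re(name)!s:5} "
              f"str={looks_like_opcode_str(name)}")
```

The two predicates are not strictly equivalent: the regex needs at least two characters and only accepts ASCII, so a single-letter name such as "X" passes the string-method check but not the regex. Whether that matters depends on the names actually present in pickle.__all__.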

There are some modules that we can optimize:

csv: re is needed in only one place, so a single local import makes the import about 5 times faster.
difflib ❌: needs refactoring to remove the module-level re usage, which would make the import about 2 times faster.
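Figures like these can be approximated without hyperfine by timing a fresh interpreter that imports the module. The sketch below is an assumption about how one might measure it, not the method used in this thread; `time_import` is a hypothetical helper, and the result includes interpreter startup, so only differences between runs are meaningful.

```python
import statistics
import subprocess
import sys
import time


def time_import(module, runs=10):
    """Return the median wall-clock time (seconds) of `python -c 'import X'`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process per run, so the module is never already cached.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    baseline = time_import("sys")  # sys is built in: roughly startup cost only
    for module in ("csv", "difflib"):
        elapsed = time_import(module)
        print(f"{module}: {elapsed * 1000:.1f} ms "
              f"(startup alone: {baseline * 1000:.1f} ms)")
```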

There are other occurrences of the re module, but they are either in submodules with lots of dependencies (tracing the imports is a pain, so I gave up), or re is used at the module level for compiling patterns (which matters more than reducing import time). Also, removing a single import is sometimes not sufficient. For instance, some submodules of http import re, but re will likely end up imported anyway once all submodules are imported. Similarly, inspect could drop its re import, but that wouldn't help because re would already be imported by another dependency that cannot be removed.
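One way to check whether dropping an import would actually help is to ask a fresh interpreter whether the dependency ends up in sys.modules anyway. This is a rough sketch; `imports_transitively` is a hypothetical helper, and the answer can be skewed if site or a .pth file already imports the dependency at startup.

```python
import subprocess
import sys


def imports_transitively(module, dependency):
    """Return True if importing `module` in a fresh interpreter loads `dependency`."""
    code = (
        "import sys\n"
        f"import {module}\n"
        f"print({dependency!r} in sys.modules)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip() == "True"


if __name__ == "__main__":
    for module in ("csv", "difflib", "inspect"):
        print(module, "-> re loaded:", imports_transitively(module, "re"))
```

If the dependency is still loaded after the direct import is removed, some other import in the chain is pulling it in and the change saves nothing.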

Note that re itself is not actually the slow module. Most of the cost comes from enum, which re imports and which is slow to import; the rest of the time spent in re is due to functools, which imports collections:

$ ./python -X importtime -c 'import re'
import time: self [us] | cumulative | imported package
...
import time:       496 |       3192 | site
import time:       224 |        224 | linecache
import time:       319 |        319 |     types
import time:      1818 |       2136 |   enum
import time:        69 |         69 |     _sre
import time:       238 |        238 |       re._constants
import time:       364 |        601 |     re._parser
import time:        89 |         89 |     re._casefix
import time:       301 |       1059 |   re._compiler
import time:       243 |        243 |       itertools
import time:       116 |        116 |       keyword
import time:        55 |         55 |         _operator
import time:       206 |        260 |       operator
import time:       125 |        125 |       reprlib
import time:        50 |         50 |       _collections
import time:       635 |       1426 |     collections
import time:        39 |         39 |     _functools
import time:       526 |       1990 |   functools
import time:       114 |        114 |   copyreg
import time:       443 |       5741 | re
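Output in this format can also be collected and sorted programmatically. The sketch below assumes the three-column layout shown above (self time, cumulative time, module name, separated by "|"); `parse_importtime` and `slowest_imports` are hypothetical helpers, not part of any stdlib API.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Parse -X importtime output into (module, self_us, cumulative_us) tuples."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        # Stripping the name discards the indentation that encodes nesting.
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skips the column-header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(module, top=5):
    """Return the `top` modules with the largest self time when importing `module`."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = parse_importtime(proc.stderr)  # the report is written to stderr
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, self_us, cumulative_us in slowest_imports("re"):
        print(f"{name:25} self={self_us:>6} us  cumulative={cumulative_us:>6} us")
```

Sorting by self time rather than cumulative time is what surfaces modules such as enum, whose own execution dominates rather than that of their dependencies.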

We also always have

import time:      1105 |       1105 |     _collections_abc

but we can't be more efficient there since we construct lots of classes. In conclusion, I will only open one PR for optimizing csv. Then we can close the issue. The patch is as follows:

diff --git a/Lib/csv.py b/Lib/csv.py
index cd202659873..0a627ba7a51 100644
--- a/Lib/csv.py
+++ b/Lib/csv.py
@@ -63,7 +63,6 @@ class excel:
         written as two quotes
 """

-import re
 import types
 from _csv import Error, writer, reader, register_dialect, \
                  unregister_dialect, get_dialect, list_dialects, \
@@ -281,6 +280,7 @@ def _guess_quote_and_delimiter(self, data, delimiters):
         If there is no quotechar the delimiter can't be determined
         this way.
         """
+        import re

         matches = []
         for restr in (r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",

layday mentioned this issue May 8, 2024

Improve import time of various stdlib modules #109653

Closed

AlexWaygood added topic-importlib performance Performance or resource usage type-feature A feature request or enhancement labels May 8, 2024

danielhollas mentioned this issue Aug 5, 2024

Speed up import time by deferring inspect python/importlib_metadata#499

Merged

hugovk changed the title ~~Further improve import time of importlib.metadata~~ Improve import time of various stdlib modules Aug 6, 2024

hugovk added the 3.14 new features, bugs and security fixes label Aug 6, 2024

bedevere-app bot mentioned this issue Aug 6, 2024

gh-118761: Improve import time of pprint #122725

Merged

hugovk added a commit that referenced this issue Aug 7, 2024

gh-118761: Improve import time of pprint (#122725)

42d9bec

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024

pythongh-118761: Improve import time of pprint (python#122725)

0549dc7

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

bedevere-app bot mentioned this issue Aug 30, 2024

gh-118761: Speedup pathlib import by deferring shutil #123520

Merged

picnixz removed the 3.14 new features, bugs and security fixes label Aug 31, 2024

hugovk mentioned this issue Aug 31, 2024

gh-121423: Improve import time of socket by writing socket.errorTab as a constant and lazy import modules #121424

Merged

barneygale pushed a commit that referenced this issue Sep 1, 2024

gh-118761: Speedup pathlib import by deferring shutil (#123520)

2304774

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>

Wulian233 mentioned this issue Sep 23, 2024

gh-121468: Show asyncio information in pdb #124367

Draft

bedevere-app bot mentioned this issue Nov 18, 2024

gh-118761: Improve import time of mimetypes #126979

Merged

hugovk added a commit that referenced this issue Nov 21, 2024

gh-118761: Improve import time of mimetypes (#126979)

dc7a2b6

Wulian233 mentioned this issue Dec 2, 2024

Speed-up lazy heapq import in collections #127538

Merged

This was referenced Jan 11, 2025

gh-118761: improve import time for pickle #128732

Merged

gh-118761: substitute re import in base64.b16decode for a more efficient alternative #128736

Merged

gh-118761: improve import time for secrets #128738

Closed

ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025

pythongh-118761: Improve import time of mimetypes (python#126979)

883457a

bedevere-app bot mentioned this issue Jan 15, 2025

gh-118761: Improve import time for csv #128858

Open
