Improve import time of various stdlib modules #118761

Open
layday opened this issue May 8, 2024 · 9 comments
Labels
performance Performance or resource usage topic-importlib type-feature A feature request or enhancement

Comments

@layday

layday commented May 8, 2024

Feature or enhancement

Proposal:

Following on from #109653, further improvements can be made to import times.

Links to previous discussion of this feature:

https://discuss.python.org/t/deferred-computation-evalution-for-toplevels-imports-and-dataclasses/34173

For example:

importlib.metadata is often used for tasks that need to happen at import, e.g. to enumerate/load entry point plug-ins, so it might be worth seeing if we can cut down its own import time a bit more.

importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases. It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method - I assume redefining the method in importlib.metadata would be harmless. Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.
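As a rough illustration of "tucking an import away in its calling function", here is a minimal sketch. The helper name `load_json_metadata` is hypothetical and is not part of importlib.metadata; it only shows the general pattern of paying the import cost on first call rather than at module import.

```python
def load_json_metadata(text: str) -> dict:
    """Hypothetical helper: parse a JSON metadata payload.

    The json import lives inside the function, so importing the
    enclosing module does not import json. The cost is paid on the
    first call; later calls find json in sys.modules and only pay
    for a dictionary lookup.
    """
    import json  # deferred: only needed when this function runs

    return json.loads(text)
```

The trade-off is a slightly slower first call and a less conventional import layout, which is usually acceptable for code paths that are rarely hit.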


@AlexWaygood AlexWaygood added topic-importlib performance Performance or resource usage type-feature A feature request or enhancement labels May 8, 2024
@hugovk
Member

hugovk commented Aug 5, 2024

@layday Is it okay if I repurpose this issue as an "Improve import time of various stdlib modules" issue, like #109653 but for 3.14?

I've got some pprint improvements, and if we have importlib.metadata and some others, we can group them under the same umbrella issue like last time.

@layday
Author

layday commented Aug 5, 2024

Sure!

@danielhollas
Contributor

danielhollas commented Aug 6, 2024

I've opened a PR over at the importlib_metadata repo that avoids importing inspect. python/importlib_metadata#499

importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases.
Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.

@layday were you planning on tackling these?

It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method

This seems to be solved on main, importlib.abc no longer imports importlib.resources.

@hugovk hugovk changed the title Further improve import time of importlib.metadata Improve import time of various stdlib modules Aug 6, 2024
@hugovk hugovk added the 3.14 new features, bugs and security fixes label Aug 6, 2024
hugovk added a commit that referenced this issue Aug 7, 2024
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@danielhollas
Contributor

I've opened a PR over at the importlib_metadata repo that avoids importing inspect. python/importlib_metadata#499

This has been merged and released in version 8.4 of importlib_metadata 🎉

importlib.metadata imports zipfile at the top for a function that won't be called in the vast majority of cases. It also imports importlib.abc, which in turn imports importlib.resources, to subclass an ABC with a single, non-abstract method - I assume redefining the method in importlib.metadata would be harmless. Some other less frequently-used imports which are only accessed once or twice, such as json, could also be tucked away in their calling functions.

I've submitted python/importlib_metadata#502 that defers zip import, and python/importlib_metadata#503 which defers json and platform.

@picnixz picnixz removed the 3.14 new features, bugs and security fixes label Aug 31, 2024
@picnixz
Member

picnixz commented Aug 31, 2024

(removing the 3.14 label since features always target the main branch)

barneygale pushed a commit that referenced this issue Sep 1, 2024
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@hugovk
Member

hugovk commented Jan 7, 2025

Note: when documenting this in What's New in Python 3.14, we can also include #128559 / #128560.

ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
picnixz added a commit that referenced this issue Jan 14, 2025
Importing `pickle` is now roughly 25% faster.

Importing the `re` module is no longer needed and
thus `re` is no longer implicitly exposed as `pickle.re`.

---------

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@vstinner
Member

@picnixz: I suggest closing this issue.

@picnixz
Member

picnixz commented Jan 14, 2025

There is also pickletools that I want to improve (because we can remove the dependency on re as well) and base64 that remains to be merged.

AA-Turner added a commit that referenced this issue Jan 14, 2025
…ficient alternative (#128736)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@picnixz
Member

picnixz commented Jan 15, 2025

I can simplify pickletools as follows:

diff --git a/Lib/pickletools.py b/Lib/pickletools.py
index d9c4fb1e63e..f39819b217a 100644
--- a/Lib/pickletools.py
+++ b/Lib/pickletools.py
@@ -13,7 +13,6 @@
 import codecs
 import io
 import pickle
-import re
 import sys

 __all__ = ['dis', 'genops', 'optimize']
@@ -2225,7 +2224,7 @@ def assure_pickle_consistency(verbose=False):

     copy = code2op.copy()
     for name in pickle.__all__:
-        if not re.match("[A-Z][A-Z0-9_]+$", name):
+        if not name.isidentifier() or not name.isupper() or name.startswith('_'):
             if verbose:
                 print("skipping %r: it doesn't look like an opcode name" % name)
             continue

We would gain roughly 2 ms according to hyperfine (from 10.3 ms to 8.2 ms) for ./python -c 'import pickletools'. The -X importtime benchmarks are not really interesting because assure_pickle_consistency is always executed when importing pickletools. So, I don't think we need to improve pickletools.
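To make the pickletools change above concrete, here is a small sketch comparing the original regex check with the string-method check from the diff. The function names and the sample list are illustrative only. The two predicates agree on these samples, but they are not strictly equivalent in general: for example, the regex requires at least two characters, so a one-letter name such as "A" is accepted by the string-method version only.

```python
import re

# Pattern used by the original check in assure_pickle_consistency.
_OPCODE_NAME = re.compile(r"[A-Z][A-Z0-9_]+$")


def looks_like_opcode_regex(name: str) -> bool:
    """Original check: an uppercase letter followed by one or more of A-Z, 0-9, _."""
    return _OPCODE_NAME.match(name) is not None


def looks_like_opcode_str(name: str) -> bool:
    """Check from the diff, written positively (De Morgan of the `if not ...` test)."""
    return name.isidentifier() and name.isupper() and not name.startswith("_")


# Illustrative names only; not the real contents of pickle.__all__.
SAMPLE_NAMES = ["MARK", "BINPUT", "LONG1", "dump", "Pickler", "_PRIVATE", "PickleError"]

# Both predicates give the same answer for every sample name.
AGREEMENT = {
    name: looks_like_opcode_regex(name) == looks_like_opcode_str(name)
    for name in SAMPLE_NAMES
}
```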

There are some modules that we can optimize:

  • csv: use a single function-local re import, which makes the import 5 times faster.
  • difflib ❌ : needs refactoring to remove the global re import, which would make the import 2 times faster

There are other occurrences of the re module, but they are either in submodules with lots of dependencies (tracing the imports is a pain, so I just gave up) or re is used at the module level for compiling patterns (which matters more than reducing import time). Also, sometimes removing one import is not sufficient. For instance, some submodules of http import re, but in the end re is likely imported anyway since all submodules will be imported. Similarly, inspect could drop its re import, but it wouldn't help because re would already be imported by another dependency that cannot be removed.

Note that re itself is not actually the slow module. It is slow to import because it imports enum, which is the slow one (the rest of the import time spent in re is due to functools, which imports collections):

$ ./python -X importtime -c 'import re'
import time: self [us] | cumulative | imported package
...
import time:       496 |       3192 | site
import time:       224 |        224 | linecache
import time:       319 |        319 |     types
import time:      1818 |       2136 |   enum
import time:        69 |         69 |     _sre
import time:       238 |        238 |       re._constants
import time:       364 |        601 |     re._parser
import time:        89 |         89 |     re._casefix
import time:       301 |       1059 |   re._compiler
import time:       243 |        243 |       itertools
import time:       116 |        116 |       keyword
import time:        55 |         55 |         _operator
import time:       206 |        260 |       operator
import time:       125 |        125 |       reprlib
import time:        50 |         50 |       _collections
import time:       635 |       1426 |     collections
import time:        39 |         39 |     _functools
import time:       526 |       1990 |   functools
import time:       114 |        114 |   copyreg
import time:       443 |       5741 | re
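Output like the listing above can be ranked programmatically. The following is a minimal sketch, assuming the line format shown here ("import time: self | cumulative | name"); the function names `parse_importtime` and `slowest_by_self_time` are made up for this example.

```python
import subprocess
import sys


def parse_importtime(stderr_text: str) -> list[tuple[str, int, int]]:
    """Parse -X importtime output into (module, self_us, cumulative_us) rows."""
    prefix = "import time:"
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skips the "self [us] | cumulative | ..." header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_by_self_time(module: str, top: int = 5) -> list[tuple[str, int, int]]:
    """Import `module` in a fresh interpreter and return the costliest imports."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes its report to stderr.
    rows = parse_importtime(proc.stderr)
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]
```

Calling `slowest_by_self_time("re")` returns the modules sorted by their own (self) time. The numbers vary between runs and machines, so a tool such as hyperfine remains the better choice for end-to-end timings.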

We also always have

import time:      1105 |       1105 |     _collections_abc

but we can't be more efficient there since we construct lots of classes. In conclusion, I will only open one PR for optimizing csv. Then we can close the issue. The patch is as follows:

diff --git a/Lib/csv.py b/Lib/csv.py
index cd202659873..0a627ba7a51 100644
--- a/Lib/csv.py
+++ b/Lib/csv.py
@@ -63,7 +63,6 @@ class excel:
         written as two quotes
 """

-import re
 import types
 from _csv import Error, writer, reader, register_dialect, \
                  unregister_dialect, get_dialect, list_dialects, \
@@ -281,6 +280,7 @@ def _guess_quote_and_delimiter(self, data, delimiters):
         If there is no quotechar the delimiter can't be determined
         this way.
         """
+        import re

         matches = []
         for restr in (r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
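
One way to check whether a change like this actually keeps re out of the import graph is to ask a fresh interpreter what ended up in sys.modules. This is only a sketch: the helper name `pulls_in` is hypothetical, and the result for any given module depends on the Python version being run.

```python
import subprocess
import sys


def pulls_in(module: str, dependency: str) -> bool:
    """Return True if importing `module` in a fresh interpreter loads `dependency`."""
    code = f"import sys, {module}; print({dependency!r} in sys.modules)"
    proc = subprocess.run(
        # -S skips the site module so .pth files and site hooks
        # cannot import the dependency behind our back.
        [sys.executable, "-S", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip() == "True"
```

For example, `pulls_in("csv", "re")` reports whether a bare `import csv` still loads re on the interpreter at hand.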
