-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-121423: Improve import time of socket
by writing socket.errorTab
as a constant and lazy import modules
#121424
Conversation
socket.errotTab
and lazy import selectors
socket.errorTab
and lazy import selectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that your benchmark is correct. Because you put import socket
into your setup part. So, the import time itself is not counted.
You are right, ↓ benchmark remove
1.14x faster |
Misc/NEWS.d/next/Library/2024-07-06-12-37-10.gh-issue-121423.vnxrl4.rst
Outdated
Show resolved
Hide resolved
…nxrl4.rst Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
socket.errorTab
and lazy import selectors
socket
by writing socket.errorTab
as a constant and lazy import of selectors
Misc/NEWS.d/next/Library/2024-07-06-12-37-10.gh-issue-121423.vnxrl4.rst
Outdated
Show resolved
Hide resolved
…nxrl4.rst Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
The benchmark only imports socket once. the other 999 imports are no-op as socket in in sys.modules. |
Am I right? I'm not familiar with benchmark, sorry I'm wrong |
You can use hyperfine to measure the entire Python process, in which case using a ❯ hyperfine -w 16 -u microsecond 'python -c pass'
Benchmark 1: python -c pass
Time (mean ± σ): 10921.2 µs ± 844.5 µs [User: 8745.1 µs, System: 2091.4 µs]
Range (min … max): 9720.3 µs … 15591.1 µs 242 runs
❯ hyperfine -w 16 -u microsecond 'python -c "import socket; socket.__all__"'
Benchmark 1: python -c "import socket; socket.__all__"
Time (mean ± σ): 14554.3 µs ± 1154.4 µs [User: 12061.3 µs, System: 2368.5 µs]
Range (min … max): 12826.1 µs … 20450.0 µs 206 runs |
Using a PGO+LTO build on macOS, so this will just be measuring the ❯ hyperfine --warmup 16 \
--prepare "git checkout socket" './python.exe -c "import socket; socket.__all__"' \
--prepare "git checkout 6239d41527d5977aa5d44e4b894d719bc045860e" './python.exe -c "import socket; socket.__all__"'
Benchmark 1: ./python.exe -c "import socket; socket.__all__"
Time (mean ± σ): 16.2 ms ± 0.8 ms [User: 13.1 ms, System: 2.5 ms]
Range (min … max): 15.5 ms … 21.7 ms 91 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.
Benchmark 2: ./python.exe -c "import socket; socket.__all__"
Time (mean ± σ): 17.3 ms ± 1.0 ms [User: 13.9 ms, System: 2.7 ms]
Range (min … max): 16.6 ms … 25.1 ms 88 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.
Summary
./python.exe -c "import socket; socket.__all__" ran
1.06 ± 0.08 times faster than ./python.exe -c "import socket; socket.__all__" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we have a general purpose issue for import time improvements -- #118761 -- shall we use that as an umbrella issue for this PR as well?
Using tuna to visualise import times (again, on macOS):
./python.exe -X importtime -c "import socket" 2> import.log && tuna import.log
Before: 3ms
Current PR: 2ms
Also: 1ms
Moving the couple of import array
s into two functions that call it:
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@@ -348,6 +349,9 @@ def makefile(self, mode="r", buffering=None, *, | |||
if hasattr(os, 'sendfile'): | |||
|
|||
def _sendfile_use_sendfile(self, file, offset=0, count=None): | |||
# Lazy import to improve module import time | |||
import selectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to use a global variable to avoid the import at each call:
global selectors
if selectors is None:
import selectors
You can define the selectors variable to None at the top of the file with a comment:
# module imported lazily
selectors = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked several recent PRs making module loading lazy, but most of them do not use the pattern with a global variable (I suspect for readability reasons).
@vstinner Are there any other reasons besides the small performance improvement for using the global variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@serhiy-storchaka: Do you think that it's still useful in 2024 to use a global variable to avoid import selectors
at each function call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just do the import. It's a mere dict lookup without a conditional when the import has already happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just measure. It is more than a mere dict lookup (we also need to check that the module is not partially initialized, this adds 2 more dict lookups or like).
In this case, I think that the difference may be small even in comparison with a single os.fstat() call. The idiom proposed by @vstinner may be used when the whole function is very fast.
Can you also make the array module import lazy in this PR? |
I measured that the change saves 0.7 ms on timeit:
hyperfine:
|
Thank you! I just replaced the incorrect benchmark results in the PR with the correct ones |
@Wulian233 Please could you do this as well? |
Of course! I just lazy imported array and haven't done full benchmarking yet. I also mentioned this optimization in 3.14.rst, do you think it should be included? @hugovk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
What's New entry:
which results in a 30% speed up in standard pyperformance benchmarks.
This sentence can be misunderstood as "everything is 30% faster".
Which pyperformance benchmark is now faster?
@@ -348,6 +349,9 @@ def makefile(self, mode="r", buffering=None, *, | |||
if hasattr(os, 'sendfile'): | |||
|
|||
def _sendfile_use_sendfile(self, file, offset=0, count=None): | |||
# Lazy import to improve module import time | |||
import selectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just measure. It is more than a mere dict lookup (we also need to check that the module is not partially initialized, this adds 2 more dict lookups or like).
In this case, I think that the difference may be small even in comparison with a single os.fstat() call. The idiom proposed by @vstinner may be used when the whole function is very fast.
What do you think of this👀 |
For What's New, we can follow the example @AlexWaygood wrote for 3.13, which grouped a few import improvements together:
Let's do the same with #118761 in 3.14. We have two under that issue so far. I recommend we also group this PR under #118761 as well -> rename this PR title So we can follow something like that now, or leave it out for now and add a grouped summary later. |
Omit details that are not interested to the end user. I would also remove any mention from Ehat's New -- this is an insignificant change. I am sure that if you measure import time of different modules, you will find larger changes between versions (maybe even between bugfix releases) without anybody noticing. |
socket
by writing socket.errorTab
as a constant and lazy import of selectors
socket
by writing socket.errorTab
as a constant and lazy import of selectors
Okay, now this pr belongs to 118761. I revert the changes to 3.14 so we can write them together future when there are more modules optimizations :) |
socket
by writing socket.errorTab
as a constant and lazy import of selectors
socket
by writing socket.errorTab
as a constant and lazy import modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename to Misc/NEWS.d/next/Library/2024-07-06-12-37-10.gh-issue-118761.vnxrl4.rst
(containing gh-issue-118761
) and we're ready to merge :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#121423 is the right issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
socket
by writing socket.errorTab
as a constant and lazy import modulessocket
by writing socket.errorTab
as a constant and lazy import modules
Merged. Thanks for your nice enhancement @Wulian233. |
hyperfine:
≈ 30% faster
socket.errorTab
and lazy importselectors
#121423