[profiling] Add profiling instrumentation centered on the eval loop #144
Replies: 13 comments
-
This should mostly be part of the job of the interpreter generator, see #5, IMO. Adding options to the interpreter generator to insert the instrumentation will be a lot more pleasant than dealing with C macros.
-
Disregarding how we instrument the code, are there any other significant boundaries we should consider?
-
As an exercise to better familiarize myself with the knowledge domain, I spent a little time adding some minimal instrumentation (and logging) to the eval loop, as well as a script to analyze the results. See my eval-loop-profiling branch. The instrumentation generates a log file with a minimal format:
For reference, a Python script that has a total runtime of about a second generates roughly 16MB of trace data. By default, the analysis script makes the log more human-friendly:
The script can also provide a summary, sorted either by count or by average elapsed time.
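To make that summary step concrete, here is a minimal Python sketch of the kind of aggregation such an analysis script would perform. The "label elapsed_ns" line format, the `summarize` function, and the sample labels are illustrative assumptions, not the actual format used in the branch:

```python
from collections import defaultdict

def summarize(lines):
    """Aggregate per-label counts and average elapsed times from a
    trace log (hypothetical "label elapsed_ns" line format)."""
    totals = defaultdict(lambda: [0, 0])  # label -> [count, total_ns]
    for line in lines:
        label, elapsed = line.split()
        totals[label][0] += 1
        totals[label][1] += int(elapsed)
    return {label: (count, total / count)
            for label, (count, total) in totals.items()}

trace = ["LOAD_FAST 120", "LOAD_FAST 80", "CALL_FUNCTION 1500"]
summary = summarize(trace)
# "Sorted by Count":
by_count = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
# "Sorted by Average Elapsed Time":
by_avg = sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)
```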
-
This effort proved to be quite insightful for me. This isn't something to pursue further right now but will be worth revisiting later when we've added code generation for the eval loop. |
Beta Was this translation helpful? Give feedback.
-
I think this is worth revisiting. This is relevant to both startup and general execution. The text summary sounds useful; could we also get flamegraphs? I think they are a great way to visualize where time is being spent. We should also generate results using
-
FWIW, I generated a flamegraph: https://raw.githubusercontent.com/faster-cpython/ideas/main/flamegraphs/vm-freq999.svg. If you download the file locally, you can interact with it in your browser. The commands (after installing dependencies) look something like this:

```shell
# git clone https://github.com/brendangregg/FlameGraph ../FlameGraph
sample_rate_orig=$(cat /proc/sys/kernel/perf_event_max_sample_rate)
max_stack_orig=$(cat /proc/sys/kernel/perf_event_max_stack)
sudo sysctl -w kernel.perf_event_max_sample_rate=100000
sudo sysctl -w kernel.perf_event_max_stack=1000
sudo perf record -F max -a -g -- ./python -c 'pass'
sudo perf script > out.perf
../FlameGraph/stackcollapse-perf.pl out.perf > out.folded
../FlameGraph/flamegraph.pl out.folded > flamegraph.svg
sudo sysctl -w kernel.perf_event_max_sample_rate=$sample_rate_orig
sudo sysctl -w kernel.perf_event_max_stack=$max_stack_orig
```

See https://github.com/faster-cpython/tools/tree/main/scripts/run-perf.sh for the full operation, including installing dependencies.
-
Greetings from the Central European Timezone. :-)
Is there somewhere that describes what this particular flamegraph means? (I've seen explanations of flamegraphs in general, but I'm not familiar with the tool or methodology used to generate this particular one.)
-
I ran what amounts to PyPerformance's bench_expand, bench_integrate, bench_sum, and bench_str benchmarks in a loop for about a minute with a PGO ("PGUpdate") build, and collected results with Visual Studio's Performance Profiler. The hottest 15 functions:
I also got a line-by-line profile of _PyEval_EvalFrameDefault, and VS shows it like this: I couldn't find a way to make VS dump the output as text, but here are 17 (low-quality, but at least mostly legible) screenshots of the results for the whole function.
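The "run benchmarks in a loop for about a minute" step can be approximated with a small driver like the sketch below. `run_for` and the toy workload are hypothetical stand-ins for illustration, not the actual harness used:

```python
import time

def run_for(seconds, fn):
    """Call fn repeatedly until roughly `seconds` have elapsed;
    return the number of iterations completed."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        fn()
        n += 1
    return n

# A bench_str-flavored toy workload: repeated string building.
def workload():
    "".join(str(i) for i in range(100))

iterations = run_for(0.1, workload)  # use ~60 seconds for a minute-long profile
```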
-
Flamegraphs are relatively new to me. After a little digging, I found a good resource: https://www.brendangregg.com/flamegraphs.html. This pointed me to https://github.com/brendangregg/FlameGraph, which is what I used to produce the flame graph. As noted above, it involved generating data in a specific way using the Linux "perf" tool and then passing that data through some scripts to produce the SVG file. (Note that the SVG is better/interactive if you download it and open the downloaded copy in your browser.) As to what it means (from the web page):
-
I just uploaded better flame graphs (10 kHz and 100 kHz), which I generated:
-
Observations from the flame graphs:
-
I've opened #75 to dive into runtime startup/shutdown profiling. I also created #76 to do the same for our target workloads.
-
We can circle back later (if we need it) to the specific instrumentation described in the original post above.
-
Here are the key boundaries:
- _PyEval_EvalFrameDefault()
- between _PyEval_EvalFrameDefault() and the top of the loop (setting up)
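A Python analogue of those boundary probes, to show the nesting the C instrumentation would follow. The `boundary` helper and the labels are invented for illustration; the real instrumentation would record timestamps in C, not via a context manager:

```python
import time
from contextlib import contextmanager

@contextmanager
def boundary(log, label):
    """Append (label, elapsed_ns) to `log` when the enclosed region exits."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        log.append((label, time.perf_counter_ns() - start))

log = []
with boundary(log, "eval_frame"):    # around _PyEval_EvalFrameDefault()
    with boundary(log, "setup"):     # from entry to the top of the loop
        pass                         # (stand-in for real frame setup)
    # ... the dispatch loop would run here ...
```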