
PEP 669: Low Impact Instrumentation and Monitoring for CPython. #2070

Closed

Conversation

markshannon (Member)

A PEP for low impact debugging, profiling, coverage, etc. using quickening.


@brettcannon brettcannon left a comment


Please update .github/CODEOWNERS as part of this PR.


@brandtbucher brandtbucher left a comment


Very cool! Found a few possible typos that you might want to address before merging:


Using a profiler or debugger in CPython can have a severe impact on
performance. Slowdowns by an order of magnitude are not uncommon.
It does not have this bad.
Member

Suggested change
It does not have this bad.
It does not have to be this bad.

expect to be active for the entire lifetime of the program.

Monitoring can occur at any point in the program's life and be applied
anywhere in the program. Monitoring points are expected to few.
Member

Suggested change
anywhere in the program. Monitoring points are expected to few.
anywhere in the program. Monitoring points are expected to be few.

onto the control flow graph of the code object.
``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which
basic blocks have started to execute.
The ``RAISE`` event with mark any blocks that did not complete.
Member

Suggested change
The ``RAISE`` event with mark any blocks that did not complete.
The ``RAISE`` event will mark any blocks that did not complete.

Comment on lines +271 to +276
* ENTER
* EXIT
* UNWIND
* C_CALL
* C_RETURN
* RAISE
Member

Suggested change
* ENTER
* EXIT
* UNWIND
* C_CALL
* C_RETURN
* RAISE
* ``ENTER``
* ``EXIT``
* ``UNWIND``
* ``C_CALL``
* ``C_RETURN``
* ``RAISE``


@gvanrossum gvanrossum left a comment


Sorry for the very chatty review. I read the doc in a single pass and remarked upon anything that wasn't clear to me from what came before.

Comment on lines +22 to +23
Using the new API, code run under a debugger on 3.11 should easily outperform
code run without a debugger on 3.10.
Member

Presumably only when the debugger is not actually invoked? I can believe it will be faster for code in a frame containing a breakpoint that is never hit. But I cannot believe it would be faster if some Python code would be run during e.g. line tracing or when evaluating a condition each time a breakpoint is hit.

Comment on lines +27 to +35
Motivation
==========

Developers should not have to pay an unreasonable cost to use debuggers,
profilers and other similar tools.

C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger.
Python developers should expect that too.
Member

This section is just rhetoric. There's already a bunch of that in the Abstract ("It does not have this bad", "at low cost", "easily outperform") -- I'd like to hear more about what approach this is actually taking sooner.

markshannon (Member Author)

It is the "motivation" section; it should be motivating 🙂
I've toned down the abstract to compensate.

Comment on lines +40 to +45
The quickening mechanism provided by PEP 659 provides a way to dynamically
modify executing Python bytecode. These modifications have no cost beyond
the parts of the code that are modified and a relatively low cost to those
parts that are modified. We can leverage this to provide an efficient
mechanism for instrumentation and monitoring that was not possible in 3.10
or earlier.
Member

This is something that you could lead with.

Instrumentation occurs early in a program's life cycle and persists through
the lifetime of the program. It is expected to be pervasive, but fixed.
Instrumentation is designed to support profiling and coverage tools that
expect to be active for the entire lifetime of the program.
Member

What if I want to profile only a specific function call? Is that not covered? I suppose it is, but your simplified description of instrumentation leaves little room for it.

markshannon (Member Author)

To profile calls to a specific function foo:

instrumentation.instrument(foo.__code__, instrumentation.ENTER)
instrumentation.register(instrumentation.ENTER, profiler_func)
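
For illustration only, a minimal sketch of what profiler_func might look like, assuming the draft's proposed instrumentation module and the ENTER callback signature func(code, offset) described later in the PEP; the timing dictionary is invented for the example:

import time
from types import CodeType

entry_times = {}   # code object name -> list of entry timestamps

def profiler_func(code: CodeType, offset: int) -> None:
    # Record when each instrumented code object is entered; a real profiler
    # would pair this with EXIT events to compute durations.
    entry_times.setdefault(code.co_name, []).append(time.monotonic())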


Monitoring can occur at any point in the program's life and be applied
anywhere in the program. Monitoring points are expected to few.
The capabilities of monitoring are a superset of that of profiling,
Member

Now I'm confused. The use cases that I'm familiar with are coverage, profiling and debugging (both stepping through code and breakpoints). How do these use cases map to the two parts, instrumentation and monitoring? You imply there's a mapping, but I'm confused by what the mapping is supposed to be, since debugging seems to fall in neither category.

markshannon (Member Author)

Debugging is a form of monitoring. The debugger is monitoring the program being run.
Any ideas for a better term than "monitoring"? I don't want to use "debugging" as that is just a particular use case.

Member

I think the terms are fine, all I'm asking for is that you clarify the mapping between the two concepts and the three or more use cases. 'Monitoring is a superset of profiling" doesn't do that for me, but "A profiler would use the monitoring API" would. Except your example of profiling a function (in the comment above) uses instrumentation, not profiling. So I'm still confused about the relationship between instrumentation and monitoring, and between those and the use cases.

ncoghlan (Contributor)

I don't think the draft terms are fine, as I initially thought the "instrumentation API" was going to be "How clients specify what to monitor" and the monitoring API was going to be "How clients are notified of monitored events".

Even after having the distinction explained, I don't think I'd be able to remember "instrumentation is low overhead, monitoring is high overhead". If anything, I would expect them to be the other way around due to the way somewhat similar terminology gets used in hardware testing (non-intrusively monitoring ordinary device outputs vs instrumenting the test rig with additional data collection points).

For the actual distinction being made, my suggestions would be:

  • "passive monitoring API" (pervasive collection of events, low overhead; e.g. coverage, profiling)
  • "dynamic monitoring API" (selective collection of events, potentially high overhead; e.g. break points, watching local variables for changes)

To avoid excessive overhead, a single call should be made to
``instrumentation.insert_monitors`` passing all the offsets at once.

Breakpoints can suspended with ``instrumentation.monitor_off``.
Member

Does that re-specialize the instruction at that offset?

Regardless of the answer it would seem this is deleting a breakpoint, not just suspending it. (If there was a difference, I'd think that suspending makes it easy to re-enable, but there doesn't seem to be a semantic difference in this case, so I prefer the simpler "deleting".)

markshannon (Member Author)

Does that re-specialize the instruction at that offset?

Maybe, but probably not.

Turning a monitor off and removing it have very different performance profiles.

Turning a monitor off is cheap, but will interfere with optimization.
Removing a monitor is very expensive, but will ultimately allow full optimization.

It is up to the implementer of the debugger which to use, but if the user of the debugger turns a breakpoint off, rather than removing it, it implies they may want to turn it on again soon.
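
To make the trade-off concrete, a rough sketch using the draft API: insert_monitors and monitor_off appear in the PEP text, while monitor_on and remove_monitors are assumed counterparts that the draft does not spell out in the excerpts quoted here, and the code object and offsets are invented.

co = handler.__code__                           # hypothetical code object containing the breakpoint
offsets = (24,)                                 # hypothetical instruction offsets for the breakpoint line

instrumentation.insert_monitors(co, *offsets)   # expensive: set the breakpoint once
instrumentation.monitor_off(co, offsets[0])     # cheap: suspend it, but the checkpoint still limits optimization
instrumentation.monitor_on(co, offsets[0])      # cheap: re-enable it
instrumentation.remove_monitors(co, *offsets)   # expensive: delete it, allowing full optimization again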

Breakpoints can suspended with ``instrumentation.monitor_off``.

Debuggers can break on exceptions being raised by registering a callable
for ``RAISE``:
Member

So the RAISE event applies to any reason an exception is raised, not just a raise statement, right? This could bear clarification.

onto the control flow graph of the code object.
``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which
basic blocks have started to execute.
The ``RAISE`` event with mark any blocks that did not complete.
Member

"with"?

* UNWIND
* C_CALL
* C_RETURN
* RAISE
Member

Why is RAISE needed here?

@markshannon markshannon Sep 14, 2021

To match C_CALL if the callee raises.

Comment on lines +297 to +300
References
==========

[A collection of URLs used as references through the PEP.]
Member

You can remove this if you're not planning to add any references.

@gvanrossum (Member)

(I should add that, like Brandt, I think this is very cool, and a nice win for Quickening.)

@fabioz fabioz left a comment


Hi Mark,

Sorry for the long comment here... I couldn't really find a way to fit those as separate comments to specific points of the PEP.

So, I currently have mixed feelings regarding this approach... I do see some potential performance wins here, but I think it'll still depend a lot on the actual implementation (so, I'm a bit wary that the abstract starts talking about the current slowdowns when there are currently already alternatives to do something similar with the existing frame evaluation).

i.e.: using the python frame evaluation mechanism to add breakpoints by changing bytecode is already done in debuggers nowadays and debuggers will still need to register a callback to know when a Python function will be entered and then monitor changes and potentially reapply the monitoring on breakpoint changes, so, I'm not 100% sure this will be a win performance wise if you're actually comparing with that case.

It's also a bit troublesome that it says that modifying the monitoring will be slow, especially if this will be the way to make stepping work after a breakpoint is reached. It could potentially be a net-negative if that's the case -- as far as I understood, for stepping debuggers should track all code entered and add monitoring for all line changes, so if that's slow compared to having the tracing function, the final result would be a net-negative when stepping (if I have misunderstood, please let me know).

As a note, I do see some cases where the approach presented here may be faster, especially if exceptions have to be tracked (which does require a tracing function even when using programmatic breakpoints in the current approach), but I'm also not sure how much would be due to improvements here or because the current version of tracing provided by CPython calls local frame updates at each step due -- see: https://bugs.python.org/issue42197 -- presumably having that fixed would automatically make debuggers in general an order of magnitude faster without these changes -- although specialized debuggers probably know this caveat already and do provide their own C-level tracing function ;)

It may be nice to have some reference implementation with some performance numbers comparing approaches, especially when having to do step-handling to know this won't be a net negative...

So, let me try to elaborate a bit in terms of what I see as pros/cons:

Pros:

  • Tracking of unhandled and handled exceptions through a single instrumentation point (it's hard to detect whether an unhandled exception is being the point of a return of a function right now and this would make the life of debugger implementors dealing with handled/unhandled exceptions much nicer).
  • Dealing with the current frame evaluation mode to be able to modify bytecode right now works, but the API isn't that nice (the instrumentation does seem like a nicer API).

Cons:

  • if instrumenting line breakpoints for step-tracking is slow after a breakpoint is hit, it may not be feasible for debuggers to use it.
  • Performance-wise, I see a similar overhead in debuggers that are currently optimized to deal with the frame eval mode (although they may need to fallback to tracing on breakpoint changes for the frames that are already currently running or when tracking exceptions, so, in a few situations this could be a win).

Some questions:

  • Will this tracing be disabled on recursion errors as happens with the current tracing function set in sys.settrace? This is a major annoyance because debuggers that do use tracing can get lost right now, so, it'd be nice to know how the recursion limit would work for debuggers in this case.
  • It seems there are no mentions on dealing with particular threads... it'd be nice if monitoring could be completely removed for some target thread (for instance, debuggers using internal threads could completely disable the tracing for a given thread -- although note that in general I do want to instrument for all threads, but it'd be nice to disable it specifically for some threads so that those don't have any additional overhead).
  • The PEP should probably make it clear that instrumentation calls will still be called when code is called from another instrumentation point (i.e.: when a user is in a breakpoint and calls custom evaluation code, a debugger may still want to hit breakpoints in that case).
  • Will it be possible for debuggers to change the line being executed? i.e.: one feature available for debuggers right now is that inside a tracing function it's possible to set the next line to be executed (which is a common feature) and there's no mention of that here (if that's not possible, debuggers will still have to fallback to tracing to actually handling breakpoints as is the case right now when using the frame evaluation mode).
  • One nice thing on java is the ability to drop frames and restart the execution with a changed code object (so, for instance, it's possible to change the code and restart the execution of a frame), do you see any way for that to work here?

@markshannon (Member Author)

Hi Mark,

Sorry for the long comment here... I couldn't really find a way to fit those as separate comments to specific points of the PEP.

So, I currently have mixed feelings regarding this approach... I do see some potential performance wins here, but I think it'll still depend a lot on the actual implementation (so, I'm a bit wary that the abstract starts talking about the current slowdowns when there are currently already alternatives to do something similar with the existing frame evaluation).

Correct me if I'm wrong, but a PEP 523 based debugger still needs to check if the callee function contains a breakpoint on every call, and replace the bytecode which will prevent quickening and specialization.

OOI, what's the slowdown of running under the PyDev debugger with a few breakpoints set (but not hit) for 3.9/3.10?

i.e.: using the python frame evaluation mechanism to add breakpoints by changing bytecode is already done in debuggers nowadays and debuggers will still need to register a callback to know when a Python function will be entered and then monitor changes and potentially reapply the monitoring on breakpoint changes, so, I'm not 100% sure this will be a win performance wise if you're actually comparing with that case.

With this PEP there will be no need for the debugger to change bytecodes, just set a checkpoint and let the VM handle it.
If you need to know when a particular function is entered then that can be handled using instrumentation.

Not only should it make your life much easier, it is faster because the VM knows which optimizations it can still apply.

It's also a bit troublesome that it says that modifying the monitoring will be slow, especially if this will be the way to make stepping work after a breakpoint is reached. It could potentially be a net-negative if that's the case -- as far as I understood, for stepping debuggers should track all code entered and add monitoring for all line changes, so if that's slow compared to having the tracing function, the final result would be a net-negative when stepping (if I have misunderstood, please let me know).

Stepping in a debugger requires human input, which takes hundreds of milliseconds at the least. It won't be any slower.

As a note, I do see some cases where the approach presented here may be faster, especially if exceptions have to be tracked (which does require a tracing function even when using programmatic breakpoints in the current approach), but I'm also not sure how much would be due to improvements here or because the current version of tracing provided by CPython calls local frame updates at each step due -- see: https://bugs.python.org/issue42197 -- presumably having that fixed would automatically make debuggers in general an order of magnitude faster without these changes -- although specialized debuggers probably know this caveat already and do provide their own C-level tracing function ;)

It may be nice to have some reference implementation with some performance numbers comparing approaches, especially when having to do step-handling to know this won't be a net negative...

So, let me try to elaborate a bit in terms of what I see as pros/cons:

Pros:

* Tracking of unhandled and handled exceptions through a single instrumentation point (it's hard to detect whether an unhandled exception is being the point of a return of a function right now and this would make the life of debugger implementors dealing with handled/unhandled exceptions much nicer).

* Dealing with the current frame evaluation mode to be able to modify bytecode right now works, but the API isn't that nice (the instrumentation does seem like a nicer API).

Cons:

* if instrumenting line breakpoints for step-tracking is slow after a breakpoint is hit, it may not be feasible for debuggers to use it.

The alternative is to be slow all the time, as debuggers currently are. Slow immediately after a breakpoint is much better than slow all the time.

* Performance-wise, I see a similar overhead in debuggers that are currently optimized to deal with the frame eval mode (although they may need to fallback to tracing on breakpoint changes for the frames that are already currently running or when tracking exceptions, so, in a few situations this could be a win).

Using PEP 523 prevents specializing and flattening of Python-to-Python calls. Rewriting the bytecode prevents quickening and all other specializations. A checkpoint only prevents specialization at the checkpoint. A debugger using checkpoints for breakpoints will easily outperform a PEP 523 based debugger, assuming both are implemented to the same standard.

Some questions:

* Will this tracing be disabled on recursion errors as happens with the current tracing function set in `sys.settrace`? This is a major annoyance because debuggers that do use tracing can get lost right now, so, it'd be nice to know how the recursion limit would work for debuggers in this case.

There is no tracing to be disabled, just checkpoints. The recursion limit will be unchanged.
You might need to temporarily increase the recursion limit in your callback functions when handling RecursionErrors.

Because we will flatten Python-to-Python calls in 3.11 (that's the plan anyway), you should have plenty of headroom, even on MacOS.

* It seems there are no mentions on dealing with particular threads... it'd be nice if monitoring could be completely removed for some target thread (for instance, debuggers using internal threads could completely disable the tracing for a given thread -- although note that in general I do want to instrument for all threads, but it'd be nice to disable it specifically for some threads so that those don't have any additional overhead).

There is no way to disable callbacks for specific threads. That is left to the user.
If you want to only break in a particular thread, then you'll need to check the thread in the callback.
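
A minimal sketch of that pattern; the callback signature follows the draft's func(code, offset) form, and DEBUGGER_THREADS / handle_breakpoint are invented for the example:

import threading

DEBUGGER_THREADS = set()   # idents of the debugger's own internal threads

def breakpoint_callback(code, offset):
    # Ignore events fired on the debugger's internal threads.
    if threading.get_ident() in DEBUGGER_THREADS:
        return
    handle_breakpoint(code, offset)   # hypothetical debugger logic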

* The PEP should probably make it clear that instrumentation calls will still be called when code is called from another instrumentation point (i.e.: when a user is in a breakpoint and calls custom evaluation code, a debugger may still want to hit breakpoints in that case).

I'll add that when I get a chance.

* Will it be possible for debuggers to change the line being executed? i.e.: one feature available for debuggers right now is that inside a tracing function it's possible to set the next line to be executed (which is a common feature) and there's no mention of that here (if that's not possible, debuggers will still have to fallback to tracing to actually handling breakpoints as is the case right now when using the frame evaluation mode).

We will probably need to allow frame.f_lineno = ... to work outside of trace functions, maybe add a special context manager to allow it.
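
Purely to illustrate the shape of that idea (nothing like this exists yet; the name and semantics are invented):

from contextlib import contextmanager

@contextmanager
def allow_line_change(frame):
    # Hypothetically tell the VM that assigning to frame.f_lineno is permitted
    # for this frame even though no trace function is active.
    yield frame

# A debugger's "set next statement" command might then look like:
# with allow_line_change(frame):
#     frame.f_lineno = target_line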

* One nice thing on java is the ability to drop frames and restart the execution with a changed code object (so, for instance, it's possible to change the code and restart the execution of a frame), do you see any way for that to work here?

That is outside the scope of this PEP. It seems a strange feature, is it used much?


fabioz commented Sep 16, 2021

Hi Mark,
Sorry for the long comment here... I couldn't really find a way to fit those as separate comments to specific points of the PEP.
So, I currently have mixed feelings regarding this approach... I do see some potential performance wins here, but I think it'll still depend a lot on the actual implementation (so, I'm a bit wary that the abstract starts talking about the current slowdowns when there are currently already alternatives to do something similar with the existing frame evaluation).

Correct me if I'm wrong, but a PEP 523 based debugger still needs to check if the callee function contains a breakpoint on every call, and replace the bytecode which will prevent quickening and specialization.

Well, the code object is cached and then it's set in the frame before the frame executes, but in a way, it's always the same code object which is set in a new frame, so, I'm not sure why that would prevent specialization of said frame/code object (I'm not as familiar with the area as you are, though, so if you say it'll prevent quickening and specialization I believe you).

Also, you lost me a bit... as far as I understood the debugger will still need to instrument things to check and update the monitors on every call so that it can reach all the code objects that'll be executed (there's a fast way out in which nothing is done after a few checks, but that's true for the existing approach too) -- I don't really see any other way (i.e.: how else would the debugger monitor things such as anonymous functions inside of functions?) or are you thinking about a different approach? So, the approach (as far as I can see) is mostly the same and the overhead would also be mostly the same (for line breakpoints)...

OOI, what's the slowdown of running under the PyDev debugger with a few breakpoints set (but not hit) for 3.9/3.10?

I just did some experiments in this (on 3.9).

I created a worst case for the debugger: one where method calls dominate the program time (since that's where it does all the computation; afterwards it just waits for a breakpoint to be hit) and the method itself does nothing:

import time

def method2():
    i = 1

def method():

    for i in range(200000):
        method2()

        if False:
            # Unreachable breakpoint here
            pass

def caller():
    start_time = time.time()
    method()
    print('TotalTime>>%s<<' % (time.time()-start_time,))

So, in my machine running without debugging this takes around 0.021s.
Just with the frame eval function (which does add the programmatic breakpoints when a call is made), it takes around 0.087s.
If the tracing is set (to a no-op) it takes around 0.156s.
If python with sys.settrace with a function that returns itself (without really doing anything and not under the debugger), it takes around 0.27s

Keep in mind that this is the absolute worst case -- but in a way also the simplest, with only line breakpoints -- if exception breakpoints are added, then the debugger can't forgo tracing for all the events and the time goes up to 0.91s, so, just the fact that exceptions would be notified from VM events could be a huge win in that particular case.

Now, having given those numbers (on the worst case), from what I've seen in the general case (on real world programs), the debugger makes programs 1.5-4 times slower when the user doesn't have exception breakpoints and 8-10 times slower if he does enable the use of exception breakpoints -- and for this case this PEP would help quite a bit, so, it's a good +1 from me for this case alone ;)

  • Will it be possible for debuggers to change the line being executed? i.e.: one feature available for debuggers right now is that inside a tracing function it's possible to set the next line to be executed (which is a common feature) and there's no mention of that here (if that's not possible, debuggers will still have to fallback to tracing to actually handling breakpoints as is the case right now when using the frame evaluation mode).

We will probably need to allow frame.f_lineno = ... to work outside of trace functions, maybe add a special context manager to allow it.

Seems good to me (it'd be good if this was a part of the PEP or if it's addressed in a related PEP -- this is a major feature in the debugger and without this the debugger wouldn't be able to use most of this PEP).

* One nice thing on java is the ability to drop frames and restart the execution with a changed code object (so, for instance, it's possible to change the code and restart the execution of a frame), do you see any way for that to work here?

That is outside the scope of this PEP. It seems a strange feature, is it used much?

In Java land it's used quite a lot to hot-swap code during a debug session, in Python users have to get by by calling custom evaluate functions while in a breakpoint and then setting the frame.f_lineno to try to mimic it if the code wouldn't change much (think of setting a variable and going inside some if), although many times it's not as convenient and there are cases where it's not possible to use that workflow as a replacement... Anyways, I agree, this is a feature outside of this PEP.

@markshannon (Member Author)

Hi Mark,
Sorry for the long comment here... I couldn't really find a way to fit those as separate comments to specific points of the PEP.
So, I currently have mixed feelings regarding this approach... I do see some potential performance wins here, but I think it'll still depend a lot on the actual implementation (so, I'm a bit wary that the abstract starts talking about the current slowdowns when there are currently already alternatives to do something similar with the existing frame evaluation).

Correct me if I'm wrong, but a PEP 523 based debugger still needs to check if the callee function contains a breakpoint on every call, and replace the bytecode which will prevent quickening and specialization.

Well, the code object is cached and then it's set in the frame before the frame executes, but in a way, it's always the same code object which is set in a new frame, so, I'm not sure why that would prevent specialization of said frame/code object (I'm not as familiar with the area as you are, though, so if you say it'll prevent quickening and specialization I believe you).

I hadn't realized that you cached the whole object (although that makes perfect sense), in which case most specializations would still work. However, PEP 523 will prevent specializations across calls.

Also, you lost me a bit... as far as I understood the debugger will still need to instrument things to check and update the monitors on every call so that it can reach all the code objects that'll be executed (there's a fast way out in which nothing is done after a few checks, but that's true for the existing approach too) -- I don't really see any other way (i.e.: how else would the debugger monitor things such as anonymous functions inside of functions?) or are you thinking about a different approach? So, the approach (as far as I can see) is mostly the same and the overhead would also be mostly the same (for line breakpoints)...

Because breakpoints are defined on the code objects, when setting a breakpoint you can do a one time search for the relevant code object instead of checking every call.

E.g.

def f(seq):
    return [
         x * x for x in seq    # Break point here
    ]

You'll need to be able to find the code object for the list comprehension. I assume that you can already do that by collecting line ranges on code objects with an import hook. Once you have the code object, you can find the offsets for the line with co.co_lines() and set the monitor. Once that is done the VM takes care of it.

co = find_code_object(file, line)
offsets = [offset for offset, _, l in co.co_lines() if line == l]
instrumentation.insert_monitors(co, *offsets)
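
As an aside, one way a find_code_object helper could be built on top of an import hook is to record each module's top-level code object per file and then walk its nested code objects via co_consts. A rough sketch, taking the already-recorded module code object rather than a filename:

from types import CodeType

def iter_code_objects(co):
    # Yield a code object and, recursively, every code object nested inside it.
    yield co
    for const in co.co_consts:
        if isinstance(const, CodeType):
            yield from iter_code_objects(const)

def find_code_object(module_code, line):
    # Return the innermost code object whose line range covers the given line.
    found = None
    for co in iter_code_objects(module_code):
        lines = [l for _, _, l in co.co_lines() if l is not None]
        if lines and min(lines) <= line <= max(lines):
            found = co   # nested code objects are yielded after their parents
    return found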

OOI, what's the slowdown of running under the PyDev debugger with a few breakpoints set (but not hit) for 3.9/3.10?

I just did some experiments in this (on 3.9).

Thanks, those numbers aren't as bad as I was expecting, but that's still quite a slowdown compared to normal execution.


fabioz commented Sep 16, 2021

You'll need to be able to find the code object for the list comprehension. I assume that you can already do that by collecting line ranges on code objects with an import hook. Once you have the code object, you can find the offsets for the line with co.co_lines() and set the monitor. Once that is done the VM takes care of it.

co = find_code_object(file, line)
offsets = [offset for offset, _, l in co.co_lines() if line == l]
instrumentation.insert_monitors(co, *offsets)

Well, an import hook isn't enough to collect all code-objects -- it may work on some cases, but I can think of cases where the user creates code to run without using an import -- for instance, he could create it in C/C++ and then execute it and we'd still want to track that or if a breakpoint is set after the import is done... so, the only reasonable way I see it working is checking the code right before it's executed (where some caching would be used to try to discover the fast path where a code object doesn't need to be tracked). Anyways, from the API you're proposing this would still be possible, so, it works for me ;)

As a note, not sure if you've seen, but maybe you're aware of https://mail.python.org/archives/list/python-dev@python.org/thread/CP2PTFCMTK57KM3M3DLJNWGO66R5RVPB/ (where line numbers are a bit odd -- it doesn't affect the tracing but I think it'd affect the instrumentation given those odd line numbers in the code object for the jump instruction).


@zooba zooba left a comment


A few minor comments, plus a big-ish one about threading, which I see coming up relatively frequently for the current tracing system. Would be great to figure out a neat way to deal with it while we're doing something better.


C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger.
Python developers should expect that too.
Member

For those who think that pdb is the peak of debugging, and don't suffer from any performance issues with that, is there any motivation to support the change?

I'm obviously in agreement that a better approach would be great, just concerned that there may be push back from people who don't see the value when a few lines here about the current state of things (maybe the frequent checks for tracing?) would probably motivate things.


Code objects can be instrumented by calling::

instrumentation.instrument(codeobject, events)
Member

Is this a new module? Why not put it in sys? (Even as sys.instrumentation...)

instrumentation.instrument(codeobject, events)

Code objects must be instrumented before they are executed.
An exception will be raised if the code object has been executed before it
Member

It said earlier that quickening lets us modify "executing" bytecode. Is that not actually the case?

And if so, when do we have a chance to instrument new code objects? Do they all have to be done on creation? Or can we do it on first call?

ncoghlan (Contributor)

One place that should have the chance to instrument called code objects is the event handling in the parent function. I think there's a missing event type for that purpose though: there's currently no event that triggers just before a code object is invoked (receiving the code objects for both the caller and callee, and the offset in the caller).

Functions can be unregistered by calling
``instrumentation.register(event, None)``.

Callback functions can be registered at any time.
Member

Seconding the question about whether they're global/per-interpreter/per-thread/per-...whatever.

Right now, people hit issues, confusion and sometimes ideas due to settrace's single-threadedness (particularly when you start tracing an already-running application and don't have a chance to force your trace into every thread, which is something we also saw regularly building the debugger for VS). The main problem being that tracing on one thread doesn't give you any way to trace - or even pause execution of - the others.

e.g. once you hit a breakpoint in one thread, the first thing you do that releases the GIL is going to let other threads start executing. Having some way to also trigger an event in the context of every thread would let a thread-aware debugger control which of them can keep executing. (Or maybe some other kind of approach makes more sense, that's just an idea that jumps out at me.) It doesn't seem to be essential to this PEP for this to be supported, but it's a good opportunity to add it while we're defining new events.

At the very least, having clear statements about the threading behaviour for what this PEP does add is essential.

When an event occurs the registered function will be called.
The arguments provided are as follows:

* BRANCH: ``func(code: CodeType, offset: int, taken:bool)``
Member

Is the code object an important parameter? The frame is generally more useful (and implicitly includes the code object).

Member

Then again, I guess for a profiler the code object is enough as it's likely maintaining its own call stacks. Debuggers will need to walk the frame even if callers weren't instrumented, and so having them do an extra call to start walking the stack is better than preemptively passing it in.

If that's the case, a sentence explaining it would be nice, so the next reader doesn't have to guess.

fabioz

+1 to the case where the frame is needed for debuggers (i.e.: this PEP should have some note stating the recommended way to get the frame related to the instrumentation notification).

-- as a note, if performance-wise it'd be the same, I'd say that receiving the frame which contains the code object would be better. If there's some performance penalty, then another way to get it would also be reasonable since it's not always needed (but the recommended way to get it should be explicit in the PEP).

ncoghlan (Contributor)

It wouldn't be the same performance wise. The new eval loop mostly avoids creating full python frame objects, so this new API provides an opportunity to retain that performance improvement, whereas the old one loses it (the frame objects have to be created anyway in order to pass them to the installed trace hook).

ncoghlan (Contributor)

I do wonder if it might make sense to pass in a callable that requests the full frame object though. Otherwise hooks would have to use something similar to sys._getframes(), which feels awkward. That said, the callable wouldn't be cheap to create either, so @zooba's suggestion is probably the way to go (i.e. add text to the PEP explaining why the event callback API is the way it is)
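
For reference, a minimal sketch of that workaround; it assumes the instrumented frame is the immediate caller of the callback, which the draft does not guarantee, and inspect_frame is an invented helper:

import sys

def line_callback(code, offset):
    frame = sys._getframe(1)   # assumed: the instrumented frame is one level up
    inspect_frame(frame)       # hypothetical debugger logic: read f_locals, walk f_back, etc.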

Callback function arguments
'''''''''''''''''''''''''''

When an event occurs the registered function will be called.
Member

And events only occur if they were instrumented?

* EXIT: ``func(code: CodeType, offset: int)``
* C_CALL: ``func(code: CodeType, offset: int, value: object)``
* C_RETURN: ``func(code: CodeType, offset: int, value: object)``
* C_EXCEPT: ``func(code: CodeType, offset: int, exception: BaseException)``
fabioz

I believe that both the exception as well as the traceback would be important in the C_EXCEPT, RAISE, EXCEPT as well as the UNWIND cases.

As a note, if the idea is using exception.__traceback__, it'd be nice if that's noted in the PEP... as a note, I'm not sure if exception.__traceback__ is always available -- is it possible that some exception raised -- say some custom exception object from C/C++ doesn't have it? (in which case it'd need to be passed as a parameter?)


Code objects must be instrumented before they are executed.
An exception will be raised if the code object has been executed before it
is instrumented.
fabioz

I think I had misread this PEP... I thought that only by using instrumentation.register(instrumentation.ENTER, profiler_func) I'd start to listen to any function call and instrumentation.register(instrumentation.UNWIND, profiler_func) would make it listen to any unhandled exception, but now, rereading, I'm under the impression that I'd just listen to function calls for code objects previously instrumented and it wouldn't even be possible to change it for code objects already running...

I think that the PEP could make that clearer.

Now, if it's really the case that:

  • there's no callback to listen to any call just prior to a code object/frame being executed
  • there's no way to listen to any events for code objects/frames already running

I'd say I'm -1 on the PEP as it'd just not be usable for many use cases (the happy path where this API would work would be too narrow to be usable)...

For instance, it's very common for users to attach to a running program and this PEP wouldn't support that at all, and it'd be next to impossible to get a hold of all the code objects that'd need to be instrumented before they're actually run, even in the case where the program is started with the debugger in place (an import hook to instrument code objects isn't enough as there are many corner cases where it wouldn't be able to get it).

Now, if I misunderstood it, then I think the PEP should make it clearer that those are in fact supported and which API would be used for that...

ncoghlan (Contributor)

This is the distinction between the low overhead passive monitoring API ('instrumentation' in the current text) and the high overhead dynamic monitoring API ('monitoring' in the current text).

Turning on code coverage or profiling would have to be done early, so could be done dynamically only in the presence of dynamic compilation. But an interactive debugger would use the kinds of hooks that can be added to an existing code object, it wouldn't use the ones that are intended for code coverage and profiling.

@vstinner (Member)

FYI I added https://www.python.org/dev/peps/pep-0670/ and so I skipped the number 669 reserved to this PR ;-)


@ncoghlan ncoghlan left a comment


The general proposal here looks promising to me, but I think we want to try really hard to have the user facing runtime API just refer to "events" and "callbacks", and not require users to understand any new terminology that's specific to the new runtime hook implementation.



This PEP is fully backwards compatible.

We may seek to remove ``sys.settrace`` in the future once the APIs provided
by this PEP have been widely adopted, but that is for another PEP.

It would make sense to specify whether the new callback hooks are invoked before or after the existing trace hooks.



instrumentation.register(event, func)

Functions can be unregistered by calling
``instrumentation.register(event, None)``.

Could something more descriptive than "register" be used as the name here? If I've understood the purpose of the function correctly, then set_event_callback feels like it would be clearer.

Code objects can be instrumented by calling::

instrumentation.instrument(codeobject, events)


Could this function just be called enable_event_callbacks rather than requiring users to know what the term instrument means in the context of this PEP?


* BRANCH: ``func(code: CodeType, offset: int, taken:bool)``
* JUMPBACK: ``func(code: CodeType, offset: int)``
* ENTER: ``func(code: CodeType, offset: int)``
* EXIT: ``func(code: CodeType, offset: int)``

Are these with statement entry and exit or function entry and exit? Either way, it's likely worth making the event name longer to be clear about it.


The following functions are provided to insert monitoring points::

instrumentation.insert_monitors(codeobject, *offsets)

With the adjusted naming I proposed above, the associated names for these APIs would be:

  • insert_dynamic_callbacks
  • remove_dynamic_callbacks
  • disable_dynamic_callback
  • enable_dynamic_callback

The callback registration function monitor_register would instead become set_dynamic_callback. (Although it isn't clear to me why the registration API needs to be different from the regular event hook registration - couldn't DYNAMIC just be another event type for callback registration purposes, even if the methods for adding dynamic hooks are different from those for enabling other event types?)

@markshannon (Member Author)

I just want to say that I haven't abandoned this, but @fabioz's comments about attaching a debugger have forced me to do a redesign. I expect to push the new design early next month.

The new design will use the same approach internally (modifying the quickened bytecode at runtime) for performance, but provide a more event based interface to make it easier to work with.
The VM will be responsible for instrumentation, the user just needs to specify which events they are interested in.

Here are the events in my current whiteboard design:

  1. PY_CALL -- Call of a Python function (actually immediately after the call, the callee's frame will be on the stack)
  2. PY_RESUME -- Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
  3. PY_THROW -- A Python function is resumed by a throw() call.
  4. PY_RETURN -- Return from a Python function
  5. PY_YIELD -- Yield from a Python function
  6. PY_UNWIND -- Exit from a Python function during exception unwinding.
  7. C_CALL -- Call of a builtin function (before the call in this case)
  8. C_RETURN -- Return from a builtin function (after the call in this case)
  9. RAISE -- An exception is raised
  10. HANDLED -- An exception is handled
  11. LINE -- A new line in the program is reached.
  12. INSTRUCTION -- A new VM instruction is about to be executed -- This is outrageously expensive, but does seem to have uses.
  13. BACKEDGE -- An unconditional backedge in the control flow graph is reached.
  14. BRANCH -- A conditional branch is about to be taken (or not).
  15. BREAKPOINT -- A breakpoint is hit

This is sufficient to support all the use cases I am aware of, and will allow us to support sys.settrace and sys.setprofile tracing on top of it.

Feel free to suggest others, if you need them.
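
As a purely illustrative sketch (the registration API for this redesign is not specified above, so the module name, registration call, and callback signature are placeholders), a coverage tool might consume just the LINE event:

covered = set()   # (filename, line) pairs seen at runtime

def on_line(code, line):
    covered.add((code.co_filename, line))

# Hypothetical registration, by analogy with the earlier draft API:
# monitoring.register(monitoring.LINE, on_line)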

@encukou (Member)

encukou commented Nov 24, 2021

@nedbat, you might want to take a look at the list: #2070 (comment)

@markshannon (Member Author)

Closing in favor of #2183 which describes the new design.

Thanks everyone for the feedback so far.

@markshannon markshannon deleted the high-performance-debugging branch March 23, 2022 12:17