PEP 669: Low Impact Instrumentation and Monitoring for CPython. #2070

Closed
318 changes: 318 additions & 0 deletions pep-0669.rst
@@ -0,0 +1,318 @@
PEP: 669
Title: Low Impact Instrumentation and Monitoring for CPython
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Aug-2021
Post-History: 13-Sep-2021


Abstract
========

Using a profiler or debugger in CPython can have a severe impact on
performance. Slowdowns by an order of magnitude are not uncommon.
It does not have this bad.
Member:

Suggested change:
- It does not have this bad.
+ It does not have to be this bad.


This PEP proposes an API for instrumentation and monitoring of Python
programs running on CPython that will enable the insertion of instrumentation
and monitoring at low cost.

Using the new API, code run under a debugger on 3.11 should easily outperform
code run without a debugger on 3.10.
Comment on lines +22 to +23
Member:

Presumably only when the debugger is not actually invoked? I can believe it will be faster for code in a frame containing a breakpoint that is never hit. But I cannot believe it would be faster if some Python code would be run during e.g. line tracing or when evaluating a condition each time a breakpoint is hit.


Profiling will still slow down execution, but by much less than in 3.10.

Motivation
==========

Developers should not have to pay an unreasonable cost to use debuggers,
profilers and other similar tools.

C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger.
Python developers should expect that too.
Comment on lines +27 to +35
Member:

This section is just rhetoric. There's already a bunch of that in the Abstract ("It does not have this bad", "at low cost", "easily outperform") -- I'd like to hear more about what approach this is actually taking sooner.

Member Author:

It is the "motivation" section; it should be motivating 🙂
I've toned down the abstract to compensate.

Member:

For those who think that pdb is the peak of debugging, and don't suffer from any performance issues with that, is there any motivation to support the change?

I'm obviously in agreement that a better approach would be great, just concerned that there may be push back from people who don't see the value when a few lines here about the current state of things (maybe the frequent checks for tracing?) would probably motivate things.


Rationale
=========

The quickening mechanism provided by PEP 659 provides a way to dynamically
modify executing Python bytecode. These modifications have no cost for the
parts of the code that are not modified, and a relatively low cost for the
parts that are modified. We can leverage this to provide an efficient
mechanism for instrumentation and monitoring that was not possible in 3.10
or earlier.
Comment on lines +40 to +45
Member:

This is something that you could lead with.


Specification
=============

There are two parts to this specification, instrumentation and monitoring.

Instrumentation occurs early in a program's life cycle and persists through
the lifetime of the program. It is expected to be pervasive, but fixed.
Instrumentation is designed to support profiling and coverage tools that
expect to be active for the entire lifetime of the program.
Member:

What if I want to profile only a specific function call? Is that not covered? I suppose it is, but your simplified description of instrumentation leaves little room for it.

Member Author:

To profile calls to a specific function foo:

instrumentation.instrument(foo.__code__, instrumentation.ENTER)
instrumentation.register(instrumentation.ENTER, profiler_func)


Monitoring can occur at any point in the program's life and be applied
anywhere in the program. Monitoring points are expected to few.
Member:

Suggested change:
- anywhere in the program. Monitoring points are expected to few.
+ anywhere in the program. Monitoring points are expected to be few.

The capabilities of monitoring are a superset of that of profiling,
Member:

Now I'm confused. The use cases that I'm familiar with are coverage, profiling and debugging (both stepping through code and breakpoints). How do these use cases map to the two parts, instrumentation and monitoring? You imply there's a mapping, but I'm confused by what the mapping is supposed to be, since debugging seems to fall in neither category.

Member Author:

Debugging is a form of monitoring. The debugger is monitoring the program being run.
Any ideas for a better term than "monitoring"? I don't want to use "debugging" as that is just a particular use case.

Member:

I think the terms are fine, all I'm asking for is that you clarify the mapping between the two concepts and the three or more use cases. 'Monitoring is a superset of profiling" doesn't do that for me, but "A profiler would use the monitoring API" would. Except your example of profiling a function (in the comment above) uses instrumentation, not profiling. So I'm still confused about the relationship between instrumentation and monitoring, and between those and the use cases.

Contributor:

I don't think the draft terms are fine, as I initially thought the "instrumentation API" was going to be "How clients specify what to monitor" and the monitoring API was going to be "How clients are notified of monitored events".

Even after having the distinction explained, I don't think I'd be able to remember "instrumentation is low overhead, monitoring is high overhead". If anything, I would expect them to be the other way around due to the way somewhat similar terminology gets used in hardware testing (non-intrusively monitoring ordinary device outputs vs instrumenting the test rig with additional data collection points).

For the actual distinction being made, my suggestions would be:

  • "passive monitoring API" (pervasive collection of events, low overhead; e.g. coverage, profiling)
  • "dynamic monitoring API" (selective collection of events, potentially high overhead; e.g. break points, watching local variables for changes)

but bulk insertion of monitoring points will be *much* more
expensive than insertion of instrumentation.
Comment on lines +60 to +61
Member:

This feels like a non-sequitur. First you compare monitoring to profiling, and then you compare it to instrumentation, separated by "but". I miss the intention of that "but".


Both instrumentation and monitoring is performed by insertion of
Member:

"are"

checkpoints in a code object.

Checkpoints
-----------

A checkpoint is simply a point in code defined by a
``(codeobject, offset)`` pair.
Member:

Are offsets measured in bytes or instructions?

Every time a checkpoint is reached, the registered callable is called.

Instrumentation
---------------

Instrumentation supports the bulk insertion of checkpoints, but does not
allow insertion or removal of checkpoints after code has started to execute.

The events are::
Member:

Define "event" first.


* BRANCH: A conditional branch is reached.
* JUMPBACK: A backwards, unconditional branch is reached.
* ENTER: A Python function is entered.
* EXIT: A Python function exits normally (without an exception).
* UNWIND: A Python function exits with an unhandled exception.
* C_CALL: A call to any object that is not a Python function.
* C_RETURN: A return from any object that is not a Python function.
* RAISE: An exception is raised.
* EXCEPT: An exception is handled.
Member:

At what point is an exception considered being handled? E.g. in

try:
    1/0
except RuntimeError:
    print(1)
except ZeroDivisionError:
    print(2)

Does the checking whether the raised exception matches RuntimeError count as "handling"? Or should I see this as happening just before print(2)?

Member Author:

Does the checking whether the raised exception matches RuntimeError count as "handling"?

Yes. These are VM level events. I've changed the wording to:
EXCEPT: Control is transferred to an exception handler.


For each ``ENTER`` event there will be a corresponding
``EXIT`` or ``UNWIND`` event.
For each ``C_CALL`` event there will be a corresponding
``C_RETURN`` or ``RAISE`` event.

All events are integer powers of two and can be bitwise or-ed together to
instrument multiple events.
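
As an illustrative sketch only (assuming the event constants are exposed as
attributes of the proposed ``instrumentation`` module, as in the examples
above), a tool that tracks calls could combine the call-related events into a
single mask::

    import instrumentation  # module proposed by this PEP; hypothetical until implemented

    # ENTER, EXIT and UNWIND together cover every way a Python call can begin and end.
    CALL_EVENTS = (instrumentation.ENTER
                   | instrumentation.EXIT
                   | instrumentation.UNWIND)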

Instrumenting code objects
''''''''''''''''''''''''''

Code objects can be instrumented by calling::

instrumentation.instrument(codeobject, events)
Member:

For code objects containing other code objects (e.g. nested functions/classes, lambdas, comprehensions), does this affect the sub-objects?

Also, at this point I'm really holding my breath until I see how the event is delivered to some kind of event handler.

Member Author:

For code objects containing other code objects (e.g. nested functions/classes, lambdas, comprehensions), does this affect the sub-objects?

No. Just the code object specified.

Member:

That's worth calling out in the text as a clarification then.

Member:

Is this a new module? Why not put it in sys? (Even as sys.instrumentation...)


Contributor:

Could this function just be called enable_event_callbacks rather than requiring users to know what the term instrument means in the context of this PEP?

Code objects must be instrumented before they are executed.
An exception will be raised if the code object has been executed before it
Member:

It said earlier that quickening lets us modify "executing" bytecode. Is that not actually the case?

And if so, when do we have a chance to instrument new code objects? Do they all have to be done on creation? Or can we do it on first call?

Contributor:

One place that should have the chance to instrument called code objects is the event handling in the parent function. I think there's a missing event type for that purpose though: there's currently no event that triggers just before a code object is invoked (receiving the code objects for both the caller and callee, and the offset in the caller).

is instrumented.
Comment:

I think I had misread this PEP... I thought that only by using instrumentation.register(instrumentation.ENTER, profiler_func) I'd start to listen to any function call and instrumentation.register(instrumentation.UNWIND, profiler_func) would make it listen to any unhandled exception, but now, rereading, I'm under the impression that I'd just listen to function calls for code objects previously instrumented and it wouldn't even be possible to change it for code objects already running...

I think that the PEP could make that clearer.

Now, if it's really the case that:

  • there's no callback to listen to any call just prior to a code object/frame being executed
  • there's no way to listen to any events for code objects/frames already running

I'd say I'm -1 on the PEP as it'd just not be usable for many use cases (the happy path where this API would work would be too narrow to be usable)...

For instance, it's very common for users to attach to a running program and this PEP wouldn't support that at all and it'd be next to impossible to get a hold of all the code objects that'd need to be instrumented before they're actually run even on the case where the program is started with the debugger in place (an import hook to instrument code objects isn't enough as there are many corner cases where it wouldn't be able to get it).

Now, if I misunderstood it, then I think the PEP should make it clearer that those are in fact supported and which API would be used for that...

Contributor:

This is the distinction between the low overhead passive monitoring API ('instrumentation' in the current text) and the high overhead dynamic monitoring API ('monitoring' in the current text).

Turning on code coverage or profiling would have to be done early, so could be done dynamically only in the presence of dynamic compilation. But an interactive debugger would use the kinds of hooks that can be added to an existing code object, it wouldn't use the ones that are intended for code coverage and profiling.
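
To make the instrumentation half of the API concrete, here is a minimal sketch
of instrumenting a single function before it first runs (``audit_query`` is a
placeholder; the ``instrumentation`` module and its constants are the ones
proposed above)::

    import instrumentation  # proposed module; hypothetical until implemented

    def audit_query(db, sql):          # placeholder function to be profiled
        return db.execute(sql)

    # Instrument before the first call; instrumenting a code object that has
    # already executed raises an exception. Only audit_query.__code__ itself is
    # affected, not any code objects nested inside it.
    instrumentation.instrument(audit_query.__code__,
                               instrumentation.ENTER | instrumentation.UNWIND)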


Register callback functions for instrumentation
Member:

Maybe this section needs to come first?

'''''''''''''''''''''''''''''''''''''''''''''''

To register a callable for events call::

instrumentation.register(event, func)

Functions can be unregistered by calling
``instrumentation.register(event, None)``.
Contributor:

Could something more descriptive than "register" be used as the name here? If I've understood the purpose of the function correctly, then set_event_callback feels like it would be clearer.
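
For illustration, registering and later unregistering a callback for ``ENTER``
events might look like the following sketch (``log_enter`` is a placeholder
callable)::

    import instrumentation  # proposed module

    def log_enter(code, offset):
        print(f"entering {code.co_name} at offset {offset}")

    instrumentation.register(instrumentation.ENTER, log_enter)  # start receiving ENTER events
    # ... later ...
    instrumentation.register(instrumentation.ENTER, None)       # unregister the callback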


Callback functions can be registered at any time.
Member:

These callbacks are global, right? (Per-interpreter, I presume.)

Which makes me wonder about threads and reentrancy. I suppose to some extent callbacks are protected by the GIL. But callbacks can be Python code (presumably). I suppose events are disabled while a callback is running?

Member Author:

Events are always active unless explicitly disabled. Callbacks can be any callable, including Python code.
Instrumenting your instrumentation callback function will result in a recursion error. So you can't profile a profiler.

You can add monitor checkpoints to the callback function for instrumentation. It will be possible to debug a profiler.

Member:

Ah, this makes sense now I understand that each code object needs to be instrumented separately. Nevertheless your clarifications in the comments would be helpful if added to the text.

Member:

Seconding the question about whether they're global/per-interpreter/per-thread/per-...whatever.

Right now, people hit issues, confusion and sometimes ideas due to settrace's single-threadedness (particularly when you start tracing an already-running application and don't have a chance to force your trace into every thread, which is something we also saw regularly building the debugger for VS). The main problem being that tracing on one thread doesn't give you any way to trace - or even pause execution of - the others.

e.g. once you hit a breakpoint in one thread, the first thing you do that releases the GIL is going to let other threads start executing. Having some way to also trigger an event in the context of every thread would let a thread-aware debugger control which of them can keep executing. (Or maybe some other kind of approach makes more sense, that's just an idea that jumps out at me.) It doesn't seem to be essential to this PEP for this to be supported, but it's a good opportunity to add it while we're defining new events.

At the very least, having clear statements about the threading behaviour for what this PEP does add are essential.


Callback function arguments
'''''''''''''''''''''''''''

When an event occurs the registered function will be called.
Member:

And events only occur if they were instrumented?

The arguments provided are as follows:

* BRANCH: ``func(code: CodeType, offset: int, taken:bool)``
Member:

Is the code object an important parameter? The frame is generally more useful (and implicitly includes the code object).

Member:

Then again, I guess for a profiler the code object is enough as it's likely maintaining its own call stacks. Debuggers will need to walk the frame even if callers weren't instrumented, and so having them do an extra call to start walking the stack is better than preemptively passing it in.

If that's the case, a sentence explaining it would be nice, so the next reader doesn't have to guess.

Comment:

+1 to the case where the frame is needed for debuggers (i.e.: this PEP should have some note stating the recommended way to get the frame related to the instrumentation notification).

-- as a note, if performance-wise it'd be the same, I'd say that receiving the frame which contains the code object would be better. If there's some performance penalty, then another way to get it would also be reasonable since it's not always needed (but the recommended way to get it should be explicit in the PEP).

Contributor:

It wouldn't be the same performance wise. The new eval loop mostly avoids creating full python frame objects, so this new API provides an opportunity to retain that performance improvement, whereas the old one loses it (the frame objects have to be created anyway in order to pass them to the installed trace hook).

Contributor:

I do wonder if it might make sense to pass in a callable that requests the full frame object though. Otherwise hooks would have to use something similar to sys._getframes(), which feels awkward. That said, the callable wouldn't be cheap to create either, so @zooba's suggestion is probably the way to go (i.e. add text to the PEP explaining why the event callback API is the way it is)

* JUMPBACK: ``func(code: CodeType, offset: int)``
* ENTER: ``func(code: CodeType, offset: int)``
* EXIT: ``func(code: CodeType, offset: int)``
Contributor:

Are these with statement entry and exit or function entry and exit? Either way, it's likely worth making the event name longer to be clear about it.

* C_CALL: ``func(code: CodeType, offset: int, value: object)``
* C_RETURN: ``func(code: CodeType, offset: int, value: object)``
* C_EXCEPT: ``func(code: CodeType, offset: int, exception: BaseException)``
Comment:

I believe that both the exception as well as the traceback would be important in the C_EXCEPT, RAISE, EXCEPT as well as the UNWIND cases.

As a note, if the idea is using exception.__traceback__, it'd be nice if that's noted in the PEP... as a note, I'm not sure if exception.__traceback__ is always available -- is it possible that some exception raised -- say some custom exception object from C/C++ doesn't have it? (in which case it'd need to be passed as a parameter?)

* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)``
* EXCEPT: ``func(code: CodeType, offset: int)``
* UNWIND: ``func(code: CodeType)``
Comment on lines +128 to +137
Member:

The order here is different than in the first list of events. Maybe order them the same each time?
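
To illustrate the signatures listed above, callbacks for two of the events
might be written as follows (a sketch only; the callback names are
placeholders)::

    from types import CodeType

    import instrumentation  # proposed module

    def on_branch(code: CodeType, offset: int, taken: bool) -> None:
        # Matches the BRANCH signature above.
        print(f"branch in {code.co_name} at offset {offset}: taken={taken}")

    def on_raise(code: CodeType, offset: int, exception: BaseException) -> None:
        # Matches the RAISE signature above.
        print(f"{type(exception).__name__} raised in {code.co_name} at offset {offset}")

    instrumentation.register(instrumentation.BRANCH, on_branch)
    instrumentation.register(instrumentation.RAISE, on_raise)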


Monitoring
----------

Monitoring allows checkpoints to be inserted or removed at any
point in the program's execution.

The following functions are provided to insert monitoring points::

instrumentation.insert_monitors(codeobject, *offsets)
Contributor:

With the adjusted naming I proposed above, the associated names for these APIs would be:

  • insert_dynamic_callbacks
  • remove_dynamic_callbacks
  • disable_dynamic_callback
  • enable_dynamic_callback

The callback registration function monitor_register would instead become set_dynamic_callback. (Although it isn't clear to me why the registration API needs to be different from the regular event hook registration - couldn't DYNAMIC just be another event type for callback registration purposes, even if the methods for adding dynamic hooks are different from those for enabling other event types?)

instrumentation.remove_monitors(codeobject, *offsets)
instrumentation.monitor_off(codeobject, offset)
instrumentation.monitor_on(codeobject, offset)
Comment on lines +147 to +150
Member:

Is monitor_off(co, offs) equivalent to insert_monitors(co, [offs])? If so, why bother with the single-offset API at all? Certainly a user of this API can be trusted to listify the argument?

Member Author:

Is monitor_off(co, offs) equivalent to insert_monitors(co, [offs])?

No. Turning a monitor on or off is a very cheap (<1us) operation. Inserting or removing a monitor is a very expensive operation maybe taking 100s of milliseconds as it may cause cascading de-optimizations.

So, we want to allow multiple monitoring checkpoints to be inserted at once because:

  1. Each insertion (or removal) of monitors can be very expensive.
  2. Debuggers need to be able to insert multiple check points for lines that have several instructions (e.g. code in finally blocks, and duplicated tests in while loops).

If a user can't listify the arguments, I don't have much hope of them implementing a debugger 🙂

Member (@gvanrossum, Sep 14, 2021):

Oh. This important distinction might be easier to follow if you allowed yourself some mention of the (roughly) intended implementation. In my imagination, inserting monitors is roughly equivalent to

  1. Specialize the code object if not already specialized
  2. For each offset, replace the specialized opcode with a MONITOR opcode
  3. Maybe clear the inline cache for that code object?

But then monitor_off() would seem to be equivalent to replacing the MONITOR opcode with the original opcode (or perhaps the specialized variant, if we care).

What kind of cascading de-optimizations am I missing? Anything that's currently implemented, or is this referring to e.g. generating machine code?


All functions return ``True`` if a monitor checkpoint was present,
or ``False`` if a monitor checkpoint was not present.
Comment on lines +152 to +153
Member:

This is a little vague. If I insert checkpoints at offsets 10 and 20, and one was present at 10 but not at 20, should it return True or False? Or should it return a list[bool]?

Member Author:

Yes that is ambiguous.
I've changed that part of the API.

Turning on, or off, a non-existent checkpoint is a no-op;
no exception is raised.
Comment on lines +154 to +155
Member:

But presumably it would return False, right?

Member Author:

Yes


To register a callable for monitoring function events call::

instrumentation.monitor_register(func)

The callback function will be called with the code object and offset as arguments::

func(code: CodeType, offset: int)
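
For example, a monitoring callback that simply reports each checkpoint hit
could be sketched as follows (placeholder names, against the proposed API)::

    import instrumentation  # proposed module

    def on_checkpoint(code, offset):
        # Called whenever execution reaches an active monitoring checkpoint.
        print(f"checkpoint hit in {code.co_name} at offset {offset}")

    instrumentation.monitor_register(on_checkpoint)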

For optimizing virtual machines, such as future versions of CPython
(and ``PyPy`` should they choose to support this API), a call to
``insert_monitors`` and ``remove_monitors`` in a long running program
could be quite expensive, possibly taking 100s of milliseconds as it
triggers de-optimizations. Repeated calls to ``insert_monitors``
Member:

Where do you foresee these 100s of ms being spent? Tracking down all the affected code objects? Rewriting machine code? Clearing the inline cache data? What else?

Member Author:

Yes those sort of things. We might need to scan all compiled code to see what needs to be de-optimized, then do quite a lot of clean up.

I would expect this to only happen in an interactive debugger, where a 500ms pause is barely noticeable.

Member:

(I realize I asked the same question again before reading your reply. But I'm still not sure of the scope of the deoptimization. All your APIs are specific to a code object, and I can't imagine deoptimizing a single code object taking more than a few msec. So there must be some kind of optimization you are planning that spans multiple code objects. What? Is there an issue in the faster-cpython/ideas tracker about this?)

and ``remove_monitors``, as may be required in an interactive debugger,
should be relatively inexpensive.

Combining Checkpoints
---------------------

Only one instrumentation checkpoint and one monitoring checkpoint is allowed
per bytecode instruction. It is possible to have both a monitoring and
instrumentation checkpoint on the same instruction; they are independent.
Monitors will be called before instrumentation if both are present.
Comment on lines +176 to +179
Member:

This confused me, since the instrumentation API has no explicit mention of checkpoints (there are no offsets in the instrumentation API, only code objects). I take it that instrumentation works by replacing specific bytecodes (e.g. certain JUMP instructions for BRANCH and JUMPBACK), and monitoring also works by doing that? And the instruction has two flag bits indicating whether it is an instrumentation or monitoring checkpoint or both?

It would seem that the APIs already imply that only one instrumentation checkpoint can exist per instruction, and only one monitoring checkpoint. So this paragraph only adds in which order they will be called.


Backwards Compatibility
=======================

This PEP is fully backwards compatible.

We may seek to remove ``sys.settrace`` in the future once the APIs provided
by this PEP have been widely adopted, but that is for another PEP.
Contributor:

It would make sense to specify whether the new callback hooks are invoked before or after the existing trace hooks.



Security Implications
=====================

Allowing modification of running code has some security implications,
but no more than the ability to generate and call new code.

All the functions listed above will trigger audit hooks.


Implementation
==============

The implementation of this PEP will be built on top of PEP 659 quickening.
Member:

Suggested change:
- The implementation of this PEP will be built on top of PEP 659 quickening.
+ The implementation of this PEP will be built on top of PEP 659 (Specializing Adaptive Interpreter).

Instrumentation or monitoring of a code object will cause it to be quickened.
Checkpoints will then be implemented by inserting one of several special
``CHECKPOINT`` instructions into the quickened code. These instructions
will call the registered callable before executing the original instruction.

Note that this can interfere with specialization, which will result in
performance degradation in addition to the overhead of calling the
registered callable.

Implementing tools
==================

It is the philosophy of this PEP that third-party tools should be able to
achieve high-performance, not that it should be easy for them to do so.
This PEP provides the necessary API for tools, but does nothing to help
them determine when and where to insert instrumentation or monitors.

Debuggers
---------

Inserting breakpoints
'''''''''''''''''''''

Breakpoints should be implemented as monitors.
To insert a breakpoint at a given line, the matching instruction offsets
should be found from ``codeobject.co_lines()``.
Then a monitor should be added for each of those offsets.
Comment on lines +227 to +229
Member:

I'm not super familiar with co_lines(). When does it return multiple offsets? Only when the code of that line is duplicated (e.g. by loop unrolling or similar optimizations)?

To avoid excessive overhead, a single call should be made to
``instrumentation.insert_monitors`` passing all the offsets at once.
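
A sketch of how a debugger might do this, assuming the API proposed above
(``target`` and the line number are placeholders; ``co_lines()`` is the
existing code-object method)::

    import instrumentation  # proposed module

    def line_offsets(code, lineno):
        # co_lines() yields (start_offset, end_offset, line_number) triples.
        return [start for start, _end, line in code.co_lines() if line == lineno]

    offsets = line_offsets(target.__code__, 42)                 # breakpoint on line 42
    instrumentation.insert_monitors(target.__code__, *offsets)  # one call for all offsets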

Breakpoints can be suspended with ``instrumentation.monitor_off``.
Member:

Does that re-specialize the instruction at that offset?

Regardless of the answer it would seem this is deleting a breakpoint, not just suspending it. (If there was a difference, I'd think that suspending makes it easy to re-enable, but there doesn't seem to be a semantic difference in this case, so I prefer the simpler "deleting".)

Member Author:

Does that re-specialize the instruction at that offset?

Maybe, but probably not.

Turning a monitor off and removing it have very different performance profiles.

Turning a monitor off is cheap, but will interfere with optimization.
Removing a monitor is very expensive, but will ultimately allow full optimization.

It is up to the implementer of the debugger which to use, but if the user of the debugger turns a breakpoint off, rather than removing it, it implies they may want to turn it on again soon.


Debuggers can break on exceptions being raised by registering a callable
for ``RAISE``:
Member:

So the RAISE event applies to any reason an exception is raised, not just a raise statement, right? This could bear clarification.


``instrumentation.register(RAISE, break_on_raise_handler)``

Stepping
''''''''

Debuggers usually offer the ability to step execution by a
single instruction or line.

This can be implemented by inserting a new monitor at the required
offset(s) of the code to be stepped to,
and by removing or disabling the current monitor.

It is the job of the debugger to compute the relevant offset(s).
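
As a rough sketch, stepping from the current stop location to the next line
might look like the following (``code``, ``current_offsets`` and
``next_offsets`` are placeholders the debugger would compute, e.g. from
``co_lines()``)::

    import instrumentation  # proposed module

    # Suspend the checkpoints at the current stop location (cheap)...
    for offset in current_offsets:
        instrumentation.monitor_off(code, offset)

    # ...and ensure checkpoints exist at the location being stepped to
    # (potentially expensive if they are not already present).
    instrumentation.insert_monitors(code, *next_offsets)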

Coverage Tools
--------------

Coverage tools need to track which parts of the control graph have been
executed. To do this, they need to track most events and map those events
onto the control flow graph of the code object.
``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which
basic blocks have started to execute.
The ``RAISE`` event with mark any blocks that did not complete.
Member:

Suggested change:
- The ``RAISE`` event with mark any blocks that did not complete.
+ The ``RAISE`` event will mark any blocks that did not complete.

Member:

"with"?


This can then be converted back into a line based report after execution
has completed.

Profilers
---------

Simple profilers need to gather information about calls.
To do this profilers should register for the following events:

* ENTER
* EXIT
* UNWIND
* C_CALL
* C_RETURN
* RAISE
Comment on lines +271 to +276
Member:

Suggested change:
- * ENTER
- * EXIT
- * UNWIND
- * C_CALL
- * C_RETURN
- * RAISE
+ * ``ENTER``
+ * ``EXIT``
+ * ``UNWIND``
+ * ``C_CALL``
+ * ``C_RETURN``
+ * ``RAISE``

Member:

Why is RAISE needed here?

Member Author (@markshannon, Sep 14, 2021):

To match C_CALL if the callee raises.
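
As a sketch, a minimal call-counting profiler built on these events might look
like the following (placeholder names; every code object of interest still has
to be instrumented for these events before it first runs, as described
earlier)::

    import collections

    import instrumentation  # proposed module

    call_counts = collections.Counter()

    def on_enter(code, offset):
        # ENTER fires when an instrumented Python function is entered.
        call_counts[(code.co_filename, code.co_name)] += 1

    def on_c_call(code, offset, value):
        # C_CALL fires for calls to objects that are not Python functions.
        call_counts[getattr(value, "__qualname__", repr(value))] += 1

    instrumentation.register(instrumentation.ENTER, on_enter)
    instrumentation.register(instrumentation.C_CALL, on_c_call)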


Line based profilers
''''''''''''''''''''

Line based profilers will also need to handle ``BRANCH`` and ``JUMPBACK``
events.
Beware that handling these extra events will have a large performance impact.

.. note::

Instrumenting profilers have a significant overhead and will distort the
results of profiling. Unless you need exact call counts,
consider using a statistical profiler.

Open Issues
===========

[Any points that are still being decided/discussed.]


References
==========

[A collection of URLs used as references through the PEP.]
Comment on lines +297 to +300
Member:

You can remove this if you're not planning to add any references.



Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: