From 2b3da814553beab14aacb2658bd2dc431f455df4 Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Fri, 20 Aug 2021 14:47:09 +0100 Subject: [PATCH 1/6] Initial draft of high performance debugging and profiling PEP. --- pep-9999.rst | 281 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 281 insertions(+) create mode 100644 pep-9999.rst diff --git a/pep-9999.rst b/pep-9999.rst new file mode 100644 index 00000000000..34a18ca3450 --- /dev/null +++ b/pep-9999.rst @@ -0,0 +1,281 @@ +PEP: 12 +Title: High Performance Instrumentation and Monitoring +Author: Mark Shannon +Status: Draft +Type: Standards +Content-Type: text/x-rst +Created: 18-Aug-2021 +Post-History: xx-Aug-2021 + + +Abstract +======== + +The performance of Python when using tools that instrument or monitor the runtime +is awful. It does not need to be so bad. +This PEP proposes a set of APIs for instrumentation and monitoring of Python +programs running on CPython that will enable the insertion of instrumentation +and monitoring at low cost. + +The expectation is that heavily instrumented code, +such as is needed for profiling and coverage tools would suffer a slowdown +of no more than 30% and would run under 3.11 at speeds comparable with +uninstrumented code running on 3.10. + +For lightly instrumented code or code with a monitoring points, there +should be no discernable overhead. + +This means that code run under a debugger for 3.11 should outperform code run +without a debugger on 3.10. +Programs run with profiling and coverage on 3.11 should perform no more than +10% slower than on 3.10. + +Motivation +========== + +Developers should not have to pay an unreasonable cost to use debuggers, profilers +and other tools. + +C++ and Java developers expect to be able to run a program at full speed +(or very close to it). Python developers should be able to expect it too. 
+ +Rationale +========= + +It makes sense to add these features to 3.11 because PEP 659 provides us with a way +to dynamically modify executing Python bytecode in a way that has no cost to parts +of the code that are not modified and only low cost to those parts that are modified. + +Specification +============= + +There are two parts to this specification, instrumentation and monitoring. + +Instrumentation occurs early in a program's life cycle and persists. +It is expected to be pervasive, but fixed. +Instrumentation is designed to support profiling and coverage tools that +expect to be active for the entire lifetime. + +Monitoring can occur at any point in the program's life and be applied anywhere in +the program. Monitoring points are expected to few. The capabilities of monitoring +are a superset of that of profiling, but insertion of monitoring points will be much more +expensive that insertion of instrumentation. + +Both instrumentation and monitoring is performed by insertion of a checkpoint in a code object. + +Checkpoints +----------- + +A checkpoint is simply a point in code, that is defined by a ``(codeobject, offset)`` pair. +Everytime a checkpoint is reached, the registered callable is called. + +Instrumentation +--------------- + +Instrumentation supports the bulk insertion of checkpoints, +but does not allow insertion of checkpoints after code has started to execute. + +``instrumentation.register(event, func)`` + +Functions can be unregistered by calling ``instrumentation.register(event, None)``. + +#define PyTrace_CALL 0 +#define PyTrace_EXCEPTION 1 +#define PyTrace_LINE 2 +#define PyTrace_RETURN 3 +#define PyTrace_C_CALL 4 +#define PyTrace_C_EXCEPTION 5 +#define PyTrace_C_RETURN 6 +#define PyTrace_OPCODE 7 + +The events are:: + + * BRANCH: Any conditional branch is taken, or not. + * JUMPBACK: Any backwards, unconditional branch is taken. + * CALL: A call to a Python function is made. 
+ * C_CALL: A call to any object that is not a Python function. + * RETURN: A return from a Python function. + * C_RETURN: A return from any object that is not a Python function. + * YIELD: A yield occurs. + * RESUME: A generator or coroutine resumes after a YIELD. + * RAISE: An exception is raised. + * EXCEPT: An exception is handled. + * UNWIND: An exception cause the frame stack to be unwound. + * LINE: A new line is reached. + +All events are integer powers of two and can be bitwise or-ed together to instrument multiple events. + +Code objects can be instrumented by calling: + +``instrumentation.instrument(codeobject, events)`` + +Individual instrumentation check points can be turned on or off with: + +``instrumentation.instrument_off(codeobject, offset)`` +``instrumentation.instrument_on(codeobject, offset)`` + +Turning a non-existent checkpoint on or off is a no-op. No exception is raised. + +Instrumentation checkpoints cannot be removed. + +Callback functions +'''''''''''''''''' + +When an event occurs the registered function will be called. The arguments provided are as follows: + +* BRANCH: ``func(code: CodeType, offset: int, taken:bool)`` +* JUMPBACK: ``func(code: CodeType, offset: int)`` +* CALL: ``func(code: CodeType, offset: int, callable: object)`` +* START: ``func(code: CodeType, offset: int)`` +* RETURN: ``func(code: CodeType, offset: int, value: object)`` +* YIELD: ``func(code: CodeType, offset: int, value: object)`` +* RESUME: ``func(code: CodeType, offset: int)`` +* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)`` +* EXCEPT: ``func(code: CodeType, offset: int)`` + +Monitoring +---------- + +Monitoring allows checkpoints to be inserted or removed at any point in the program's execution. 
+ +The following functions are provided: + +``instrumentation.insert_monitors(codeobject, *offsets)`` +``instrumentation.remove_monitors(codeobject, *offsets)`` +``instrumentation.monitor_off(codeobject, offset)`` +``instrumentation.monitor_on(codeobject, offset)`` + +All functions return ``True`` if a monitor checkpoint was present. +Turning a non existent checkpoint on or off is a no-op. No exception is raised. + +``instrumentation.monitor_register(func)`` + +For optimizing virtual machines, such as ``PyPy`` and future versions of CPython, +calls to ``insert_monitors`` and ``remove_monitors`` may be quite expensive. +Calls may take 100s of milliseconds for a large program, as it they trigger de-optimizations. + +Once the call is completed, the impact on performance should be negligible. + +Combining checkpoints +--------------------- + +Only one instrumentation checkpoint and one monitoring checkpoint is allowed per bytecode instruction. +It is possible to have both a monitoring and instrumentation checkpoint on the same instruction; +they are independent. Monitors will be called before instrumentation if both are present. + +Backwards Compatibility +======================= + +This PEP is fully backwards compatible. +We may seek to remove ``sys.settrace`` in the future once the APIs provided by this PEP +have been widely adopted, but that is outside the scope of this PEP. + + +Security Implications +===================== + +Allowing modification of running code has some security implications, +but no more than the ability to generate and call new code. + +All the functions listed above will trigger audit hooks. + + +Implementation +============== + +The implementation of this PEP will be built on top of PEP 659 quickening. +Instrumentation or monitoring of a code object will first cause it to be quickened. +Checkpoints will then be implemented by inserting one of several special ``CHECKPOINT`` +instructions into the quickened code. 
These instructions will call the registered callable +before executing the original instruction. + +Note that this can interfere with specialization, which will result in performance degradation +in addition to the overhead of calling the registered callable. + +Implementing tools +================== + +Debuggers +--------- + +Most of the features of a debugger are unchanged. Presenting the state of the VM to the user +depends on introspection, not monitoring. It is the insertion of breakpoints, that differs. + +Inserting breakpoints +''''''''''''''''''''' + +Breakpoints are simply monitors. To insert a breakpoint at a given line, the matching instruction +offsets should be found from ``codeobject.co_lines()``. +Then a monitor should be added for each of those offsets. To avoid excessive overhead, a single call +should be made to ``instrumentation.insert_monitors`` passing all the offsets at once. + +Breakpoints can suspended with ``instrumentation.monitor_off``. + +Debuggers can break on exceptions being raised by registering a callable for ``RAISE``: + +``instrumentation.register(RAISE, break_on_raise_handler)`` + +Coverage Tools +-------------- + +Coverage tools need to track which parts of the control graph have been executed. To do this, they need +to track most events and map those events onto the control flow graph of the code object. +``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which basic blocks have started to execute. +The ``RAISE`` event with mark any blocks that did not complete. + +This can be then be converted back into a line based report after execution has completed. + +Profilers +--------- + +Simple profilers need to gather information about calls. 
To do this profilers should register for +the following events: + +* CALL: ``func(code: CodeType, offset: int, callable: object)`` +* START: ``func(code: CodeType, offset: int)`` +* RETURN: ``func(code: CodeType, offset: int, value: object)`` +* YIELD: ``func(code: CodeType, offset: int, value: object)`` +* RESUME: ``func(code: CodeType, offset: int)`` +* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)`` +* EXCEPT: ``func(code: CodeType, offset: int)`` + + +Line based profilers +'''''''''''''''''''' + +Line based profilers will also need to handle ``LINE`` events. +Beware that handling ``LINE`` events will have a large performance impact. + + .. note: + + Instrumenting profilers have a large overhead and will distort the results of profiling. + Unless you need exact call counts, consider using a statistical profiler. + +Open Issues +=========== + +[Any points that are still being decided/discussed.] + + +References +========== + +[A collection of URLs used as references through the PEP.] + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: From b9c62331ebcbdc441c424023c06658da09565ccd Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Fri, 20 Aug 2021 16:23:52 +0100 Subject: [PATCH 2/6] Refine events and extend API a bit. 
--- pep-9999.rst | 66 ++++++++++++++++++++++++---------------------------- 1 file changed, 30 insertions(+), 36 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 34a18ca3450..38abb212c25 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -1,11 +1,11 @@ -PEP: 12 +PEP: 6xx Title: High Performance Instrumentation and Monitoring Author: Mark Shannon Status: Draft Type: Standards Content-Type: text/x-rst Created: 18-Aug-2021 -Post-History: xx-Aug-2021 +Post-History: xx-Sep-2021 Abstract @@ -13,22 +13,22 @@ Abstract The performance of Python when using tools that instrument or monitor the runtime is awful. It does not need to be so bad. + This PEP proposes a set of APIs for instrumentation and monitoring of Python programs running on CPython that will enable the insertion of instrumentation and monitoring at low cost. -The expectation is that heavily instrumented code, -such as is needed for profiling and coverage tools would suffer a slowdown -of no more than 30% and would run under 3.11 at speeds comparable with -uninstrumented code running on 3.10. +Using the new APIs, code run under a debugger on 3.11 should easily outperform +code run without a debugger on 3.10. -For lightly instrumented code or code with a monitoring points, there -should be no discernable overhead. +For relatively light instrumentation, such as required for ``cProfile``, +instrumented programs should run faster on 3.11 than +uninstrumented programs on 3.10. -This means that code run under a debugger for 3.11 should outperform code run -without a debugger on 3.10. -Programs run with profiling and coverage on 3.11 should perform no more than -10% slower than on 3.10. +For heavier instrumentation, such as required for ``coverage.py``, instrumented +programs on 3.11 are likely to be slower than uninstrumented programs on 3.10. +However, they should be *much* faster than running on 3.10 using ``sys.settrace`` +based tracing. 
Motivation ========== @@ -37,12 +37,13 @@ Developers should not have to pay an unreasonable cost to use debuggers, profile and other tools. C++ and Java developers expect to be able to run a program at full speed -(or very close to it). Python developers should be able to expect it too. +(or very close to it). Python developers should expect that too. Rationale ========= -It makes sense to add these features to 3.11 because PEP 659 provides us with a way + +It makes sense to add these features to 3.11, because PEP 659 provides us with a way to dynamically modify executing Python bytecode in a way that has no cost to parts of the code that are not modified and only low cost to those parts that are modified. @@ -54,7 +55,7 @@ There are two parts to this specification, instrumentation and monitoring. Instrumentation occurs early in a program's life cycle and persists. It is expected to be pervasive, but fixed. Instrumentation is designed to support profiling and coverage tools that -expect to be active for the entire lifetime. +expect to be active for the entire lifetime of the program. Monitoring can occur at any point in the program's life and be applied anywhere in the program. Monitoring points are expected to few. The capabilities of monitoring @@ -66,8 +67,8 @@ Both instrumentation and monitoring is performed by insertion of a checkpoint in Checkpoints ----------- -A checkpoint is simply a point in code, that is defined by a ``(codeobject, offset)`` pair. -Everytime a checkpoint is reached, the registered callable is called. +A checkpoint is simply a point in code defined by a ``(codeobject, offset)`` pair. +Every time a checkpoint is reached, the registered callable is called. Instrumentation --------------- @@ -79,29 +80,17 @@ but does not allow insertion of checkpoints after code has started to execute. Functions can be unregistered by calling ``instrumentation.register(event, None)``. 
-#define PyTrace_CALL 0 -#define PyTrace_EXCEPTION 1 -#define PyTrace_LINE 2 -#define PyTrace_RETURN 3 -#define PyTrace_C_CALL 4 -#define PyTrace_C_EXCEPTION 5 -#define PyTrace_C_RETURN 6 -#define PyTrace_OPCODE 7 - The events are:: * BRANCH: Any conditional branch is taken, or not. * JUMPBACK: Any backwards, unconditional branch is taken. - * CALL: A call to a Python function is made. + * ENTER: A Python function is entered. + * EXIT: A Python function exits normally (without an exception). * C_CALL: A call to any object that is not a Python function. - * RETURN: A return from a Python function. * C_RETURN: A return from any object that is not a Python function. - * YIELD: A yield occurs. - * RESUME: A generator or coroutine resumes after a YIELD. * RAISE: An exception is raised. * EXCEPT: An exception is handled. * UNWIND: An exception cause the frame stack to be unwound. - * LINE: A new line is reached. All events are integer powers of two and can be bitwise or-ed together to instrument multiple events. @@ -116,6 +105,11 @@ Individual instrumentation check points can be turned on or off with: Turning a non-existent checkpoint on or off is a no-op. No exception is raised. +All the checkpoints matching certain events for a code object can be turned on or off simultaneously with: + +``instrumentation.instrument_all_off(codeobject, events)`` +``instrumentation.instrument_all_on(codeobject, events)`` + Instrumentation checkpoints cannot be removed. Callback functions @@ -125,13 +119,13 @@ When an event occurs the registered function will be called. 
The arguments provi * BRANCH: ``func(code: CodeType, offset: int, taken:bool)`` * JUMPBACK: ``func(code: CodeType, offset: int)`` -* CALL: ``func(code: CodeType, offset: int, callable: object)`` -* START: ``func(code: CodeType, offset: int)`` -* RETURN: ``func(code: CodeType, offset: int, value: object)`` -* YIELD: ``func(code: CodeType, offset: int, value: object)`` -* RESUME: ``func(code: CodeType, offset: int)`` +* ENTER: ``func(code: CodeType, offset: int)`` +* EXIT: ``func(code: CodeType, offset: int)`` +* C_CALL: ``func(code: CodeType, offset: int, value: object)`` +* C_RETURN: ``func(code: CodeType, offset: int, value: object)`` * RAISE: ``func(code: CodeType, offset: int, exception: BaseException)`` * EXCEPT: ``func(code: CodeType, offset: int)`` +* UNWIND: ``func(code: CodeType)`` Monitoring ---------- From 613d1adc7b8f2e4208120a0e702999d6a0b2018a Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Mon, 13 Sep 2021 15:44:27 +0100 Subject: [PATCH 3/6] Fix number, dates and clean up text. --- pep-0669.rst | 322 +++++++++++++++++++++++++++++++++++++++++++++++++++ pep-9999.rst | 275 ------------------------------------------- 2 files changed, 322 insertions(+), 275 deletions(-) create mode 100644 pep-0669.rst delete mode 100644 pep-9999.rst diff --git a/pep-0669.rst b/pep-0669.rst new file mode 100644 index 00000000000..d4a3d6a825c --- /dev/null +++ b/pep-0669.rst @@ -0,0 +1,322 @@ +PEP: 669 +Title: High Performance Instrumentation and Monitoring for CPython +Author: Mark Shannon +Status: Draft +Type: Standards +Content-Type: text/x-rst +Created: 18-Aug-2021 +Post-History: 13-Sep-2021 + + +Abstract +======== + +Using a profiler or debugging in CPython can have a severe impact on +performance. Slowdowns by an order of magnitude are not uncommon. +It does not have to be this bad. + +This PEP proposes an API for instrumentation and monitoring of Python +programs running on CPython that will enable the insertion of instrumentation +and monitoring at low cost.
+ +Using the new API, code run under a debugger on 3.11 should easily outperform +code run without a debugger on 3.10. + +For relatively light instrumentation, such as required for ``cProfile``, +the performance of instrumented programs running on 3.11 should be +comparable with uninstrumented programs on 3.10. + +Motivation +========== + +Developers should not have to pay an unreasonable cost to use debuggers, +profilers and other similar tools. + +C++ and Java developers expect to be able to run a program at full speed +(or very close to it) under a debugger. +Python developers should expect that too. + +Rationale +========= + +The quickening mechanism provided by PEP 659 provides a way to dynamically +modify executing Python bytecode. These modifications have no cost beyond +the parts of the code that are modified and a relatively low cost to those +parts that are modified. We can leverage this to provide an efficient +mechanism for instrumentation and monitoring that was not possible in 3.10 +or earlier. + +Specification +============= + +There are two parts to this specification, instrumentation and monitoring. + +Instrumentation occurs early in a program's life cycle and persists through +the lifetime of the program. It is expected to be pervasive, but fixed. +Instrumentation is designed to support profiling and coverage tools that +expect to be active for the entire lifetime of the program. + +Monitoring can occur at any point in the program's life and be applied +anywhere in the program. Monitoring points are expected to be few. +The capabilities of monitoring are a superset of those of profiling, +but insertion of monitoring points will be *much* more +expensive than insertion of instrumentation. + +Both instrumentation and monitoring are performed by insertion of a +checkpoint in a code object. + +Checkpoints +----------- + +A checkpoint is simply a point in code defined by a +``(codeobject, offset)`` pair.
+Every time a checkpoint is reached, the registered callable is called. + +Instrumentation +--------------- + +Instrumentation supports the bulk insertion of checkpoints, but does not +allow insertion or removal of checkpoints after code has started to execute. + +The events are:: + + * BRANCH: A conditional branch is reached. + * JUMPBACK: A backwards, unconditional branch is taken. + * ENTER: A Python function is entered. + * EXIT: A Python function exits normally (without an exception). + * UNWIND: A Python function exits with an unhandled exception. + * C_CALL: A call to any object that is not a Python function. + * C_RETURN: A return from any object that is not a Python function. + * RAISE: An exception is raised. + * EXCEPT: An exception is handled. + +For each ``ENTER`` event there will be a corresponding +``EXIT`` or ``UNWIND`` event. +For each ``C_CALL`` event there will be a corresponding +``C_RETURN`` or ``RAISE`` event. + +All events are integer powers of two and can be bitwise or-ed together to +instrument multiple events. + +Instrumenting code objects +'''''''''''''''''''''''''' + +Code objects can be instrumented by calling:: + + instrumentation.instrument(codeobject, events) + +Code objects must be instrumented before they are executed. +An exception will be raised if the code object has been executed before it +is instrumented. + +Register callback functions for instrumentation +''''''''''''''''''''''''''''''''''''''''''''''' + +To register a callable for events call:: + + instrumentation.register(event, func) + +Functions can be unregistered by calling +``instrumentation.register(event, None)``. + +Callback functions can be registered at any time. + +Callback function arguments +''''''''''''''''''''''''''' + +When an event occurs the registered function will be called. 
+The arguments provided are as follows: + +* BRANCH: ``func(code: CodeType, offset: int, taken: bool)`` +* JUMPBACK: ``func(code: CodeType, offset: int)`` +* ENTER: ``func(code: CodeType, offset: int)`` +* EXIT: ``func(code: CodeType, offset: int)`` +* C_CALL: ``func(code: CodeType, offset: int, value: object)`` +* C_RETURN: ``func(code: CodeType, offset: int, value: object)`` +* C_EXCEPT: ``func(code: CodeType, offset: int, exception: BaseException)`` +* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)`` +* EXCEPT: ``func(code: CodeType, offset: int)`` +* UNWIND: ``func(code: CodeType)`` + +Monitoring +---------- + +Monitoring allows checkpoints to be inserted or removed at any +point in the program's execution. + +The following functions are provided to insert monitoring points:: + + instrumentation.insert_monitors(codeobject, *offsets) + instrumentation.remove_monitors(codeobject, *offsets) + instrumentation.monitor_off(codeobject, offset) + instrumentation.monitor_on(codeobject, offset) + +All functions return ``True`` if a monitor checkpoint was present, +or ``False`` if a monitor checkpoint was not present. +Turning a non-existent checkpoint on or off is a no-op; +no exception is raised. + +To register a callable for monitoring function events call:: + + instrumentation.monitor_register(func) + +The callback function will be called with the code object and offset as arguments:: + + func(code: CodeType, offset: int) + +For optimizing virtual machines, such as future versions of CPython +(and ``PyPy`` should they choose to support this API), a call to +``insert_monitors`` or ``remove_monitors`` in a long-running program +could be quite expensive, possibly taking hundreds of milliseconds as it +may trigger de-optimizations. Repeated calls to ``insert_monitors`` +and ``remove_monitors``, as may be required in an interactive debugger, +should be relatively inexpensive.
+ +Combining Checkpoints +--------------------- + +Only one instrumentation checkpoint and one monitoring checkpoint are allowed +per bytecode instruction. It is possible to have both a monitoring and an +instrumentation checkpoint on the same instruction; they are independent. +Monitors will be called before instrumentation if both are present. + + +Backwards Compatibility +======================= + +This PEP is fully backwards compatible. + +We may seek to remove ``sys.settrace`` in the future once the APIs provided +by this PEP have been widely adopted, but that is for another PEP. + + +Security Implications +===================== + +Allowing modification of running code has some security implications, +but no more than the ability to generate and call new code. + +All the functions listed above will trigger audit hooks. + + +Implementation +============== + +The implementation of this PEP will be built on top of PEP 659 quickening. +Instrumentation or monitoring of a code object will cause it to be quickened. +Checkpoints will then be implemented by inserting one of several special +``CHECKPOINT`` instructions into the quickened code. These instructions +will call the registered callable before executing the original instruction. + +Note that this can interfere with specialization, which will result in +performance degradation in addition to the overhead of calling the +registered callable. + +Implementing tools +================== + +It is the philosophy of this PEP that third-party tools should be able to +achieve high performance, not that it should be easy for them to do so. +This PEP provides the necessary API for tools, but does nothing to help +them determine when and where to insert instrumentation or monitors. + +Debuggers +--------- + +Inserting breakpoints +''''''''''''''''''''' + +Breakpoints should be implemented as monitors. +To insert a breakpoint at a given line, the matching instruction offsets +should be found from ``codeobject.co_lines()``.
+Then a monitor should be added for each of those offsets. +To avoid excessive overhead, a single call should be made to +``instrumentation.insert_monitors`` passing all the offsets at once. + +Breakpoints can be suspended with ``instrumentation.monitor_off``. + +Debuggers can break on exceptions being raised by registering a callable +for ``RAISE``: + +``instrumentation.register(RAISE, break_on_raise_handler)`` + +Stepping +'''''''' + +Debuggers usually offer the ability to step execution by a +single instruction or line. + +This can be implemented by inserting a new monitor at the required +offset(s) of the code to be stepped to, +and by removing or disabling the current monitor. + +It is the job of the debugger to compute the relevant offset(s). + +Coverage Tools +-------------- + +Coverage tools need to track which parts of the control flow graph have been +executed. To do this, they need to track most events and map those events +onto the control flow graph of the code object. +``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which +basic blocks have started to execute. +The ``RAISE`` event will mark any blocks that did not complete. + +This can then be converted back into a line-based report after execution +has completed. + +Profilers +--------- + +Simple profilers need to gather information about calls. +To do this profilers should register for the following events: + +* ENTER +* EXIT +* C_CALL +* C_RETURN +* RAISE +* EXCEPT +* UNWIND + +Line based profilers +'''''''''''''''''''' + +Line based profilers will also need to handle ``BRANCH`` and ``JUMPBACK`` +events. +Beware that handling these extra events will have a large performance impact. + + .. note:: + + Instrumenting profilers have a significant overhead and will distort the + results of profiling. Unless you need exact call counts, + consider using a statistical profiler. + +Open Issues +=========== + +[Any points that are still being decided/discussed.]
+ + +References +========== + +[A collection of URLs used as references through the PEP.] + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-9999.rst b/pep-9999.rst deleted file mode 100644 index 38abb212c25..00000000000 --- a/pep-9999.rst +++ /dev/null @@ -1,275 +0,0 @@ -PEP: 6xx -Title: High Performance Instrumentation and Monitoring -Author: Mark Shannon -Status: Draft -Type: Standards -Content-Type: text/x-rst -Created: 18-Aug-2021 -Post-History: xx-Sep-2021 - - -Abstract -======== - -The performance of Python when using tools that instrument or monitor the runtime -is awful. It does not need to be so bad. - -This PEP proposes a set of APIs for instrumentation and monitoring of Python -programs running on CPython that will enable the insertion of instrumentation -and monitoring at low cost. - -Using the new APIs, code run under a debugger on 3.11 should easily outperform -code run without a debugger on 3.10. - -For relatively light instrumentation, such as required for ``cProfile``, -instrumented programs should run faster on 3.11 than -uninstrumented programs on 3.10. - -For heavier instrumentation, such as required for ``coverage.py``, instrumented -programs on 3.11 are likely to be slower than uninstrumented programs on 3.10. -However, they should be *much* faster than running on 3.10 using ``sys.settrace`` -based tracing. - -Motivation -========== - -Developers should not have to pay an unreasonable cost to use debuggers, profilers -and other tools. - -C++ and Java developers expect to be able to run a program at full speed -(or very close to it). Python developers should expect that too. 
- -Rationale -========= - - -It makes sense to add these features to 3.11, because PEP 659 provides us with a way -to dynamically modify executing Python bytecode in a way that has no cost to parts -of the code that are not modified and only low cost to those parts that are modified. - -Specification -============= - -There are two parts to this specification, instrumentation and monitoring. - -Instrumentation occurs early in a program's life cycle and persists. -It is expected to be pervasive, but fixed. -Instrumentation is designed to support profiling and coverage tools that -expect to be active for the entire lifetime of the program. - -Monitoring can occur at any point in the program's life and be applied anywhere in -the program. Monitoring points are expected to few. The capabilities of monitoring -are a superset of that of profiling, but insertion of monitoring points will be much more -expensive that insertion of instrumentation. - -Both instrumentation and monitoring is performed by insertion of a checkpoint in a code object. - -Checkpoints ------------ - -A checkpoint is simply a point in code defined by a ``(codeobject, offset)`` pair. -Every time a checkpoint is reached, the registered callable is called. - -Instrumentation ---------------- - -Instrumentation supports the bulk insertion of checkpoints, -but does not allow insertion of checkpoints after code has started to execute. - -``instrumentation.register(event, func)`` - -Functions can be unregistered by calling ``instrumentation.register(event, None)``. - -The events are:: - - * BRANCH: Any conditional branch is taken, or not. - * JUMPBACK: Any backwards, unconditional branch is taken. - * ENTER: A Python function is entered. - * EXIT: A Python function exits normally (without an exception). - * C_CALL: A call to any object that is not a Python function. - * C_RETURN: A return from any object that is not a Python function. - * RAISE: An exception is raised. - * EXCEPT: An exception is handled. 
- * UNWIND: An exception cause the frame stack to be unwound.
-
-All events are integer powers of two and can be bitwise or-ed together to instrument multiple events.
-
-Code objects can be instrumented by calling:
-
-``instrumentation.instrument(codeobject, events)``
-
-Individual instrumentation check points can be turned on or off with:
-
-``instrumentation.instrument_off(codeobject, offset)``
-``instrumentation.instrument_on(codeobject, offset)``
-
-Turning a non-existent checkpoint on or off is a no-op. No exception is raised.
-
-All the checkpoints matching certain events for a code object can be turned on or off simultaneously with:
-
-``instrumentation.instrument_all_off(codeobject, events)``
-``instrumentation.instrument_all_on(codeobject, events)``
-
-Instrumentation checkpoints cannot be removed.
-
-Callback functions
-''''''''''''''''''
-
-When an event occurs the registered function will be called. The arguments provided are as follows:
-
-* BRANCH: ``func(code: CodeType, offset: int, taken:bool)``
-* JUMPBACK: ``func(code: CodeType, offset: int)``
-* ENTER: ``func(code: CodeType, offset: int)``
-* EXIT: ``func(code: CodeType, offset: int)``
-* C_CALL: ``func(code: CodeType, offset: int, value: object)``
-* C_RETURN: ``func(code: CodeType, offset: int, value: object)``
-* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)``
-* EXCEPT: ``func(code: CodeType, offset: int)``
-* UNWIND: ``func(code: CodeType)``
-
-Monitoring
-----------
-
-Monitoring allows checkpoints to be inserted or removed at any point in the program's execution.
-
-The following functions are provided:
-
-``instrumentation.insert_monitors(codeobject, *offsets)``
-``instrumentation.remove_monitors(codeobject, *offsets)``
-``instrumentation.monitor_off(codeobject, offset)``
-``instrumentation.monitor_on(codeobject, offset)``
-
-All functions return ``True`` if a monitor checkpoint was present.
-Turning a non existent checkpoint on or off is a no-op. No exception is raised.
-
-``instrumentation.monitor_register(func)``
-
-For optimizing virtual machines, such as ``PyPy`` and future versions of CPython,
-calls to ``insert_monitors`` and ``remove_monitors`` may be quite expensive.
-Calls may take 100s of milliseconds for a large program, as it they trigger de-optimizations.
-
-Once the call is completed, the impact on performance should be negligible.
-
-Combining checkpoints
----------------------
-
-Only one instrumentation checkpoint and one monitoring checkpoint is allowed per bytecode instruction.
-It is possible to have both a monitoring and instrumentation checkpoint on the same instruction;
-they are independent. Monitors will be called before instrumentation if both are present.
-
-Backwards Compatibility
-=======================
-
-This PEP is fully backwards compatible.
-We may seek to remove ``sys.settrace`` in the future once the APIs provided by this PEP
-have been widely adopted, but that is outside the scope of this PEP.
-
-
-Security Implications
-=====================
-
-Allowing modification of running code has some security implications,
-but no more than the ability to generate and call new code.
-
-All the functions listed above will trigger audit hooks.
-
-
-Implementation
-==============
-
-The implementation of this PEP will be built on top of PEP 659 quickening.
-Instrumentation or monitoring of a code object will first cause it to be quickened.
-Checkpoints will then be implemented by inserting one of several special ``CHECKPOINT``
-instructions into the quickened code. These instructions will call the registered callable
-before executing the original instruction.
-
-Note that this can interfere with specialization, which will result in performance degradation
-in addition to the overhead of calling the registered callable.
-
-Implementing tools
-==================
-
-Debuggers
----------
-
-Most of the features of a debugger are unchanged. Presenting the state of the VM to the user
-depends on introspection, not monitoring. It is the insertion of breakpoints, that differs.
-
-Inserting breakpoints
-'''''''''''''''''''''
-
-Breakpoints are simply monitors. To insert a breakpoint at a given line, the matching instruction
-offsets should be found from ``codeobject.co_lines()``.
-Then a monitor should be added for each of those offsets. To avoid excessive overhead, a single call
-should be made to ``instrumentation.insert_monitors`` passing all the offsets at once.
-
-Breakpoints can suspended with ``instrumentation.monitor_off``.
-
-Debuggers can break on exceptions being raised by registering a callable for ``RAISE``:
-
-``instrumentation.register(RAISE, break_on_raise_handler)``
-
-Coverage Tools
---------------
-
-Coverage tools need to track which parts of the control graph have been executed. To do this, they need
-to track most events and map those events onto the control flow graph of the code object.
-``BRANCH``, ``JUMPBACK``, ``START`` and ``RESUME`` events will inform which basic blocks have started to execute.
-The ``RAISE`` event with mark any blocks that did not complete.
-
-This can be then be converted back into a line based report after execution has completed.
-
-Profilers
----------
-
-Simple profilers need to gather information about calls. To do this profilers should register for
-the following events:
-
-* CALL: ``func(code: CodeType, offset: int, callable: object)``
-* START: ``func(code: CodeType, offset: int)``
-* RETURN: ``func(code: CodeType, offset: int, value: object)``
-* YIELD: ``func(code: CodeType, offset: int, value: object)``
-* RESUME: ``func(code: CodeType, offset: int)``
-* RAISE: ``func(code: CodeType, offset: int, exception: BaseException)``
-* EXCEPT: ``func(code: CodeType, offset: int)``
-
-
-Line based profilers
-''''''''''''''''''''
-
-Line based profilers will also need to handle ``LINE`` events.
-Beware that handling ``LINE`` events will have a large performance impact. - - .. note: - - Instrumenting profilers have a large overhead and will distort the results of profiling. - Unless you need exact call counts, consider using a statistical profiler. - -Open Issues -=========== - -[Any points that are still being decided/discussed.] - - -References -========== - -[A collection of URLs used as references through the PEP.] - - -Copyright -========= - -This document is placed in the public domain or under the -CC0-1.0-Universal license, whichever is more permissive. - - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - coding: utf-8 - End: From 1dc252ce8ecbda55856c6daea54e406e6e2352c4 Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Mon, 13 Sep 2021 15:49:23 +0100 Subject: [PATCH 4/6] Fix formatting errors. --- pep-0669.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-0669.rst b/pep-0669.rst index d4a3d6a825c..cc3b7d01b24 100644 --- a/pep-0669.rst +++ b/pep-0669.rst @@ -2,7 +2,7 @@ PEP: 669 Title: High Performance Instrumentation and Monitoring for CPython Author: Mark Shannon Status: Draft -Type: Standards +Type: Standards Track Content-Type: text/x-rst Created: 18-Aug-2021 Post-History: 13-Sep-2021 @@ -286,7 +286,7 @@ Line based profilers will also need to handle ``BRANCH`` and ``JUMPBACK`` events. Beware that handling these extra events will have a large performance impact. - .. note: +.. note:: Instrumenting profilers have a significant overhead and will distort the results of profiling. Unless you need exact call counts, From ceb9540056d21e5ed47a8a83cde2500d71e1d598 Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Mon, 13 Sep 2021 16:21:39 +0100 Subject: [PATCH 5/6] More edits to PEP 669. 
--- pep-0669.rst | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/pep-0669.rst b/pep-0669.rst index cc3b7d01b24..f9880f9c84f 100644 --- a/pep-0669.rst +++ b/pep-0669.rst @@ -11,7 +11,7 @@ Post-History: 13-Sep-2021 Abstract ======== -Using a profiler or debugging in CPython can have a severe impact on +Using a profiler or debugger in CPython can have a severe impact on performance. Slowdowns by an order of magnitude are not uncommon. It does not have this bad. @@ -22,9 +22,7 @@ and monitoring at low cost. Using the new API, code run under a debugger on 3.11 should easily outperform code run without a debugger on 3.10. -For relatively light instrumentation, such as required for ``cProfile``, -the performance of instrumented programs running on 3.11 should be -comparable with uninstrumented programs on 3.10. +Profiling will still slow down execution, but by much less than in 3.10. Motivation ========== @@ -59,11 +57,11 @@ expect to be active for the entire lifetime of the program. Monitoring can occur at any point in the program's life and be applied anywhere in the program. Monitoring points are expected to few. The capabilities of monitoring are a superset of that of profiling, -but insertion of monitoring points will be *much* more +but bulk insertion of monitoring points will be *much* more expensive than insertion of instrumentation. -Both instrumentation and monitoring is performed by insertion of a -checkpoint in a code object. +Both instrumentation and monitoring is performed by insertion of +checkpoints in a code object. Checkpoints ----------- @@ -81,7 +79,7 @@ allow insertion or removal of checkpoints after code has started to execute. The events are:: * BRANCH: A conditional branch is reached. - * JUMPBACK: A backwards, unconditional branch is taken. + * JUMPBACK: A backwards, unconditional branch is reached. * ENTER: A Python function is entered. * EXIT: A Python function exits normally (without an exception). 
* UNWIND: A Python function exits with an unhandled exception. @@ -153,7 +151,7 @@ The following functions are provided to insert monitoring points:: All functions return ``True`` if a monitor checkpoint was present, or ``False`` if a monitor checkpoint was not present. -Turning a non existent checkpoint on or off is a no-op; +Turning on, or off, a non-existent checkpoint is a no-op; no exception is raised. To register a callable for monitoring function events call:: @@ -168,7 +166,7 @@ For optimizing virtual machines, such as future versions of CPython (and ``PyPy`` should they choose to support this API), a call to ``insert_monitors`` and ``remove_monitors`` in a long running program could be quite expensive, possibly taking 100s of milliseconds as it -may trigger de-optimizations. Repeated calls to ``insert_monitors`` +triggers de-optimizations. Repeated calls to ``insert_monitors`` and ``remove_monitors``, as may be required in an interactive debugger, should be relatively inexpensive. @@ -180,7 +178,6 @@ per bytecode instruction. It is possible to have both a monitoring and instrumentation checkpoint on the same instruction; they are independent. Monitors will be called before instrumentation if both are present. 
- Backwards Compatibility ======================= @@ -273,11 +270,10 @@ To do this profilers should register for the following events: * ENTER * EXIT +* UNWIND * C_CALL * C_RETURN * RAISE -* EXCEPT -* UNWIND Line based profilers '''''''''''''''''''' From c6dbfdd9558394901bf42f9d85ed361736a382ff Mon Sep 17 00:00:00 2001 From: Mark Shannon Date: Mon, 13 Sep 2021 16:23:59 +0100 Subject: [PATCH 6/6] Change title of PEP 669 --- pep-0669.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0669.rst b/pep-0669.rst index f9880f9c84f..6cd1e14fc13 100644 --- a/pep-0669.rst +++ b/pep-0669.rst @@ -1,5 +1,5 @@ PEP: 669 -Title: High Performance Instrumentation and Monitoring for CPython +Title: Low Impact Instrumentation and Monitoring for CPython Author: Mark Shannon Status: Draft Type: Standards Track