Make it easier to traverse the frame stack for third party tools. #100987

Open
markshannon opened this issue Jan 12, 2023 · 49 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement


@markshannon
Member

markshannon commented Jan 12, 2023

Profilers and debuggers need to traverse the frame stack, but the layout of the stack is an internal implementation detail.
However, we can make some limited promises to make porting tools between Python versions a bit easier.

In order to traverse the stack, the offset of the previous pointer needs to be known. To understand the frame, more information is needed.

@pablogsal and @Yhg1s expressed interest in this.


@markshannon markshannon added the type-feature A feature request or enhancement label Jan 12, 2023
@markshannon
Member Author

Initially, I propose to refactor the _PyInterpreterFrame struct so that it starts:

typedef struct _PyInterpreterFrame {
    PyCodeObject *f_code;
    struct _PyInterpreterFrame *previous;
    ...

Currently f_code must be a code object, but we could generalize it to allow other objects.
For example, the shim frame inserted on entry to _PyEval_EvalFrameDefault could have that field set to None, indicating that it should be skipped in tracebacks, etc.

The order of f_code and previous doesn't really matter, but having f_code first makes #100719 a bit simpler.

@pablogsal pablogsal self-assigned this Jan 12, 2023
@pablogsal
Member

Let me collect some feedback from maintainers of debuggers and profilers; I will post their requirements here so we can think of solutions.

markshannon added a commit that referenced this issue Feb 13, 2023
…improvement. (GH-100988)

Refactor _PyInterpreterFrame a bit, to assist generator improvement.
carljm added a commit to carljm/cpython that referenced this issue Feb 13, 2023
* main:
  pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811)
  pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873)
  pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988)
  pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788)
  Correct trivial grammar in reset_mock docs (python#101861)
  pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846)
  pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841)
  pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831)
  pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837)
  Fix typo in test_fstring.py (python#101823)
  pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798)
  pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
@markshannon
Member Author

@pablogsal Any feedback?

@markshannon
Member Author

We can further improve traversal of the _PyInterpreterFrame for debugging and introspection by allowing C extensions to create frames without the rigmarole of creating a code object.

We should rename the f_code field to f_executable, and allow any object.

typedef struct _PyVMFrame {
    PyObject *f_executable;
    struct _PyVMFrame *previous;
} PyVMFrame;

Although tools and the VM should tolerate any object, we should in practice only allow a few classes:

  • CodeObject: Implies that the PyVMFrame is a full _PyInterpreterFrame. Only the VM should make this kind of frame.
  • Builtin function, method descriptor, slot wrapper, etc. The frame represents a call to the given object.
  • None: Internal shim. Tools should skip this frame.
  • Tuple: First three items should be name, filename, flags where flags determine the meaning of additional entries.
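A rough sketch of how an in-process tool might dispatch on the class of f_executable, following the list above. Everything here is a hypothetical stand-in: `Obj`, `TypeObj`, the `*_Type` objects, `FrameKind` and `classify_frame` are not CPython API; a real tool would compare the object's type against `&PyCode_Type`, `&PyTuple_Type`, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct TypeObj { const char *tp_name; } TypeObj;
typedef struct Obj { const TypeObj *ob_type; } Obj;

static const TypeObj Code_Type = {"code"};
static const TypeObj Builtin_Type = {"builtin_function_or_method"};
static const TypeObj Tuple_Type = {"tuple"};
static const TypeObj None_Type = {"NoneType"};

typedef struct VMFrame {
    Obj *f_executable;
    struct VMFrame *previous;
} VMFrame;

typedef enum {
    FRAME_FULL,     /* code object: a full interpreter frame */
    FRAME_CALL,     /* builtin callable being called */
    FRAME_METADATA, /* (name, filename, flags, ...) tuple */
    FRAME_SHIM,     /* None: internal shim, tools skip it */
    FRAME_UNKNOWN   /* any other class: tools should ignore it */
} FrameKind;

/* Decide how to treat a frame purely from the type of its executable. */
static FrameKind classify_frame(const VMFrame *frame)
{
    const TypeObj *tp = frame->f_executable->ob_type;
    if (tp == &Code_Type) return FRAME_FULL;
    if (tp == &Builtin_Type) return FRAME_CALL;
    if (tp == &Tuple_Type) return FRAME_METADATA;
    if (tp == &None_Type) return FRAME_SHIM;
    return FRAME_UNKNOWN;
}
```

Unknown classes fall through to FRAME_UNKNOWN rather than being treated as errors, which matches the "tools should tolerate any object" rule.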

The tuple form is for tools like Cython, Nanobind, etc. Creating a tuple of strs and ints is much simpler and faster than creating a fake code object.

C extensions can link themselves into the frame stack at the cost of about 4 memory writes and 3 reads:

    PyVMFrame frame;
    frame.previous = tstate->current_frame.frame;
    frame.f_executable = &EXECUTABLE_OBJECT;
    tstate->current_frame.frame = &frame;
    /* body of function goes here */
    tstate->current_frame.frame = frame.previous;
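The linking protocol above can be exercised in isolation. This is a minimal simulation, assuming a hypothetical `ThreadState` whose `current_frame` field stands in for `tstate->current_frame.frame`; `stack_depth` and `call_with_frame` are illustrative names, and none of this reflects CPython's real thread-state layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj { int dummy; } Obj;  /* opaque "executable" object */

typedef struct VMFrame {
    Obj *f_executable;
    struct VMFrame *previous;
} VMFrame;

/* Hypothetical stand-in for the per-thread state. */
typedef struct ThreadState {
    VMFrame *current_frame;
} ThreadState;

/* Count the frames currently linked into the stack. */
static int stack_depth(ThreadState *ts)
{
    int depth = 0;
    for (const VMFrame *f = ts->current_frame; f != NULL; f = f->previous) {
        depth++;
    }
    return depth;
}

/* Run `body` with a lightweight frame for `executable` linked in. */
static int call_with_frame(ThreadState *ts, Obj *executable,
                           int (*body)(ThreadState *))
{
    VMFrame frame;
    frame.previous = ts->current_frame;  /* read + write */
    frame.f_executable = executable;     /* write */
    ts->current_frame = &frame;          /* write */
    int result = body(ts);               /* body of the function */
    ts->current_frame = frame.previous;  /* read + write: unlink */
    return result;
}
```

For example, `call_with_frame(&ts, &exe, stack_depth)` on an empty stack sees a depth of 1, and the stack is empty again afterwards.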

We can do this for builtin functions by modifying the vectorcall function assigned to the builtin function/method descriptor.
We would need to benchmark this to see the performance impact, but it will be much cheaper than sys.activate_stack_trampoline().

@pablogsal
Member

@pablogsal Any feedback?

I have reached out to tool authors again; give me a couple of days to gather comments. Apologies for the delay.

@markshannon
Member Author

No problem.

markshannon added a commit that referenced this issue Mar 13, 2023
…PyEval_EvalFrameDefault`. (#102640)

* Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
carljm added a commit to carljm/cpython that referenced this issue Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
@itamarst

@benfred -^

@markshannon
Member Author

I've made a branch that adds "lightweight" frames (just a pointer to a "code" object and a link pointer), and inserts one for each call to a builtin function in the interpreter. The performance impact is negligible and all builtin function and class calls are present in the frame stack.

Branch: https://github.com/python/cpython/compare/main...faster-cpython:cpython:allow-non-python-frames?expand=1

Performance: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20230315-3.12.0a6%2B-3e2c3ab/bm-20230315-linux-x86_64-faster%252dcpython-allow_non_python_fra-3.12.0a6%2B-3e2c3ab-vs-base.png

@pablogsal
Member

I've made a branch that adds "lightweight" frames (just a pointer to a "code" object and a link pointer), and inserts one for each call to a builtin function in the interpreter. The performance impact is negligible and all builtin function and class calls are present in the frame stack.

Branch: https://github.com/python/cpython/compare/main...faster-cpython:cpython:allow-non-python-frames?expand=1

Performance: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20230315-3.12.0a6%2B-3e2c3ab/bm-20230315-linux-x86_64-faster%252dcpython-allow_non_python_fra-3.12.0a6%2B-3e2c3ab-vs-base.png

We still need the concept of entry frames for tools that merge native and Python stacks. Why did you remove _PyFrame_IsEntryFrame in your branch?

@markshannon
Member Author

It's a proof of concept; it was easier to remove _PyFrame_IsEntryFrame than to re-implement it.
_PyFrame_IsEntryFrame can be added back, if necessary.

@pablogsal
Member

It's a proof of concept, it was easier to remove _PyFrame_IsEntryFrame than re-implement it. _PyFrame_IsEntryFrame can be added back, if necessary.

👍

@carljm
Member

carljm commented Mar 17, 2023

We are also very interested in this proposal from the Cinder JIT perspective.

One difference I see with our use case compared to what the draft PR so far aims to support is that we would like to be able to link in "minimal frames" that are still considered "complete": they are fetched by _PyFrame_GetFirstComplete() and can be materialized into a full PyFrameObject. (In the draft PR here, only _PyInterpreterFrame frames are considered "complete".) We don't want to constantly keep a _PyInterpreterFrame (localsplus contents) up to date while the JIT is running (this is expensive), so we'd need to get some kind of callback to reify our minimal frame into a PyFrameObject on-demand (i.e. some hook into _PyFrame_GetFrameObject).

In the tuple form of f_executable, could there be a well-known bit-flag in the third element that tells the interpreter "this should be considered a 'complete' frame, and the next word in its struct is a function pointer that will take the VMFrame struct and return a PyFrameObject."?
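One way the suggested flag could look, as a sketch only: `FRAME_FLAG_COMPLETE`, `MinimalFrame`, `reify`, `example_reify` and `get_frame_object` are hypothetical names, and a plain C struct stands in for the (name, filename, flags) tuple. The key assumption is that a frame flagged as complete is embedded at the start of a larger struct whose next word is the reify callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical well-known bit in the tuple's third (flags) element. */
#define FRAME_FLAG_COMPLETE 0x1

typedef struct FrameObject { const char *name; int lineno; } FrameObject;

/* Stand-in for the (name, filename, flags) tuple. */
typedef struct ExecutableTuple {
    const char *name;
    const char *filename;
    long flags;
} ExecutableTuple;

typedef struct VMFrame {
    ExecutableTuple *f_executable;
    struct VMFrame *previous;
} VMFrame;

/* A "complete" minimal frame: the word after the header is a reify hook. */
typedef struct MinimalFrame {
    VMFrame base;                            /* must be the first member */
    FrameObject *(*reify)(VMFrame *frame);
} MinimalFrame;

/* Example hook a JIT might install to build a frame object on demand. */
static FrameObject *example_reify(VMFrame *frame)
{
    static FrameObject fo;
    fo.name = frame->f_executable->name;
    fo.lineno = 0;  /* a real JIT would recover this from its own state */
    return &fo;
}

/* Materialize a frame object if the frame claims to be complete, else NULL. */
static FrameObject *get_frame_object(VMFrame *frame)
{
    if (frame->f_executable->flags & FRAME_FLAG_COMPLETE) {
        MinimalFrame *mf = (MinimalFrame *)frame;  /* valid: base is first */
        return mf->reify(frame);
    }
    return NULL;
}
```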

@pablogsal
Member

We should rename the f_code field to f_executable, and allow any object.

In general, the feedback is that this will make introspection tools much harder to implement. Currently, the fact that this can only be a code object makes it very easy to traverse the frame stack. Allowing any Python object makes it harder, or in some cases even impossible.

If we restrict this to a finite set of possibilities, it still makes it much harder but if we add some kind of enumeration to the frame that tells the tool what's going to be there it makes it a bit easier.

In general I don't think that this proposal helps introspection tools, it actually makes the implementation harder and less efficient because more pointers need to be copied and more logic needs to be included.

@pablogsal
Member

Some comments from authors:

I feel it won't be too easy to decipher the type of the object remotely. This would likely increase the number of private structures that we need to copy over from Python headers to parse this information (e.g. tuples), making things more complex. Of course one could just try treating the object as a PyCodeObject and check for failures, but this would now imply a potential loss of captured information, unless all the other object types that can appear here are also handled. Perhaps an extra int field that specifies the type of the object being passed with f_executable might help in this direction, to some extent. But perhaps one simplification that depends on a positive answer to the following question could be adopted: is the value f_executable crucial for the actual execution, or is it just added to carry the frame's metadata (e.g. filename, function name, line number, ...)? If that is added just for the metadata, perhaps that could be added directly to the _PyVMFrame structure in the form of extra fields? There could be a core set of fields that are common to all object types (filename, function qualname, location data), plus a generic PyObject reference that can be consumed easily by in-process tools. However, I can see the downside being that the cost would probably end up being slightly more than just 4 W and 3 R operations in general.
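For comparison, a sketch of the layout suggested in this comment, with a kind field and core metadata stored directly in the frame. `MetaFrame`, its fields, `FrameKind` and `format_frame` are hypothetical; an out-of-process reader would copy the struct (and then the strings it points to) out of the target process, which this in-process version skips.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { KIND_CODE = 0, KIND_BUILTIN = 1, KIND_OTHER = 2 } FrameKind;

typedef struct MetaFrame {
    int kind;                  /* tells readers what `object` points to */
    const char *qualname;      /* core metadata common to all kinds */
    const char *filename;
    int lineno;
    void *object;              /* generic reference for in-process tools */
    struct MetaFrame *previous;
} MetaFrame;

/* Format one frame from the fixed fields only, never dereferencing `object`. */
static int format_frame(const MetaFrame *f, char *buf, size_t size)
{
    return snprintf(buf, size, "%s (%s:%d)", f->qualname, f->filename, f->lineno);
}
```

The trade-off named in the comment is visible here: every frame now costs three extra pointer-sized writes plus the kind, in exchange for readers needing only fixed offsets.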

@pablogsal
Member

In general, the sentiment is that the more regular the structure is, the easier it is for profilers and debuggers to traverse the stack. The more variations and Python-isms (as in, using PyObject* instead of C structs), the harder it is for these tools to traverse the stack properly, which (partially) goes against what we are trying to do here.

@markshannon
Member Author

Feedback from who? Which operations for which tools become harder?
It is hard to take vague and anonymous feedback seriously.

No one is forcing tools to handle all possible frames. They can skip frames that have "executable" objects other than code objects. The presence of additional information that tools ignore cannot be worse than that information not being present in the first place.
And not all tools will ignore it; the PR already gives better tracebacks in the faulthandler module.

In general, the sentiment is that the more regular the structure is, the easier it is for profilers and debuggers to traverse the stack. The more variations and Python-isms (as in, using PyObject* instead of C structs), the harder it is for these tools to traverse the stack properly, which (partially) goes against what we are trying to do here.

Two fields, one pointing to the next frame, and one pointing to the executable object, seems quite regular to me.
Traversal of the stack is trivial. Just follow the previous pointer.

@markshannon
Member Author

It might be informative to compare this with adding perf support:

  • Adding perf frames causes a slowdown of 8%. The PR above has negligible performance impact.
  • This works on Windows and any machine that does not have perf installed.
  • It works with PEP 523 or PEP 669.

Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
… in `_PyEval_EvalFrameDefault`. (python#102640)

* Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
@pablogsal
Member

pablogsal commented Mar 27, 2023

Feedback from who?

Authors of profilers and debuggers. For now authors of Austin, py-spy, scygraph and fil, and myself (memray/pystack). I collected feedback from them but if you prefer that they comment here directly individually I can also ask for that.

Which operations for which tools become harder?

Getting the Python stack from a remote process reading memory.

It might be informative to compare this with adding perf support:

Informative how? perf support is optional, it doesn't affect profiling or debugging tools other than allowing perf to work and it doesn't collide with this work. I am failing to see the argument here. This issue is called "Make it easier to traverse the frame stack for third party tools" and we are literally discussing changes that will achieve the opposite for some tools, not sure how the perf support is involved.

@markshannon
Member Author

if you prefer that they comment here directly individually I can also ask for that.

Yes, please.

perf support is optional

Is it really? In that case let's drop it now before it causes trouble for 3.13.
We aren't going to support perf in any future JIT compiler.

@markshannon
Member Author

Let's make it clear. The choice isn't between the proposed ABI and the status quo. The choice is between a well defined, if minimal, ABI and no ABI guarantees whatsoever.
The _PyInterpreterFrame struct is internal and will change.

For example, we need to insert frames for shims at the exit from __init__ functions in order to inline them. We might want to do the same for calls to __setitem__ and __setattr__ as well. Creating code objects for these shims is a waste of effort.
We might want to replace calls to tuple with a surrogate that constructs the tuple in bytecode. We will want to have the tuple object as the "executable", so that the frame stack looks the same.

And don't forget the producers of frames, as well as the consumers.
JIT compilers may want to insert frames. Cinder does. I suspect we won't, but we might.
Cython and pybind11 may want to insert frames, if not all the time, at least if an error occurs. Creating fake code objects is unnecessary overhead.

If the ABI I'm suggesting is not a good one, then you need to suggest a better one.

I am failing to see the argument here (Contrasting with perf support)

The purpose of adding perf support is that tools can see the mixed C/Python stack.
perf support does that by faking Python frames on the C stack. I propose adding frames for C callables (and other things) to the Python stack.
Adding it to the Python stack means that it is easily accessible to in-process tools, and with very low overhead.
Adding to the C stack requires support for the native debugging format and ABI, and has a large cost.

@pablogsal
Member

pablogsal commented Mar 27, 2023

So it doesn't mean optional for us and we are stuck with it?
In that case it needs a PEP.

The feature has landed already in 3.12 and is already released in alpha. I respect your position but I disagree with it. I suggest that if you want to discuss this, we can do it in a more real-time channel other than a GitHub issue.

@pablogsal
Member

The tighter the restrictions, the easier it is for introspection tools. The looser the restrictions, the easier it is for producers of frames.

Refocusing the discussion on the original issue at hand: I think that if we add some kind of metadata to the frame that tells the tool what kind of frame it is (so basically, what's going to be in the "f_executable" field), that's already a win. As you mention, tools may want to skip some of these frames if they cannot be handled.

On the other hand if we support simple structs or simple Python objects (as opposed to custom classes or even dictionaries) in f_executable that's also a win.

Additionally, I would like us to keep the current structure as intact as possible (by this I mean that most frames will have an f_executable that points to a well-defined code object).

Also, I think having these fields as you propose:

typedef struct _PyInterpreterFrame {
    PyCodeObject *f_code;
    struct _PyInterpreterFrame *previous;
    ...

is a big win as tools don't need to update these definitions every single time.

@markshannon
Member Author

The "f_executable" field is its own metadata, as Python objects are self-describing.
Additional data adds bulk to the frame, and slows down frame creation.

We can restrict the number of classes that are officially supported. If the "f_executable" object is one of those, then tools can use that information. If it is something else, they can just ignore it.

In the minimal case of just supporting code objects:

while (frame) {
    if (frame->f_executable->ob_type == &PyCode_Type) {
        do_stuff_with_frame(frame);
    }
    frame = frame->previous;
}
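A self-contained version of this loop, for experimenting outside CPython. Mock types stand in for PyObject, PyTypeObject and PyCode_Type; `MockCode_Type` and `count_code_frames` are hypothetical names. Frames whose executable is not a code object are skipped, not treated as errors.

```c
#include <assert.h>
#include <stddef.h>

typedef struct TypeObj { const char *tp_name; } TypeObj;
typedef struct Obj { const TypeObj *ob_type; } Obj;

static const TypeObj MockCode_Type = {"code"};

typedef struct VMFrame {
    Obj *f_executable;
    struct VMFrame *previous;
} VMFrame;

/* Walk from the innermost frame outwards, handling only code-object frames. */
static int count_code_frames(const VMFrame *frame)
{
    int n = 0;
    while (frame) {
        if (frame->f_executable->ob_type == &MockCode_Type) {
            n++;  /* do_stuff_with_frame(frame) would go here */
        }
        frame = frame->previous;
    }
    return n;
}
```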

As for what should be supported:

  • Code objects
  • Classes
  • Builtin functions
  • Method descriptors
  • (name, filename, lineno) tuples.
  • (Maybe other C callables, like slot wrappers)*

I'd like to get rid of slot wrappers and other oddities and merge them into builtin functions, but that's another issue.

The VM might create frames for other objects, but tools should ignore them.

@pablogsal
Member

The "f_executable" field is its own metadata, as Python objects are self-describing. Additional data adds bulk to the frame, and slows down frame creation.

We can restrict the number of classes that are officially supported. If the "f_executable" object is one of those, then tools can use that information. If it is something else, they can just ignore it.

In the minimal case of just supporting code objects:

I understand, but this makes life harder for inspection tools because it forces them to copy much more data (at least the class and its name) instead of inspecting an enum. I would like the frame itself to record what is in f_executable. I understand that you don't, but I want to state that I do :)

As for what should be supported:

  • Code objects
  • Classes
  • Builtin functions
  • Method descriptors
  • (name, filename, lineno) tuples.
  • (Maybe other C callables, like slot wrappers)*

I see where you are coming from, but allowing all these things is going to make implementing these tools a nightmare, because there are a lot of possibilities to support. I would like to restrict this list to simple stuff like tuples, code objects and maybe some C-like struct that can be used for more exotic cases. This is basically what you said here:

The tighter the restrictions, the easier it is for introspection tools. The looser the restrictions, the easier it is for producers of frames.

I am advocating for much tighter restrictions, but I understand that's not the direction that you want to go and I respect that.

@carljm
Member

carljm commented Mar 27, 2023

We could use a tagged pointer in f_executable to provide an easy-to-read "flag" indicating the frame type without adding any additional bulk to frames, and not much extra cost to frame creation. This is what Cinder shadowframes does today: https://github.com/facebookincubator/cinder/blob/cinder/3.10/Include/internal/pycore_shadow_frame_struct.h#L42-L60

In the minimal form, we can leave the low bit 0 to indicate "normal frame, pointer to code object" (then there is also zero overhead in normal interpreter frame creation) and set it to 1 to mean "pointer to something new and different." Then existing tools that want to just handle normal code objects like they already do only need a single bit test to discard frames they don't want to deal with.

There are another two bits we could play with if we want to provide streamlined indication of any other common cases (builtin function, tuple form, maybe?).
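A minimal sketch of the tagging scheme described above, assuming executable objects are at least 4-byte aligned so the two low bits are free (which also holds on 32-bit builds). `TAG_MASK`, `TAG_OTHER`, `tag_executable`, `is_plain_code` and `untag_executable` are illustrative names, not Cinder's or CPython's.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK  ((uintptr_t)0x3)  /* low bits free on 4-byte-aligned pointers */
#define TAG_OTHER ((uintptr_t)0x1)  /* low bit set: "not a plain code object" */

/* Store an executable pointer, tagging it when it is not a code object.
   Plain code objects are stored untagged, so the common case costs nothing. */
static uintptr_t tag_executable(void *obj, int is_code)
{
    uintptr_t bits = (uintptr_t)obj;
    assert((bits & TAG_MASK) == 0);  /* requires at least 4-byte alignment */
    return is_code ? bits : (bits | TAG_OTHER);
}

/* The single bit test a tool can use to skip frames it does not understand. */
static int is_plain_code(uintptr_t tagged)
{
    return (tagged & TAG_OTHER) == 0;
}

/* Recover the real pointer before dereferencing it. */
static void *untag_executable(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}
```

A remote reader can apply `is_plain_code` to the raw word it copied from the frame, without reading the object's type at all.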

I hope we can make life easier for existing inspection tools by making it really easy to detect the common cases they care about, but I also hope (from the Cinder JIT perspective) that at least one of the valid options for f_executable is "extensible". E.g. if a (name, filename, lineno) tuple is allowed, it should also be valid to have a longer tuple carrying an additional payload, with the first three elements interpreted as name, filename, lineno.

@carljm
Member

carljm commented Mar 27, 2023

Classes

I'm curious, why would we want frames to hold a pointer to a class (I assume while executing the class body?) rather than to the code object of the class body?

@markshannon
Member Author

I'm curious, why would we want frames to hold a pointer to a class

class C: pass

c = C()  # This is a call to a class

@markshannon
Member Author

@pablogsal
Is checking for five or six distinct values, rather than two or three, that big a deal?
Also, why is comparing to an address a problem, whereas comparing to an int is not?
frame->f_kind == CODE_OBJECT_KIND is a 32-bit comparison.
frame->f_executable == &PyCode_Type is a 64-bit comparison.
Fetching the address of PyCode_Type will need the symbol table, but you'll need that anyway.

@markshannon
Member Author

@carljm
I don't see the value in tagging bits. The tag you propose holds no additional information, as the same value can be got with the simple comparison f_executable == &PyCodeObject

@pablogsal
Member

pablogsal commented Mar 28, 2023

Fetching the address of PyCode_Type will need the symbol table, but you'll need that anyway.

Not necessarily. These tools sometimes need to work with stripped binaries or core files, and requiring the symbol table there can be a huge pain compared with just checking against an integer, since we vendor the headers anyway. In particular (as an example; please don't focus too much on this), with core files it is a huge pain because the address may not be in the core if it is in the .rodata segment.


Currently, once you get the frame, you access the f_code pointer and you KNOW it is a code object, so once you have its layout (because we vendor the headers) you know how to extract the function name and the filename.

If we have an integer in the frame that tells what f_executable will have, then we can compare against it directly and know what we are going to find. No extra information or copies are required.

If we need to compare against type addresses instead, we have to:

  • Copy the pointers/structures of all the possible types (PyCode_Type and the equivalents for tuples, functions...). This is already problematic because they may not be in the core, OR we may not have symbols, so we may be unable to locate them even in a live process.
  • Once you find the address of PyCode_Type and friends, relocate it to find the real address. Quite simple to do, but it is more work.
  • If we allow arbitrary classes, then the tools cannot even compare against pointers, because we don't know where they are.

But an enum describing what it contains lets us KNOW what the pointer will contain and, for instance, be certain that the pointer is some custom object we won't be able to understand, instead of having to "guess" based on "oh, this pointer is not one of the ones we know about (code, tuples...)".

@markshannon
Member Author

I appreciate that having extra information will make life easier for a few tool authors, but it might make things a little bit slower for very many Python users.

How do you get the frame without any symbols?

If we allow random classes...

Any class outside of the fixed set (whatever that ends up being) should be ignored, so no "random" classes.

@pablogsal
Member

pablogsal commented Mar 28, 2023

How do you get the frame without any symbols?

We find the interpreter state. We vendor the headers, so we know the offsets of the pointers in every struct, and we know what we are going to find because at the moment it is fully determined. The interpreter state can be found because we (CPython) place the runtime structure in its own section, so it can be found without symbols:

__attribute__ ((section (".PyRuntime")))

This is technically not even needed, because the runtime can also be found by scanning the BSS for the interpreter state <-> thread state pointer cycle, which is what py-spy does.
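A rough sketch of that BSS-scan fallback, simulated on local arrays instead of memory read from another process (a real tool would copy the regions with something like process_vm_readv). The word offsets `INTERP_TSTATE_HEAD` and `TSTATE_INTERP` and the names `Region`, `read_word` and `find_interp` are hypothetical; real offsets would come from the vendored headers of the target CPython version, and py-spy's actual heuristics differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field offsets, in words, for one specific CPython version. */
#define INTERP_TSTATE_HEAD 1  /* interp -> head of its thread list */
#define TSTATE_INTERP      2  /* tstate -> owning interpreter */

/* A local copy of a remote region: words[i] lives at base + i * word size. */
typedef struct Region {
    uintptr_t base;
    const uintptr_t *words;
    size_t nwords;
} Region;

/* Read one word at remote address `addr`; returns 0 if it is not in `r`. */
static int read_word(const Region *r, uintptr_t addr, uintptr_t *out)
{
    if (addr < r->base || (addr - r->base) % sizeof(uintptr_t) != 0) {
        return 0;
    }
    size_t i = (size_t)((addr - r->base) / sizeof(uintptr_t));
    if (i >= r->nwords) {
        return 0;
    }
    *out = r->words[i];
    return 1;
}

/* Treat every BSS word as a candidate interpreter pointer and accept it if
   interp->tstate_head->interp points back at it. Returns 0 if none found. */
static uintptr_t find_interp(const Region *bss, const Region *heap)
{
    const uintptr_t w = sizeof(uintptr_t);
    for (size_t i = 0; i < bss->nwords; i++) {
        uintptr_t interp = bss->words[i];
        uintptr_t tstate, back;
        if (!read_word(heap, interp + INTERP_TSTATE_HEAD * w, &tstate) || tstate == 0) {
            continue;
        }
        if (read_word(heap, tstate + TSTATE_INTERP * w, &back) && back == interp) {
            return interp;
        }
    }
    return 0;
}
```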

@carljm
Member

carljm commented Mar 28, 2023

I don't see the value in tagging bits. The tag you propose holds no additional information, as the same value can be got with the simple comparison f_executable == &PyCodeObject

Sure, if you have &PyCodeObject available.

Tagging bits could "make life easier for a few tool authors" in the scenarios @pablogsal is mentioning without "making things slower for very many Python users."

EDIT: also, it's not f_executable == &PyCodeObject, it's f_executable->ob_type == &PyCodeObject, so it's adding an extra pointer chase for every frame also.

@markshannon
Member Author

Tagging might solve the performance issue. But we need to support 32-bit machines, so we only have 2 bits to play with, which is not enough.

@markshannon
Member Author

If tools can find the runtime, then we can put an array of pointers there. No runtime overhead at the cost of ~40 bytes.

PyTypeObject *callable_types[] = {
   &PyCode_Type,
   ...
};
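A sketch of the tool side of this compromise: after copying the table out of the runtime, the tool compares the ob_type word it read from f_executable against the table entries. The index order (`EXEC_CODE`, `EXEC_BUILTIN`, `EXEC_TUPLE`) and `executable_kind` are assumptions for illustration; the thread does not fix the table's contents or order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index order of the published table. */
enum { EXEC_CODE = 0, EXEC_BUILTIN = 1, EXEC_TUPLE = 2, EXEC_NTYPES = 3 };

/* `table` holds the type addresses copied from the target's runtime;
   `ob_type` is the type pointer read from f_executable in the target.
   Returns the matching index, or -1 meaning "unknown, skip this frame". */
static int executable_kind(const uintptr_t *table, size_t n, uintptr_t ob_type)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i] == ob_type) {
            return (int)i;
        }
    }
    return -1;
}
```

Because the table lives in the runtime, the tool never needs symbols for PyCode_Type and friends, and it needs no relocation step: the copied addresses are already the live ones.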

@pablogsal
Member

If tools can find the runtime, then we can put an array of pointers there.

That would be an acceptable compromise I think.

@markshannon
Member Author

OK, let's go with that then.

FTR, one other reason not to use an enum is this: what happens when the enumeration and the executable don't match?
We can be fairly sure it won't happen in our code, but it would be an easy mistake to make in third-party code.

By allowing objects of any class, but designating a small set of "approved" classes, the system is much more robust.

@markshannon
Member Author

@pablogsal
Where should the array go, exactly?

@markshannon
Member Author

@carljm

I hope we can make life easier for existing inspection tools by making it really easy to detect the common cases they care about, but I also hope (from the Cinder JIT perspective) that at least one of the valid options for f_executable is "extensible". E.g. if a (name, filename, lineno) tuple is allowed, it should also be valid to have a longer tuple carrying an additional payload, with the first three elements interpreted as name, filename, lineno.

I don't see a problem with that.
Tools should check the length of the tuple before extracting its contents, for safety.
We could allow a tuple of any length, specifying only that the first three elements, if they exist, should be the name, filename and line number.
("foo",) and ("foo", "foo.py", 121, "special-data-34.8") should both be acceptable.
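A sketch of the defensive extraction suggested above, with a C struct standing in for a Python tuple of str and int items. `Item`, `Tuple`, `FrameInfo` and `read_frame_tuple` are hypothetical names; missing fields are reported as NULL or -1, and any trailing payload is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one tuple item: either a str or an int. */
typedef struct Item {
    enum { ITEM_STR, ITEM_INT } tag;
    const char *s;
    long i;
} Item;

typedef struct Tuple {
    size_t size;
    const Item *items;
} Tuple;

typedef struct FrameInfo {
    const char *name;      /* NULL if absent */
    const char *filename;  /* NULL if absent */
    long lineno;           /* -1 if absent */
} FrameInfo;

/* Extract whatever of (name, filename, lineno) is present, checking both the
   length and the type of each item; extra trailing items are ignored. */
static FrameInfo read_frame_tuple(const Tuple *t)
{
    FrameInfo info = {NULL, NULL, -1};
    if (t->size > 0 && t->items[0].tag == ITEM_STR) info.name = t->items[0].s;
    if (t->size > 1 && t->items[1].tag == ITEM_STR) info.filename = t->items[1].s;
    if (t->size > 2 && t->items[2].tag == ITEM_INT) info.lineno = t->items[2].i;
    return info;
}
```

With this shape, ("foo",) yields only a name, while ("foo", "foo.py", 121, "special-data-34.8") yields all three fields and leaves the payload for tools that understand it.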

@pablogsal Would this be OK, or is this too complex for your tastes?

@P403n1x87
Contributor

Some comments from authors:

I feel it won't be too easy to decipher the type of the object remotely. This would likely increase the number of private structures that we need to copy over from Python headers to parse this information (e.g. tuples), making things more complex. Of course one could just try treating the object as a PyCodeObject and check for failures, but this would now imply a potential loss of captured information, unless all the other object types that can appear here are also handled. Perhaps an extra int field that specifies the type of the object being passed with f_executable might help in this direction, to some extent. But perhaps one simplification that depends on a positive answer to the following question could be adopted: is the value f_executable crucial for the actual execution, or is it just added to carry the frame's metadata (e.g. filename, function name, line number, ...)? If that is added just for the metadata, perhaps that could be added directly to the _PyVMFrame structure in the form of extra fields? There could be a core set of fields that are common to all object types (filename, function qualname, location data), plus a generic PyObject reference that can be consumed easily by in-process tools. However, I can see the downside being that the cost would probably end up being slightly more than just 4 W and 3 R operations in general.

This would be me, maintainer of Austin. For context, Austin uses system calls like process_vm_readv to read memory out of process.

@P403n1x87
Contributor

How do you get the frame without any symbols?

Austin uses symbols to locate _PyRuntime, but if those are not available, there is a fallback on BSS scan to locate something that looks like _PyRuntime or an interpreter state. So symbols are not strictly required (but good to have of course!).

@P403n1x87
Contributor

Apologies if I slightly derail the conversation, but I wanted to express the following thought. Based on my experience with Austin, I would regard frame stack unwinding as just one aspect of the more general topic of observability into the Python VM. For example, one other thing that Austin tries to do is to sample the GC state to give an idea of how much CPU time is being spent on GC. Or detect who is holding the GIL to give a better estimate of RSS allocations. Therefore, I would tend to view frame stacks as just a part of what can be observed out of process. So the way I see a tool like Austin extracting this information in the future is by looking into an "observability entry point", much like _PyRuntime, but specifically engineered for out-of-process tools. From there one can rely on an ever growing (in an ideally backwards-compatible fashion) list of things one can observe, e.g.

.section _PyRuntimeStateABI

runtime_state {
  interpreter_state {
    thread_count: int,
    threads: [{
      top_frame: { ... },
      ...
    }]
  },
  gc_state: ...,
  gil_state: ...,
}

warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
… in `_PyEval_EvalFrameDefault`. (python#102640)

* Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
markshannon added a commit that referenced this issue Jun 14, 2023
…of an internal frame. (GH-105727)

* Add table describing possible executable classes for out-of-process debuggers.

* Remove shim code object creation code as it is no longer needed.

* Make lltrace a bit more robust w.r.t. non-standard frames.
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023