Implement deferred reference counting in free-threaded builds #117376

Open
colesbury opened this issue Mar 29, 2024 · 0 comments
colesbury commented Mar 29, 2024

Feature or enhancement

@Fidget-Spinner has started implementing tagged pointers in the evaluation stack in #117139.

There are two other pieces needed for deferred reference counting support in the free-threaded build:

  1. We need a way to indicate that a PyObject uses deferred reference counting (in the PyObject header)
  2. We need to collect unreachable objects that use deferred reference counting in the GC

Object representation (done)

I think we should use a bit in ob_gc_bits (see footnote 1) to mark objects that support deferred reference counting. This differs slightly from PEP 703, which says "The two most significant bits [of ob_ref_local] are used to indicate the object is immortal or uses deferred reference counting."

I think the flags in ob_gc_bits are a better marker than ob_ref_local because they avoid any concern about underflow of a 32-bit field in 64-bit builds. This makes checking whether an object is immortal or supports deferred reference counting slightly more expensive (because two fields must be checked), but I think it's still the better design choice.

Additionally, we don't want objects with deferred references to be destroyed when their refcount would reach zero. We can handle this by adding 1 to the refcount when we mark it as deferred, and account for that when we compute the "gc refs" in the GC.
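
As a rough illustration of the marking scheme described above, here is a toy Python model (not CPython code). `ToyObject`, `GC_BITS_DEFERRED`, `set_deferred`, `is_deferred`, and `decref` are hypothetical names invented for this sketch, and the bit position is an assumption; the real header layout is not shown in this issue.

```python
from dataclasses import dataclass

# Hypothetical bit value; the real bit assignment in ob_gc_bits is an assumption here.
GC_BITS_DEFERRED = 1 << 0


@dataclass
class ToyObject:
    """Toy stand-in for a PyObject header (not the real layout)."""
    refcount: int = 1
    ob_gc_bits: int = 0


def set_deferred(obj: ToyObject) -> None:
    """Mark obj as using deferred reference counting."""
    if obj.ob_gc_bits & GC_BITS_DEFERRED:
        return  # already marked; do not add a second extra reference
    obj.ob_gc_bits |= GC_BITS_DEFERRED
    obj.refcount += 1  # extra reference so ordinary decrefs never reach zero


def is_deferred(obj: ToyObject) -> bool:
    """Check the marker bit in the toy ob_gc_bits field."""
    return bool(obj.ob_gc_bits & GC_BITS_DEFERRED)


def decref(obj: ToyObject) -> bool:
    """Drop one reference; return True if the object would be destroyed."""
    obj.refcount -= 1
    return obj.refcount == 0
```

In this model, an object that was marked with `set_deferred` survives the `decref` that would otherwise destroy it, which leaves the GC as the only party able to free it.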

What types of objects support deferred references to them?

  • code objects
  • descriptors
  • functions (only enabled for top-level functions, not closures)
  • modules and module dictionaries
  • heap types (see footnote 2)

Where can deferred references live?

Deferred references are stored in localsplus[] in frames: both the local variables and the evaluation stack can contain deferred references. This includes suspended generators, so deferred references may occur in the heap without being present in the frame stack.

GC algorithm changes part 1 (done)

We'll need to:

  1. account for the extra reference. Since we don't have a zero-count table, we've added one to the refcount of each object that uses deferred reference counting (see previous section). When we initialize the computed gc_refs from the refcount, we should subtract one for these objects.
  2. mark objects as no longer deferred before we finalize them. This ensures that finalized objects are freed promptly.
  3. mark objects as no longer deferred during shutdown. This ensures prompt destruction during shutdown.
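
The bookkeeping in steps 1 to 3 might look roughly like the following toy Python sketch. The function names and the `GC_BITS_DEFERRED` value are hypothetical, and the sketch assumes that un-marking an object also drops the extra reference; the issue text does not spell that detail out.

```python
GC_BITS_DEFERRED = 1 << 0  # hypothetical bit value, for illustration only


def initial_gc_refs(refcount: int, ob_gc_bits: int) -> int:
    """Starting gc_refs for one object, before internal references are subtracted."""
    gc_refs = refcount
    if ob_gc_bits & GC_BITS_DEFERRED:
        gc_refs -= 1  # discount the extra reference added when the object was marked
    return gc_refs


def clear_deferred(refcount: int, ob_gc_bits: int) -> tuple[int, int]:
    """Return (refcount, ob_gc_bits) with the deferred marking undone.

    Assumption: un-marking also removes the extra reference, so the object
    can be freed by ordinary reference counting afterwards.
    """
    if ob_gc_bits & GC_BITS_DEFERRED:
        return refcount - 1, ob_gc_bits & ~GC_BITS_DEFERRED
    return refcount, ob_gc_bits
```

Under these assumptions, a deferred object whose only reference is the extra one starts with gc_refs of zero, so it does not look externally referenced to the collector.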

GC algorithm changes part 2 (not yet implemented)

The GC needs special handling for frames with deferred references:

  • When computing gc_refs (i.e., visit_decref), the GC should skip deferred references. We don't want to "subtract one" from gc_refs in this step because deferred references don't have a corresponding "+1".
  • When marking objects as transitively reachable (i.e., visit_clear_unreachable/visit_reachable), the GC should treat deferred references in frames just like other references.

Note that deferred references might be "in the heap" (and possibly form cyclic trash) due to suspended generators or captured frame objects (e.g., from exceptions, sys._getframe(), etc.)
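
To make the two rules concrete, here is a toy Python model of a collection pass over a small object graph. It is only a sketch: `Node`, `find_unreachable`, and `build_example` are hypothetical names, and the example graph (a suspended generator holding a deferred reference to a function that refers back to it) is an assumption chosen to show cyclic trash in the heap.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    refcount: int
    deferred: bool = False  # object uses deferred reference counting
    refs: list = field(default_factory=list)  # (target, is_deferred_ref) pairs
    gc_refs: int = 0


def find_unreachable(nodes):
    """Return the names of the nodes that the toy collector considers garbage."""
    # Initialize gc_refs, discounting the extra reference on deferred objects.
    for n in nodes:
        n.gc_refs = n.refcount - (1 if n.deferred else 0)
    # visit_decref analogue: subtract internal references, skipping deferred
    # ones because they never contributed a "+1" to the refcount.
    for n in nodes:
        for target, is_deferred_ref in n.refs:
            if not is_deferred_ref:
                target.gc_refs -= 1
    # Reachability: follow every reference, deferred or not.
    reachable = set()
    work = [n for n in nodes if n.gc_refs > 0]
    while work:
        n = work.pop()
        if n.name in reachable:
            continue
        reachable.add(n.name)
        work.extend(target for target, _ in n.refs)
    return {n.name for n in nodes if n.name not in reachable}


def build_example(external_refs_to_gen: int):
    """A generator frame defers a reference to func; func refers back to gen."""
    func = Node("func", refcount=1, deferred=True)  # only the extra reference
    gen = Node("gen", refcount=1 + external_refs_to_gen)  # 1 comes from func
    gen.refs.append((func, True))
    func.refs.append((gen, False))
    return [func, gen]
```

With no external references, `find_unreachable(build_example(0))` reports both objects as garbage. With one external reference to the generator, both stay alive, and the function is kept only because the reachability pass follows the deferred reference.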

GC thread stack scanning (not yet implemented)

The GC needs to account for deferred references from threads' stacks. This requires an extra step at the start of each GC: scan each frame in each thread and add one to each object with a deferred reference in the frame in order to ensure that it's kept alive. This step is complicated by the fact that active frames generally do not have valid stacktop pointers on the frame. The true stacktop is stored in a local "register" variable in _PyEval_EvalFrameDefault that is not accessible to the GC.

In order to work around this limitation, we do the following:

  • When scanning thread stacks, the GC scans up to the maximum stack top for the frame (i.e., up to co_stacksize) rather than up to current stack top, which the GC can't determine.
  • The stack needs to be zero-initialized: it can't contain garbage data.
  • It's okay for the GC to consider "dead" deferred references from the running frame. The pointed-to objects can't be dead because only the GC can free objects that use deferred reference counting. Note that the GC doesn't consider non-deferred references in this step.

We also need to ensure that the frame does not contain dead references from previous executions. The simple way to deal with this is to zero out the frame's stack in _PyThreadState_PushFrame, but there are more efficient strategies that we should consider. For example, the responsibility can be shifted to the GC, which can clear anything above the top frame up to datastack_limit.
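
A toy Python sketch of the scanning step, under stated assumptions: `ToyRef`, `ToyFrame`, and `scan_thread_stacks` are hypothetical names, slots are modeled as `None` when empty (standing in for a zero-initialized stack), and "add one" is assumed to apply to the computed gc_refs of the referenced object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRef:
    target: str      # name of the referenced object
    deferred: bool   # True if this slot holds a deferred reference


@dataclass
class ToyFrame:
    nlocals: int
    co_stacksize: int
    localsplus: list = field(init=False)

    def __post_init__(self):
        # "Zero-initialized": every slot starts out empty, never garbage.
        self.localsplus = [None] * (self.nlocals + self.co_stacksize)


def scan_thread_stacks(threads, gc_refs):
    """Add one to gc_refs for each deferred reference found in any frame.

    threads is a list of per-thread frame lists. The whole localsplus array
    is scanned (locals plus co_stacksize stack slots) because the true stack
    top is not known to the scanner.
    """
    for frames in threads:
        for frame in frames:
            for slot in frame.localsplus:
                if slot is not None and slot.deferred:
                    gc_refs[slot.target] = gc_refs.get(slot.target, 0) + 1
```

A stale deferred reference left above the true stack top is counted as well, which matches the point above that considering "dead" deferred references from a running frame is harmless. Non-deferred slots are ignored.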

Relevant commits from the nogil-3.12 fork

Linked PRs

Footnotes

  1. ob_gc_bits has grown to encompass more than GC-related state, so we may want to consider renaming it.

  2. heap types will also need additional work beyond deferred reference counting so that creating instances scales well.

@colesbury colesbury added type-feature A feature request or enhancement topic-free-threading labels Mar 29, 2024
@colesbury colesbury self-assigned this Mar 29, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Apr 9, 2024
This marks objects as using deferred reference counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 11, 2024
This marks objects as using deferred reference counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
colesbury added a commit that referenced this issue Apr 12, 2024
…7696)

This marks objects as using deferred reference counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
We want code objects to use deferred reference counting in the
free-threaded build. This requires them to be tracked by the GC, so we
set `Py_TPFLAGS_HAVE_GC` in the free-threaded build, but not the default
build.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
We want code objects to use deferred reference counting in the
free-threaded build. This requires them to be tracked by the GC, so we
set `Py_TPFLAGS_HAVE_GC` in the free-threaded build, but not the default
build.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 12, 2024
colesbury added a commit that referenced this issue Apr 16, 2024
We want code objects to use deferred reference counting in the
free-threaded build. This requires them to be tracked by the GC, so we
set `Py_TPFLAGS_HAVE_GC` in the free-threaded build, but not the default
build.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
python#117696)

This marks objects as using deferred reference counting using the
`ob_gc_bits` field in the free-threaded build and collects those objects
during GC.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ython#117823)

We want code objects to use deferred reference counting in the
free-threaded build. This requires them to be tracked by the GC, so we
set `Py_TPFLAGS_HAVE_GC` in the free-threaded build, but not the default
build.
colesbury added a commit to colesbury/cpython that referenced this issue Aug 13, 2024
… build

`Py_DECREF` and `PyStackRef_CLOSE` are now implemented as macros in the
free-threaded build in ceval.c. There are two motivations:

 * MSVC has problems inlining functions in ceval.c in the PGO build.

 * We will want to mark escaping calls in order to spill the stack
   pointer in ceval.c and we will want to do this around `_Py_Dealloc`
   (or `_Py_MergeZeroLocalRefcount` or `_Py_DecRefShared`), not around
   the entire `Py_DECREF` or `PyStackRef_CLOSE` call.
colesbury added a commit to colesbury/cpython that referenced this issue Aug 21, 2024
colesbury added a commit that referenced this issue Aug 23, 2024
…#122975)

`Py_DECREF` and `PyStackRef_CLOSE` are now implemented as macros in the
free-threaded build in ceval.c. There are two motivations:

 * MSVC has problems inlining functions in ceval.c in the PGO build.

 * We will want to mark escaping calls in order to spill the stack
   pointer in ceval.c and we will want to do this around `_Py_Dealloc`
   (or `_Py_MergeZeroLocalRefcount` or `_Py_DecRefShared`), not around
   the entire `Py_DECREF` or `PyStackRef_CLOSE` call.