Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The interaction of top-of-stack caching, deferred reference counting and Py_DECREF #711

Open
markshannon opened this issue Dec 23, 2024 · 0 comments

Comments

@markshannon
Copy link
Member

The is a lot of memory traffic in jitted code. Despite the JIT removing the dispatch overhead, there is still a lot of memory traffic from:

  • Pushing to, and popping from, the evaluation stack
  • Reference counting operations

We can reduce these with top-of-stack (TOS) caching and deferred reference counting, respectively.

TOS Caching

Top of stack caching promises significant performance improvements as we can considerably reduce
memory traffic. Ignoring refcounting, LOAD_FAST goes from a dependent load and store to just the load.
COPY and SWAP need no memory accesses.

(Partially) Deferred Reference Counting

For practical reasons, mainly immediate reclamation, we cannot do full deferred reference counting.
We can however reduce the cost of reference counting by tagging references.

With tagged references and TOS caching, a LOAD_FAST is just one load and an ALU operation.
COPY is just a single ALU operation.

Although the tagged version is not as effective as full deferred reference counting, it does mean we can relax
our guarantees about the state of the VM when GC occurs.
With full deferred reference counting it is vital that the GC be able to see the exact stack.
With tagged deferred reference counting, we only need to guarantee that it cannot
appear to the GC that an object not on the stack is on the stack.

We still want to keep the stack pointer up to date though, as it is important for GC performance.
The more stack references that the GC can see, the more it can cheaply mark as reachable, reducing the overhead of cycle collection.

Overall, this means that TOS caching is correct, should be a large positive effect on jitted code performance and a negligible negative impact on GC performance.

Escaping calls.

Escaping calls are important because:

  • GC can only occur during an escaping call.
  • Almost anything can change in an escaping call, which can invalidate JIT optimizations

For this reason the interpreter code generators understand escaping calls, and save the stack around them.
Since the C compiler would have had to spill registers around the call anyway, this is effectively free.

Py_DECREF

Py_DECREF(obj) does this:

if (is_mortal(obj)) {
    obj->ob_refcnt--;
    if (obj->ob_refcnt == 0) {
         Py_Dealloc(obj);
    }
}

Py_Dealloc escapes, which means that Py_DECREF can escape. This is a problem as Py_DECREF is ubiquitous
and escaping calls defeat many optimization and cause code bulk due the spilling around the call.

Deferred reference counting doesn't help, as it only makes the tests around Py_Dealloc faster. The escaping call to Py_Dealloc is still present.

So we have to treat Py_DECREF specially.

  1. Although Py_Dealloc escapes in general, it doesn't escape for many common objects, like int, float, str. For specialized bytecodes we can mark the Py_DECREF as not escaping.
  2. Do not save the stack pointer across Py_DECREF. This will enable TOS caching across Py_DECREFs. In theory this might reduce the performance of the GC a bit, it practice it should not make any difference.
  3. Py_DECREF can still escape though, so eliminating as many Py_DECREFs as possible is worthwhile. We can elimiate many decrefs in the JIT optimizer.

Correct collection of objects

Avoiding premature collection of objects

  • The reference counting mechanism will not collect an object if there is at least one counted reference to an object.
  • The cycle GC will not collect an object if there is at least one reference to it that the GC can detect and is not part of a cycle.

Noting that references on the stack are not part of cycles, that means that if there is at least one counted reference to an object on the stack, then that object will not be collected. Therefore it is safe for any other references to that object to not be visible to the GC or to be counted.

Therefore we can defer (tag) a reference on the stack, provided that:

  • That reference is on the stack, and there is at least one counted reference to the same object that will outlive this reference.

Deferred collection objects

Some objects are prone to contention. For the free-threaded (racy) build we want to tag references to these objects to avoid contention on the reference count, to do this we are willing to tolerate deferred collection of these objects.

These objects should be created with a reference count of one higher than the true value, and have a bit set in their flags to mark them
as deferred. These objects can then be only be collected by the cyclic GC.

We can defer (tag) a reference on the stack deferred even if there is no other reference to that object, provided that we can be guaranteed that the GC will be able to see this reference. This will be a problem for TOS caching which can hide the TOS from the GC.
This means that TOS caching will not be possible for the free-threaded build.

When can we tag references?

We can defer (tag) a reference on the stack, provided when that reference is on the stack, and there is at least one counted reference to the same object that will outlive this reference.

Note that this is transitive, we can tag a reference if there is another tagged reference is on the stack that will outlive this reference, or a transitive reference to the same object outlives this reference.

Any reference that is passed to a callee can remain tagged, as the caller will outlive the callee. However any reference passed up the stack by return or yield must be converted to a counted reference.

  • A reference can be deferred (tagged) only if it is:
    • Guaranteed to be visible to the GC, or
    • outlived by another reference that is visible to the GC and is not part of a cycle.
  • No object that is not on the stack can be seen as being on the stack by the GC (or any other code).

The above invariants ensure that the GC will not be able to collect non-garbage objects.

However, we cannot know, in general, whether a reference is part of a cycle. But we do know
whether a reference is on the stack and can choose a different set of invariants that can be enforced
and imply the more general invariants:

  • A reference can be deferred (tagged) only if it is:
    • On the stack, and
    • Guaranteed to be outlived by another reference, also on the stack, that transitively refers to the same object.
  • The stack pointer saved to memory must not point beyond the portion of the stack that is valid and written to memory

Weakening the stack invariant

We previously had of goal of implementing the invariant that the entire evaluation stack of all frames would
be in memory and precisely visible to the GC. This would have allowed full deferred reference counting,
but since we have discarded full deferred reference counting for other reasons we no longer need the stronger invariant.

Implementation

We should be able to implement TOS caching and deferred references independently, as long we can rely on the invariants above. The current implementation does support the invariants apart form marking Py_DECREF as escaping. Without that information we cannot be sure that the stack is spilled correctly.
For example, the generated code for POP_TOP (ignoring stats and such):

    _PyStackRef value;
    value = stack_pointer[-1];
    PyStackRef_CLOSE(value);
    stack_pointer += -1;

When PyStackRef_CLOSE(value) is called, value is visible on the stack.
The generated code would need to be:

    _PyStackRef value;
    value = stack_pointer[-1];
    stack_pointer += -1;
    PyStackRef_CLOSE(value);

Once that is done, we can implement TOS caching and deferred reference counting independently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant