
gh-117139: Set up the tagged evaluation stack #117186

Closed

Conversation

Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Mar 23, 2024

This PR is up mostly for discussion. As of now it compiles (with warnings) and passes the Ubuntu tests.

The main point of contention is how we deal with bytecodes that expect arrays of untagged objects. E.g., vectorcall takes an array of objects from the stack. There are two general approaches I can think of:

  1. Untag all values the bytecode uses on the stack, then proceed as usual. This has really bad performance.
  2. Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside. This way we don't break C API. This is kind of bad from a type safety standpoint (and needs us to cast things around), but it's the one with the least performance loss.

For example I currently have a _PyList_FromArraySteal and a _PyList_FromTaggedArraySteal. With the 2nd approach, I just need some bitwise operations in _PyList_FromArraySteal and it should be good to go. Then I can remove _PyList_FromTaggedArraySteal. What do y'all think?

However, we still have to untag everything if we call escaping functions that call 3rd party code (e.g. vectorcall). This is only safe right now for CPython's own C API.

@colesbury
Contributor

colesbury commented Mar 26, 2024

I'm confused by your description of approach 2:

Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside.

That does not look like the approach in the PR, which adds new functions that take tagged values. The approach in the PR seems correct. We can change the signatures of internal-only APIs as needed.

However, we still have to untag everything if we call escaping functions that call 3rd party code (e.g. vectorcall)

If you untag in place on the stack, you need to Py_INCREF() deferred references: once they are marked as non-deferred on the evaluation stack, the reference counts need to reflect that.

I think it's better to do the untagging off the evaluation stack. In the common case, you can have some fixed-size PyObject *buf[N] on the C stack and untag into it, for some reasonably small fixed N. In the general case, allocate a temporary buffer with PyMem_Malloc().

Include/object.h (outdated) -- review comment on lines 231 to 247:
typedef union {
PyObject *obj;
uintptr_t bits;
} _Py_TaggedObject;

#define Py_OBJECT_TAG (0b0)

#ifdef Py_GIL_DISABLED
#define Py_CLEAR_TAG(tagged) ((PyObject *)((tagged).bits & ~(Py_OBJECT_TAG)))
#else
#define Py_CLEAR_TAG(tagged) ((PyObject *)(uintptr_t)((tagged).bits))
#endif

#define Py_OBJ_PACK(obj) ((_Py_TaggedObject){.bits = (uintptr_t)(obj)})

#define Py_TAG_CAST(o) ((_Py_TaggedObject){.obj = (o)})

Contributor

We should not make any of these macros or data structures part of the public C API.

@Fidget-Spinner
Member Author

I'm confused by your description of approach 2:

Secretly check for tagged values inside API. That is -- make them support tagged values internally, but don't change their signature on the outside.

That does not look like the approach in the PR, which adds new functions that take tagged values. The approach in the PR seems correct for internal-only APIs -- we can change their signatures as needed.

I did a mix of both -- for internal functions I converted them to tagged form. For stuff that might be used by other people, e.g. _PyList_FromArraySteal, I made them internally support tagged pointers.

I think it's better to do the untagging off the evaluation stack. In the common case, you can have some fixed-size PyObject *buf[N] on the C stack and untag into it, for some reasonably small fixed N. In the general case, allocate a temporary buffer with PyMem_Malloc().

Ok, I will think about that. That does make sense -- to untag into a scratch buffer for now. If we exceed the buffer, I assume we then need to untag on the C stack. One thing we might want to be wary of is overflowing the C stack under the current recursion limit, since this will make _PyEval_EvalFrameDefault's stack frame a little larger.

@Fidget-Spinner
Member Author

Fidget-Spinner commented Mar 26, 2024

Also to note: every Py_DECREF(Py_CLEAR_TAG(x)), Py_INCREF(Py_CLEAR_TAG(x)), and so on is an opportunity to swap in Py_DECREF_TAGGED(x) etc. in a future PR, when we implement the actual deferring. To keep this PR small and easier to review, I didn't do that for now. Though I might as well, since this PR is quite bulky already, and it would reduce code churn in the future.

@Fidget-Spinner Fidget-Spinner changed the title gh-117139: Tagged evaluation stack gh-117139: Set up the tagged evaluation stack Mar 26, 2024
@colesbury
Contributor

For stuff that might be used by other people, e.g. _PyList_FromArraySteal, I made them internally support tagged pointers.

That's undefined behavior in C -- let's avoid it if possible. If we're concerned about external users, add a new function that takes tagged references (and keep the old one unused, if desired).

If we exceed the buffer, I assume we then need to untag on the C stack...

If we exceed the fixed size buffer we should allocate enough space for a temporary buffer using PyMem_Malloc().

One thing we might want to be wary of is overflowing the C recursion stack...

I would structure this so that the small, temporary buffer is only used when we perform a vectorcall (and not to a Python function). For example, in pseudo-code:

#define N 5 

PyObject *
PyObject_VectorcallTagged(PyObject *callable, const _Py_TaggedObject *tagged, size_t nargs, PyObject *kwnames)
{
  PyObject *args[N];
  if (nargs >= N) { 
    return PyObject_VectorcallTaggedSlow(callable, tagged, nargs, kwnames);
  }
  untag(args, tagged, nargs);
  return PyObject_Vectorcall(callable, args, nargs, kwnames);
}

PyObject *
PyObject_VectorcallTaggedSlow(PyObject *callable, const _Py_TaggedObject *tagged, size_t nargs, PyObject *kwnames)
{
  PyObject **args = PyMem_Malloc(nargs * sizeof(PyObject *));
  if (args == NULL) {
    return PyErr_NoMemory();
  }
  untag(args, tagged, nargs);
  PyObject *res = PyObject_Vectorcall(callable, args, nargs, kwnames);
  PyMem_Free(args);
  return res;
}

@Fidget-Spinner
Member Author

Thanks for the clear explanation! I will address them tomorrow.

@gvanrossum
Member

Hm, I now recall that the problem of untagging arrays of arguments sank my attempt (I got it to work, but it was too slow). Hope you fare better!

@nineteendo
Contributor

Could you also configure pre-commit? https://devguide.python.org/getting-started/setup-building/#install-pre-commit

index 2bd9c40..3aa7dea 100644
--- a/Tools/cases_generator/analyzer.py
+++ b/Tools/cases_generator/analyzer.py
@@ -359,7 +359,7 @@ def has_error_without_pop(op: parser.InstDef) -> bool:
     "Py_XDECREF_STACKREF",
     "Py_INCREF_STACKREF",
     "Py_XINCREF_TAGGED",
-    "Py_NewRef_StackRef",    
+    "Py_NewRef_StackRef",
     "Py_INCREF",
     "_PyManagedDictPointer_IsValues",
     "_PyObject_GetManagedDict",

@Fidget-Spinner
Member Author

!buildbot nogil

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit d59145b 🤖

The command will test the builders whose names match following regular expression: nogil

The builders matched are:

  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Ubuntu NoGIL Refleaks PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 Windows Server 2022 NoGIL PR
  • AMD64 Ubuntu NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • x86-64 MacOS Intel NoGIL PR

@Fidget-Spinner
Member Author

Fidget-Spinner commented Apr 26, 2024

Here are results from Sam's microbenchmark suite testing scalability. Note that I've only implemented deferred refcounting for method and function calls.

Specs:
20 virtual cores (hyper threading)
pyperf system tune

Main:
object_cfunction MEGA FAILED: 1.1x slower
cmodule_function MEGA FAILED: 2.1x slower
generator FAILED: 3.1x faster
pymethod MEGA FAILED: 1.4x slower
pyfunction MEGA FAILED: 2.7x slower
module_function MEGA FAILED: 1.9x slower
load_string_const MEGA FAILED: 2.1x slower
load_tuple_const MEGA FAILED: 1.9x slower
create_closure MEGA FAILED: 3.6x slower
create_pyobject MEGA FAILED: 2.9x slower

My branch:
object_cfunction FAILED: 9.3x faster
cmodule_function MEGA FAILED: 2.0x slower
generator FAILED: 2.9x faster
pymethod MEGA FAILED: 1.1x slower
pyfunction MEGA FAILED: 2.4x slower
module_function MEGA FAILED: 1.9x slower
load_string_const MEGA FAILED: 2.2x slower
load_tuple_const MEGA FAILED: 2.1x slower
create_closure MEGA FAILED: 3.6x slower
create_pyobject MEGA FAILED: 3.1x slower

Notice that object_cfunction went from 1.1x slower to 9.3x faster on a 20-thread workload!

All tests pass except ASAN. I am now happy with the state this PR is in, so I will close it and start upstreaming things in pieces.
