Tagged deferred references on the stack #393
Replies: 2 comments
-
In the issue you referenced (#138) I did an experiment with tagged pointers on the stack and in fast locals. I managed to get it correct but not fast (#138 (comment) and further). That's not to say it can't be done, but it's a big project. |
Beta Was this translation helpful? Give feedback.
-
This is great! Having a way to reduce as much as possible the cost of reference counting in the interpreter loop will help mitigate the cost of immortal objects. Which reminds me, any other thoughts on where to use the extra 32bits from the refcount field?
This should be easily mitigated by specializing a BINARY_OP_ADD_SMALLINT right? It'll skip the call to |
Beta Was this translation helpful? Give feedback.
-
Using tagged values to avoid unboxing is discussed in #138
See #390 for higher level idea on reducing reference counting overhead.
A tagged reference works as follows:
Uses
Quicker
LOAD
instructionsLOAD_CONST
no longer needs to increment the reference count of the object, because we know that there is at least one strong reference elsewhere.So:
With tagged, deferred reference count:
We have removed a load and a dependent store.
Similarly for
LOAD_FAST
, provided that the compiler can guarantee that the local outlasts the value on the stack.No need to special case non object pointers on stack.
As long as the low bit of a word is zero, we can put anything on the stack and have stack cleanup code handle it safely. For example, when handling exceptions we sometimes need to push
f_lasti
to the stack. Currently we need to box it, which adds a new potential failure mode. With tagging we can push the tagged value to the stack, which cannot fail.There a lot of other tricks we use this for, e.g. #392
The costs
Having to check the tag of references before using them is potentially expensive.
Calling builtin functions
Builtin functions won't be expecting tagged pointers so making vectorcall calls will require an additional pass over the arguments.
This is less of an issue for generated wrappers, and may be an improvement as we can make the callee responsible for the references. [The potential improvement is in performance, not in complexity]
See #374
Binary operators
The cost of zeroing the low-bit to pass the arguments to binary operator functions is negligible.
We already do some reference count checking to see if we can re-use the left hand side of a value
Correctness
Tagged references should never be stored on the heap. By limiting their use to the stacks, both Python and C, we can use the lifetime guarantees of stacks to ensure correctness. If there is a tagged (strong) reference in a (transitive) caller, then we can have as many untagged (borrowed) references as we want.
Example
Consider the code snippet
x = a + b + c
this compiles to
We can compile this to (with stack shown)
We need to change both
BINARY_OP
andSTORE_FAST
to handle tagged references. In both cases they need to check the the tag before decrementing the reference count.The first
BINARY_OP
sees two borrowed references, so doesn't need to do any reference count modification. The secondBINARY_OP
needs toDECREF
the reference toa+b
, in theory it could reuse the memory fora+b
to storea+b+c
.The
STORE_FAST
needs check the tag of the reference stored inx
. This cost effectively nothing as we currently need to check forNULL
.STORE_FAST
goes fromto
Because any borrowed reference on the stack is guaranteed to be backed by a strong reference with a lifetime at least as long as any local variable in the current scope, it is safe to store borrowed references in a local variable.
Escaping references
Sometimes a reference will escape the stack and need to be made a strong reference. For example in
STORE_GLOBAL
.A tagged, deferred reference can be converted to a strong reference pointer as follows:
Branch free alternative:
Beta Was this translation helpful? Give feedback.
All reactions