Skip to content

Commit

Permalink
pythongh-98712: Clarify "readonly bytes-like object" semantics in C a…
Browse files Browse the repository at this point in the history
…rg-parsing docs (python#98710)
  • Loading branch information
encukou committed Dec 23, 2022
1 parent 88d565f commit 49f6ff7
Showing 1 changed file with 35 additions and 19 deletions.
54 changes: 35 additions & 19 deletions Doc/c-api/arg.rst
Original file line number Diff line number Diff line change
Expand Up @@ -34,24 +34,39 @@ These formats allow accessing an object as a contiguous chunk of memory.
You don't have to provide raw storage for the returned unicode or bytes
area.

In general, when a format sets a pointer to a buffer, the buffer is
managed by the corresponding Python object, and the buffer shares
the lifetime of this object. You won't have to release any memory yourself.
The only exceptions are ``es``, ``es#``, ``et`` and ``et#``.

However, when a :c:type:`Py_buffer` structure gets filled, the underlying
buffer is locked so that the caller can subsequently use the buffer even
inside a :c:type:`Py_BEGIN_ALLOW_THREADS` block without the risk of mutable data
being resized or destroyed. As a result, **you have to call**
:c:func:`PyBuffer_Release` after you have finished processing the data (or
in any early abort case).

Unless otherwise stated, buffers are not NUL-terminated.

Some formats require a read-only :term:`bytes-like object`, and set a
pointer instead of a buffer structure. They work by checking that
the object's :c:member:`PyBufferProcs.bf_releasebuffer` field is ``NULL``,
which disallows mutable objects such as :class:`bytearray`.
There are three ways strings and buffers can be converted to C:

* Formats such as ``y*`` and ``s*`` fill a :c:type:`Py_buffer` structure.
This locks the underlying buffer so that the caller can subsequently use
the buffer even inside a :c:type:`Py_BEGIN_ALLOW_THREADS`
block without the risk of mutable data being resized or destroyed.
As a result, **you have to call** :c:func:`PyBuffer_Release` after you have
finished processing the data (or in any early abort case).

* The ``es``, ``es#``, ``et`` and ``et#`` formats allocate the result buffer.
**You have to call** :c:func:`PyMem_Free` after you have finished
processing the data (or in any early abort case).

* .. _c-arg-borrowed-buffer:

Other formats take a :class:`str` or a read-only :term:`bytes-like object`,
such as :class:`bytes`, and provide a ``const char *`` pointer to
its buffer.
In this case the buffer is "borrowed": it is managed by the corresponding
Python object, and shares the lifetime of this object.
You won't have to release any memory yourself.

To ensure that the underlying buffer may be safely borrowed, the object's
:c:member:`PyBufferProcs.bf_releasebuffer` field must be ``NULL``.
This disallows common mutable objects such as :class:`bytearray`,
but also some read-only objects such as :class:`memoryview` of
:class:`bytes`.

Besides this ``bf_releasebuffer`` requirement, there is no check to verify
whether the input object is immutable (e.g. whether it would honor a request
for a writable buffer, or whether another thread can mutate the data).

.. note::

Expand Down Expand Up @@ -89,7 +104,7 @@ which disallows mutable objects such as :class:`bytearray`.
Unicode objects are converted to C strings using ``'utf-8'`` encoding.

``s#`` (:class:`str`, read-only :term:`bytes-like object`) [const char \*, :c:type:`Py_ssize_t`]
Like ``s*``, except that it doesn't accept mutable objects.
Like ``s*``, except that it provides a :ref:`borrowed buffer <c-arg-borrowed-buffer>`.
The result is stored into two C variables,
the first one a pointer to a C string, the second one its length.
The string may contain embedded null bytes. Unicode objects are converted
Expand All @@ -108,8 +123,9 @@ which disallows mutable objects such as :class:`bytearray`.
pointer is set to ``NULL``.

``y`` (read-only :term:`bytes-like object`) [const char \*]
This format converts a bytes-like object to a C pointer to a character
string; it does not accept Unicode objects. The bytes buffer must not
This format converts a bytes-like object to a C pointer to a
:ref:`borrowed <c-arg-borrowed-buffer>` character string;
it does not accept Unicode objects. The bytes buffer must not
contain embedded null bytes; if it does, a :exc:`ValueError`
exception is raised.

Expand Down

0 comments on commit 49f6ff7

Please sign in to comment.