Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-111140: Fix edge case in PyLong_AsNativeBytes where large negative longs may require an extra byte #116053

Merged
merged 19 commits into from
Apr 5, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 45 additions & 12 deletions Doc/c-api/long.rst
Original file line number Diff line number Diff line change
Expand Up @@ -113,24 +113,28 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.
retrieved from the resulting value using :c:func:`PyLong_AsVoidPtr`.


.. c:function:: PyObject* PyLong_FromNativeBytes(const void* buffer, size_t n_bytes, int endianness)
.. c:function:: PyObject* PyLong_FromNativeBytes(const void* buffer, size_t n_bytes, int flags)

Create a Python integer from the value contained in the first *n_bytes* of
*buffer*, interpreted as a two's-complement signed number.

*endianness* may be passed ``-1`` for the native endian that CPython was
compiled with, or else ``0`` for big endian and ``1`` for little.
*flags* are as for :c:func:`PyLong_AsNativeBytes`. Passing ``-1`` will select
the native endian that CPython was compiled with and assume that the
most-significant bit is a sign bit. Passing
``Py_ASNATIVEBYTES_UNSIGNED_BUFFER`` will produce the same result as calling
:c:func:`PyLong_FromUnsignedNativeBytes`. Other flags are ignored.

.. versionadded:: 3.13


.. c:function:: PyObject* PyLong_FromUnsignedNativeBytes(const void* buffer, size_t n_bytes, int endianness)
.. c:function:: PyObject* PyLong_FromUnsignedNativeBytes(const void* buffer, size_t n_bytes, int flags)

Create a Python integer from the value contained in the first *n_bytes* of
*buffer*, interpreted as an unsigned number.

*endianness* may be passed ``-1`` for the native endian that CPython was
compiled with, or else ``0`` for big endian and ``1`` for little.
*flags* are as for :c:func:`PyLong_AsNativeBytes`. Passing ``-1`` will select
the native endian that CPython was compiled with and assume that the
most-significant bit is not a sign bit. Flags other than endian are ignored.

.. versionadded:: 3.13

Expand Down Expand Up @@ -354,7 +358,7 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.
Returns ``NULL`` on error. Use :c:func:`PyErr_Occurred` to disambiguate.


.. c:function:: Py_ssize_t PyLong_AsNativeBytes(PyObject *pylong, void* buffer, Py_ssize_t n_bytes, int endianness)
.. c:function:: Py_ssize_t PyLong_AsNativeBytes(PyObject *pylong, void* buffer, Py_ssize_t n_bytes, int flags)

Copy the Python integer value to a native *buffer* of size *n_bytes*::

Expand Down Expand Up @@ -409,13 +413,42 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.
// ... use bignum ...
free(bignum);

*endianness* may be passed ``-1`` for the native endian that CPython was
compiled with, or ``0`` for big endian and ``1`` for little.
*flags* is a combination of the flags shown in the table below, or ``-1`` to
select defaults that behave most like a C cast (currently,
``Py_ASNATIVEBYTES_NATIVE_ENDIAN | Py_ASNATIVEBYTES_UNSIGNED_BUFFER``).
Note that ``Py_ASNATIVEBYTES_DEFAULTS`` cannot be combined with other flags.

============================================= ======
Flag Value
============================================= ======
.. c:macro:: Py_ASNATIVEBYTES_DEFAULTS ``-1``
.. c:macro:: Py_ASNATIVEBYTES_BIG_ENDIAN ``0``
.. c:macro:: Py_ASNATIVEBYTES_LITTLE_ENDIAN ``1``
.. c:macro:: Py_ASNATIVEBYTES_NATIVE_ENDIAN ``3``
.. c:macro:: Py_ASNATIVEBYTES_UNSIGNED_BUFFER ``4``
.. c:macro:: Py_ASNATIVEBYTES_REJECT_NEGATIVE ``8``
============================================= ======

Specifying ``Py_ASNATIVEBYTES_NATIVE_ENDIAN`` will override any other endian
flags. Passing ``2`` is reserved.

Specifying ``Py_ASNATIVEBYTES_UNSIGNED_BUFFER`` allows positive input values
that would set the most-significant bit to be converted. For example,
converting 128 into a single byte. Without this flag, these values request a
larger buffer in order to ensure a zero sign bit is included. If the
destination buffer is later treated as signed, a positive input value may
become negative.

Specifying ``Py_ASNATIVEBYTES_REJECT_NEGATIVE`` causes an exception to be set
if *pylong* is negative. Without this flag, negative values will be copied
provided there is enough space for at least one sign bit, regardless of
whether ``Py_ASNATIVEBYTES_UNSIGNED_BUFFER`` was specified.

Returns ``-1`` with an exception raised if *pylong* cannot be interpreted as
an integer. Otherwise, return the size of the buffer required to store the
value. If this is equal to or less than *n_bytes*, the entire value was
copied. ``0`` will never be returned.
an integer, or if *pylong* was negative and
``Py_ASNATIVEBYTES_REJECT_NEGATIVE`` was set. Otherwise, return the size of
the buffer required to store the value. If this is equal to or less than
*n_bytes*, the entire value was copied. ``0`` will never be returned.

Unless an exception is raised, all *n_bytes* of the buffer will always be
written. In the case of truncation, as many of the lowest bits of the value
Expand Down
24 changes: 19 additions & 5 deletions Include/cpython/longobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,24 @@

PyAPI_FUNC(PyObject*) PyLong_FromUnicodeObject(PyObject *u, int base);

#define Py_ASNATIVEBYTES_DEFAULTS -1
#define Py_ASNATIVEBYTES_BIG_ENDIAN 0
#define Py_ASNATIVEBYTES_LITTLE_ENDIAN 1
#define Py_ASNATIVEBYTES_NATIVE_ENDIAN 3
#define Py_ASNATIVEBYTES_UNSIGNED_BUFFER 4
#define Py_ASNATIVEBYTES_REJECT_NEGATIVE 8

/* PyLong_AsNativeBytes: Copy the integer value to a native variable.
buffer points to the first byte of the variable.
n_bytes is the number of bytes available in the buffer. Pass 0 to request
the required size for the value.
endianness is -1 for native endian, 0 for big endian or 1 for little.
flags is a bitfield of the following flags:
* 1 - little endian
* 2 - native endian
* 4 - unsigned destination (e.g. don't reject copying 255 into one byte)
* 8 - raise an exception for negative inputs
If flags is -1 (all bits set), native endian is used and value truncation
behaves most like C (allows negative inputs and allow MSB set).
Big endian mode will write the most significant byte into the address
directly referenced by buffer; little endian will write the least significant
byte into that address.
Expand All @@ -24,19 +37,20 @@ PyAPI_FUNC(PyObject*) PyLong_FromUnicodeObject(PyObject *u, int base);
calculate the bit length of an integer object.
*/
PyAPI_FUNC(Py_ssize_t) PyLong_AsNativeBytes(PyObject* v, void* buffer,
Py_ssize_t n_bytes, int endianness);
Py_ssize_t n_bytes, int flags);

/* PyLong_FromNativeBytes: Create an int value from a native integer
n_bytes is the number of bytes to read from the buffer. Passing 0 will
always produce the zero int.
PyLong_FromUnsignedNativeBytes always produces a non-negative int.
endianness is -1 for native endian, 0 for big endian or 1 for little.
flags is the same as for PyLong_AsNativeBytes, but only supports selecting
the endianness or forcing an unsigned buffer.

Returns the int object, or NULL with an exception set. */
PyAPI_FUNC(PyObject*) PyLong_FromNativeBytes(const void* buffer, size_t n_bytes,
int endianness);
int flags);
PyAPI_FUNC(PyObject*) PyLong_FromUnsignedNativeBytes(const void* buffer,
size_t n_bytes, int endianness);
size_t n_bytes, int flags);

PyAPI_FUNC(int) PyUnstable_Long_IsCompact(const PyLongObject* op);
PyAPI_FUNC(Py_ssize_t) PyUnstable_Long_CompactValue(const PyLongObject* op);
Expand Down
118 changes: 111 additions & 7 deletions Lib/test/test_capi/test_long.py
Original file line number Diff line number Diff line change
Expand Up @@ -483,8 +483,12 @@ def test_long_asnativebytes(self):
(-MAX_USIZE, SZ + 1),
(2**255-1, 32),
(-(2**255-1), 32),
(2**255, 33),
(-(2**255), 33), # if you ask, we'll say 33, but 32 would do
(2**256-1, 33),
(-(2**256-1), 33),
(2**256, 33),
(-(2**256), 33),
]:
with self.subTest(f"sizeof-{v:X}"):
buffer = bytearray(b"\x5a")
Expand Down Expand Up @@ -523,15 +527,17 @@ def test_long_asnativebytes(self):
(-1, b'\xff' * 10, min(11, SZ)),
(-42, b'\xd6', 1),
(-42, b'\xff' * 10 + b'\xd6', min(11, SZ)),
# Extracts 255 into a single byte, but requests sizeof(Py_ssize_t)
(255, b'\xff', SZ),
# Extracts 255 into a single byte, but requests 2
# (this is currently a special case, and "should" request SZ)
(255, b'\xff', 2),
(255, b'\x00\xff', 2),
(256, b'\x01\x00', 2),
(0x80, b'\x00' * 7 + b'\x80', min(8, SZ)),
# Extracts successfully (unsigned), but requests 9 bytes
(2**63, b'\x80' + b'\x00' * 7, 9),
# "Extracts", but requests 9 bytes
(-2**63, b'\x80' + b'\x00' * 7, 9),
(2**63, b'\x00\x80' + b'\x00' * 7, 9),
# Extracts into 8 bytes, but if you provide 9 we'll say 9
(-2**63, b'\x80' + b'\x00' * 7, 8),
(-2**63, b'\xff\x80' + b'\x00' * 7, 9),

(2**255-1, b'\x7f' + b'\xff' * 31, 32),
Expand All @@ -548,10 +554,15 @@ def test_long_asnativebytes(self):
(-(2**256-1), b'\x00' * 31 + b'\x01', 33),
(-(2**256-1), b'\xff' + b'\x00' * 31 + b'\x01', 33),
(-(2**256-1), b'\xff\xff' + b'\x00' * 31 + b'\x01', 33),
# However, -2**255 precisely will extract into 32 bytes and return
# success. For bigger buffers, it will still succeed, but will
# return 33
(-(2**255), b'\x80' + b'\x00' * 31, 32),
(-(2**255), b'\xff\x80' + b'\x00' * 31, 33),

# The classic "Windows HRESULT as negative number" case
# HRESULT hr;
# PyLong_CopyBits(<-2147467259>, &hr, sizeof(HRESULT))
# PyLong_AsNativeBytes(<-2147467259>, &hr, sizeof(HRESULT), -1)
# assert(hr == E_FAIL)
(-2147467259, b'\x80\x00\x40\x05', 4),
]:
Expand All @@ -569,14 +580,102 @@ def test_long_asnativebytes(self):
f"PyLong_AsNativeBytes(v, buffer, {n}, <little>)")
self.assertEqual(expect_le, buffer[:n], "<little>")

# Test cases that do not request size for a sign bit when we pass the
# Py_ASNATIVEBYTES_UNSIGNED_BUFFER flag
for v, expect_be, expect_n in [
(255, b'\xff', 1),
# We pass a 2 byte buffer so it just uses the whole thing
(255, b'\x00\xff', 2),

(2**63, b'\x80' + b'\x00' * 7, 8),
# We pass a 9 byte buffer so it uses the whole thing
(2**63, b'\x00\x80' + b'\x00' * 7, 9),

(2**256-1, b'\xff' * 32, 32),
# We pass a 33 byte buffer so it uses the whole thing
(2**256-1, b'\x00' + b'\xff' * 32, 33),
]:
with self.subTest(f"{v:X}-{len(expect_be)}bytes-unsigned"):
n = len(expect_be)
buffer = bytearray(b"\xa5"*n)
self.assertEqual(expect_n, asnativebytes(v, buffer, n, 4),
f"PyLong_AsNativeBytes(v, buffer, {n}, <big|unsigned>)")
self.assertEqual(expect_n, asnativebytes(v, buffer, n, 5),
f"PyLong_AsNativeBytes(v, buffer, {n}, <little|unsigned>)")

# Ensure Py_ASNATIVEBYTES_REJECT_NEGATIVE raises on negative value
with self.assertRaises(OverflowError):
asnativebytes(-1, buffer, 0, 8)

# Check a few error conditions. These are validated in code, but are
# unspecified in docs, so if we make changes to the implementation, it's
# fine to just update these tests rather than preserve the behaviour.
with self.assertRaises(SystemError):
asnativebytes(1, buffer, 0, 2)
with self.assertRaises(TypeError):
asnativebytes('not a number', buffer, 0, -1)

def test_long_asnativebytes_fuzz(self):
import math
from random import Random
from _testcapi import (
pylong_asnativebytes as asnativebytes,
SIZE_MAX,
)

# Abbreviate sizeof(Py_ssize_t) to SZ because we use it a lot
SZ = int(math.ceil(math.log(SIZE_MAX + 1) / math.log(2)) / 8)

rng = Random()
# Allocate bigger buffer than actual values are going to be
buffer = bytearray(260)

for _ in range(1000):
n = rng.randrange(1, 256)
bytes_be = bytes([
# Ensure the most significant byte is nonzero
rng.randrange(1, 256),
*[rng.randrange(256) for _ in range(n - 1)]
])
bytes_le = bytes_be[::-1]
v = int.from_bytes(bytes_le, 'little')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use subTest here, to have the chosen value show up on failure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that subTest is a good idea for non-repeatable values? But I'll see if I can tweak the failure handling to include the value when not running verbose (currently it'll be printed in verbose mode).


expect_1 = expect_2 = (SZ, n)
if bytes_be[0] & 0x80:
# All values are positive, so if MSB is set, expect extra bit
# when we request the size or have a large enough buffer
expect_1 = (SZ, n + 1)
# When passing Py_ASNATIVEBYTES_UNSIGNED_BUFFER, we expect the
# return to be exactly the right size.
expect_2 = (n,)

try:
actual = asnativebytes(v, buffer, 0, -1)
self.assertIn(actual, expect_1)

actual = asnativebytes(v, buffer, len(buffer), 0)
self.assertIn(actual, expect_1)
self.assertEqual(bytes_be, buffer[-n:])

actual = asnativebytes(v, buffer, len(buffer), 1)
self.assertIn(actual, expect_1)
self.assertEqual(bytes_le, buffer[:n])

actual = asnativebytes(v, buffer, n, 4)
self.assertIn(actual, expect_2, bytes_be.hex())
actual = asnativebytes(v, buffer, n, 5)
self.assertIn(actual, expect_2, bytes_be.hex())
except AssertionError as ex:
value_hex = ''.join(reversed([
f'{b:02X}{"" if i % 8 else "_"}'
for i, b in enumerate(bytes_le, start=1)
])).strip('_')
if support.verbose:
print()
print(n, 'bytes')
print('hex =', value_hex)
print('int =', v)
raise
raise AssertionError(f"Value: 0x{value_hex}") from ex

def test_long_fromnativebytes(self):
import math
from _testcapi import (
Expand Down Expand Up @@ -617,6 +716,11 @@ def test_long_fromnativebytes(self):
self.assertEqual(expect_u, fromnativebytes(v_be, n, -1, 0),
f"PyLong_FromUnsignedNativeBytes(buffer, {n}, <native>)")

# Swap the unsigned request for tests and use the
# Py_ASNATIVEBYTES_UNSIGNED_BUFFER flag instead
self.assertEqual(expect_u, fromnativebytes(v_be, n, 4, 1),
f"PyLong_FromNativeBytes(buffer, {n}, <big|unsigned>)")


if __name__ == "__main__":
unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Add additional flags to :c:func:`PyLong_AsNativeBytes` and
:c:func:`PyLong_FromNativeBytes` to allow the caller to determine how to handle
edge cases around values that fill the entire buffer.
14 changes: 7 additions & 7 deletions Modules/_testcapi/long.c
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,8 @@ pylong_asnativebytes(PyObject *module, PyObject *args)
{
PyObject *v;
Py_buffer buffer;
Py_ssize_t n, endianness;
if (!PyArg_ParseTuple(args, "Ow*nn", &v, &buffer, &n, &endianness)) {
Py_ssize_t n, flags;
if (!PyArg_ParseTuple(args, "Ow*nn", &v, &buffer, &n, &flags)) {
return NULL;
}
if (buffer.readonly) {
Expand All @@ -66,7 +66,7 @@ pylong_asnativebytes(PyObject *module, PyObject *args)
PyBuffer_Release(&buffer);
return NULL;
}
Py_ssize_t res = PyLong_AsNativeBytes(v, buffer.buf, n, (int)endianness);
Py_ssize_t res = PyLong_AsNativeBytes(v, buffer.buf, n, (int)flags);
PyBuffer_Release(&buffer);
return res >= 0 ? PyLong_FromSsize_t(res) : NULL;
}
Expand All @@ -76,8 +76,8 @@ static PyObject *
pylong_fromnativebytes(PyObject *module, PyObject *args)
{
Py_buffer buffer;
Py_ssize_t n, endianness, signed_;
if (!PyArg_ParseTuple(args, "y*nnn", &buffer, &n, &endianness, &signed_)) {
Py_ssize_t n, flags, signed_;
if (!PyArg_ParseTuple(args, "y*nnn", &buffer, &n, &flags, &signed_)) {
return NULL;
}
if (buffer.len < n) {
Expand All @@ -86,8 +86,8 @@ pylong_fromnativebytes(PyObject *module, PyObject *args)
return NULL;
}
PyObject *res = signed_
? PyLong_FromNativeBytes(buffer.buf, n, (int)endianness)
: PyLong_FromUnsignedNativeBytes(buffer.buf, n, (int)endianness);
? PyLong_FromNativeBytes(buffer.buf, n, (int)flags)
: PyLong_FromUnsignedNativeBytes(buffer.buf, n, (int)flags);
PyBuffer_Release(&buffer);
return res;
}
Expand Down
Loading
Loading