gh-99593: Add tests for Unicode C API (part 2) #99868

Merged
merged 5 commits into from
May 4, 2023


Member

@serhiy-storchaka commented Nov 29, 2022

Add tests for lower-level functions.

@@ -16,6 +16,286 @@ class Str(str):


class CAPITest(unittest.TestCase):
    # TODO: Test the following function:
    #
    # PyUnicode_ClearFreeList
Member

PyUnicode_ClearFreeList was removed in Python 3.9!

Member Author

Oh, these tests were written when it was here.

Lib/test/test_capi/test_unicode.py
self.assertEqual(new(0, maxchar), '')
self.assertEqual(new(5, maxchar), chr(maxchar)*5)
self.assertEqual(new(0, 0x110000), '')
self.assertRaises(SystemError, new, 5, 0x110000)
Member

Maybe this error should become a ValueError, but that can be changed outside this PR, since you seem to want to backport these new tests.

Member Author

I think that SystemError is better exception type here. It is a misuse of the C API, you cannot get this error from Python code.
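To make the distinction concrete, here is a minimal sketch that calls PyUnicode_New() directly through ctypes.pythonapi instead of the _testcapi wrapper used in the PR (this route is an assumption for illustration and only works on CPython). It exercises only the size-0 and error cases, because a non-empty result from PyUnicode_New() contains uninitialized characters. The helper `raises` is hypothetical.

```python
import ctypes

# ctypes.pythonapi is a PyDLL: after each call it checks the Python error
# indicator and re-raises any pending exception in Python.
unicode_new = ctypes.pythonapi.PyUnicode_New
unicode_new.argtypes = (ctypes.c_ssize_t, ctypes.c_uint32)  # Py_ssize_t, Py_UCS4
unicode_new.restype = ctypes.py_object


def raises(exc_type, func, *args):
    """Hypothetical helper: return True if func(*args) raises exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Size 0 short-circuits to the empty string, even with an invalid maxchar.
print(repr(unicode_new(0, 0x110000)))
# A non-zero size with maxchar > 0x10FFFF is C API misuse: SystemError.
print(raises(SystemError, unicode_new, 5, 0x110000))
# The nearest Python-level operation reports a ValueError instead.
print(raises(ValueError, chr, 0x110000))
```

Only C code can reach the SystemError path; Python code goes through chr() and friends, which validate the code point first and raise ValueError.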

if (!result) {
return NULL;
}
if (size > 0 && maxchar <= 0x10ffff &&
Member

It sounds dangerous to return a string to the "Python space" whose characters are not initialized when maxchar is greater than 0x10ffff. Can you remove the maxchar <= 0x10ffff condition? PyUnicode_Fill() must fail if the fill character is too big (greater than 0x10ffff).

Member Author

It is a test for PyUnicode_New(), not for PyUnicode_Fill(). We should see exceptions raised by PyUnicode_New(), not PyUnicode_Fill().

It never happens, because PyUnicode_New() returns NULL if size > 0 and maxchar > 0x10ffff. If it ever failed to return NULL, it would be better to get a malformed string than to get an exception raised by PyUnicode_Fill() and think that it was raised by PyUnicode_New().

}

result = PyUnicode_WriteChar(to_copy, index, (Py_UCS4)character);
if (result == -1 && PyErr_Occurred()) {
Member

For me, -1 means an error. You shouldn't have to check if an exception was raised or not. Or it should become: assert(PyErr_Occurred()).

Same remark for other test wrappers, like the one for PyUnicode_Resize().

Member Author

If an exception was raised, but result is not -1, then what? The code will return NULL, and we will never know that something wrong happened with PyUnicode_WriteChar().

I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.

Member

I do not use the C assert() in these tests, because I want to make tests working even in non-debug build.

C assert() calls are enabled in release (non-debug) builds: see the following code in Modules/_testcapi/parts.h:

// Always enable assertions
#undef NDEBUG

Many _testcapi tests are only implemented with assert().
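The convention being discussed is that PyUnicode_WriteChar() reports failure by returning -1 with an exception set. A rough way to observe this from Python, assuming CPython and calling the function through ctypes.pythonapi rather than the PR's _testcapi wrapper (which writes into a fresh copy), is to pass it a string it must refuse to touch. Because ctypes.pythonapi re-raises any pending exception, the caller sees the exception rather than the bare -1. The helper `error_type` is hypothetical.

```python
import ctypes

write_char = ctypes.pythonapi.PyUnicode_WriteChar
write_char.argtypes = (ctypes.py_object, ctypes.c_ssize_t, ctypes.c_uint32)
write_char.restype = ctypes.c_int  # 0 on success, -1 on error


def error_type(func, *args):
    """Hypothetical helper: exception type raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc)
    return None


s = 'abc'
# Index out of range: -1 is returned with IndexError set.
print(error_type(write_char, s, 3, ord('x')))
# Valid index, but the string is shared (refcount > 1), so in-place
# modification is refused: -1 is returned with SystemError set.
print(error_type(write_char, s, 0, ord('x')))
print(s)  # still 'abc'
```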

Modules/_testcapi/unicode.c
@@ -186,57 +536,126 @@ unicode_asucs4(PyObject *self, PyObject *args)
buffer[str_len] = 0xffffU;

 if (!PyUnicode_AsUCS4(unicode, buffer, buf_len, copy_null)) {
-    PyMem_Free(buffer);
+    PyMem_FREE(buffer);
Member

PyMem_FREE is a deprecated alias to PyMem_Free(). I would expect the opposite change, replace PyMem_FREE with PyMem_Free :-)

Member Author

Reverted. The code in my branch was written a long time ago, when PyMem_FREE and PyMem_Free were different things. I just copied it over the current code.

Lib/test/test_capi/test_unicode.py
self.assertRaises(SystemError, append, 'abc', NULL)
# TODO: Test PyUnicode_Append() with modifiable unicode
# and with NULL as the address.
# TODO: Check reference counts.
Member

That's done automatically by Refleaks buildbots, no?

Member Author

Maybe. But since this C API does unusual things with reference counts, it would be better to test it explicitly, if possible.
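An explicit reference-count check could look roughly like the sketch below, which compares sys.getrefcount() for each argument before and after a call. As a simpler stand-in for PyUnicode_Append() (which replaces the reference it is given), it calls PyUnicode_Concat() through ctypes.pythonapi. The helper name `leaks_references` is hypothetical, and the approach assumes CPython.

```python
import ctypes
import sys

concat = ctypes.pythonapi.PyUnicode_Concat
concat.argtypes = (ctypes.py_object, ctypes.py_object)
concat.restype = ctypes.py_object  # ctypes takes ownership of the new reference


def leaks_references(func, *args):
    """Hypothetical helper: True if func(*args) changes any argument's refcount."""
    before = [sys.getrefcount(arg) for arg in args]
    result = func(*args)
    # The result may alias an argument (e.g. concatenating with ''),
    # so drop it before measuring again.
    del result
    after = [sys.getrefcount(arg) for arg in args]
    return before != after


# Build strings at runtime so they are not interned literals.
left = ''.join(['ab', 'c'])
right = ''.join(['de', 'f'])
print(concat(left, right))
print(leaks_references(concat, left, right))
```

Refleak buildbots catch leaks in aggregate; a targeted check like this pins down which argument's count changed, which matters for an API that deliberately steals or replaces references.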

if SIZEOF_WCHAR_T == 2:
    if sys.byteorder == 'little':
        encoding = 'utf-16le'
    elif sys.byteorder == 'little':
Member

This condition looks wrong. Maybe just use else:? I don't think that Python supports any other endianness. Same issue a few lines below.

Member Author

Good catch. I thought that an explicit check would be better, but I made an error in it.
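Since sys.byteorder is documented to be either 'little' or 'big', the second branch can simply be else. A minimal sketch of the selection logic, using ctypes.sizeof(ctypes.c_wchar) as an assumed stand-in for the test's SIZEOF_WCHAR_T constant and a hypothetical helper name `wchar_encoding`:

```python
import ctypes
import sys


def wchar_encoding():
    """Hypothetical helper: codec matching the platform's wchar_t layout."""
    size = ctypes.sizeof(ctypes.c_wchar)  # stands in for SIZEOF_WCHAR_T
    # sys.byteorder is either 'little' or 'big', so no third case exists.
    suffix = 'le' if sys.byteorder == 'little' else 'be'
    if size == 2:
        return 'utf-16' + suffix
    if size == 4:
        return 'utf-32' + suffix
    raise RuntimeError(f'unexpected wchar_t size: {size}')


# Round-trip check: decode the raw bytes of a wchar_t buffer.
buf = ctypes.create_unicode_buffer('abc\u20ac')
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))
print(repr(raw.decode(wchar_encoding())))  # includes the trailing NUL
```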

@serhiy-storchaka
Member Author

Thank you for your review, Victor.

Member

@vstinner left a comment

LGTM. Thanks for addressing my review.

@hugovk
Member

hugovk commented Apr 7, 2023

@serhiy-storchaka Victor approved this, ready to merge?

(I removed the 3.10 backport label, since 3.10 is now security-only.)

@vstinner enabled auto-merge (squash) May 4, 2023 14:59
Member

@vstinner left a comment

LGTM

@vstinner
Member

vstinner commented May 4, 2023

The CI check "DO-NOT-MERGE / unresolved review" was stuck at "Waiting for status to be reported". I updated the PR branch with main to see if that unblocks it.

@vstinner merged commit 2ba931f into python:main May 4, 2023
@miss-islington
Contributor

Thanks @serhiy-storchaka for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Contributor

Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.11

@vstinner removed the "needs backport to 3.11" label May 4, 2023
@vstinner
Member

vstinner commented May 4, 2023

Merged, thanks @serhiy-storchaka.

@serhiy-storchaka: If you consider that this change should be backported to Python 3.11, go ahead. But the automated backport failed for an unknown reason.

carljm added a commit to carljm/cpython that referenced this pull request May 5, 2023
* main: (61 commits)
  pythongh-64595: Argument Clinic: Touch source file if any output file changed (python#104152)
  pythongh-64631: Test exception messages in cloned Argument Clinic funcs (python#104167)
  pythongh-68395: Avoid naming conflicts by mangling variable names in Argument Clinic (python#104065)
  pythongh-64658: Expand Argument Clinic return converter docs (python#104175)
  pythonGH-103092: port `_asyncio` freelist to module state (python#104196)
  pythongh-104051: fix crash in test_xxtestfuzz with -We (python#104052)
  pythongh-104190: fix ubsan crash (python#104191)
  pythongh-104106: Add gcc fallback of mkfifoat/mknodat for macOS (pythongh-104129)
  pythonGH-104142: Fix _Py_RefcntAdd to respect immortality (pythonGH-104143)
  pythongh-104112: link from cached_property docs to method-caching FAQ (python#104113)
  pythongh-68968: Correcting message display issue with assertEqual (python#103937)
  pythonGH-103899: Provide a hint when accidentally calling a module (pythonGH-103900)
  pythongh-103963: fix 'make regen-opcode' in out-of-tree builds (python#104177)
  pythongh-102500: Add PEP 688 and 698 to the 3.12 release highlights (python#104174)
  pythonGH-81079: Add case_sensitive argument to `pathlib.Path.glob()` (pythonGH-102710)
  pythongh-91896: Deprecate collections.abc.ByteString (python#102096)
  pythongh-99593: Add tests for Unicode C API (part 2) (python#99868)
  pythongh-102500: Document PEP 688 (python#102571)
  pythongh-102500: Implement PEP 688 (python#102521)
  pythongh-96534: socketmodule: support FreeBSD divert(4) socket (python#96536)
  ...
@serhiy-storchaka added the "needs backport to 3.12" label Jul 10, 2023
@miss-islington
Contributor

Thanks @serhiy-storchaka for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington
Contributor

Sorry, @serhiy-storchaka and @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2ba931ff727395cf89b290ed313a8e15db0bfcf1 3.12

6 participants