-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-111089: Add PyUnicode_AsUTF8NoNUL() function #111688
Conversation
I'm not sure about For example, if the string is used to open a file, the function doesn't reject Maybe the name should be more explicitly about its purpose, such as: |
the clinic.py changes look fine to me; I have no opinion on the C changes :) |
Clinic changes are ok with me. I'll leave the C API discussion to the C API WG :) |
If this change is merged, it would be interesting to go through PyUnicode_AsUTF8() usage in the Python code base and check if switching PyUnicode_AsUTF8Safe() would be worth it. See #111656 (comment) list for example. |
The working group doesn't exist yet, it's still a draft PEP: https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117 |
@encukou @gpshead @Yhg1s @serhiy-storchaka: Would you mind to review this change? |
Are there strong objections against simply making |
Tools/clinic/clinic.py
Outdated
@@ -4350,7 +4350,7 @@ def parse_arg(self, argname: str, displayname: str, *, limited_capi: bool) -> st | |||
{bad_argument} | |||
goto exit; | |||
}}}} | |||
{paramname} = PyUnicode_AsUTF8({argname}); | |||
{paramname} = PyUnicode_AsUTF8Safe({argname}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be different code for limited_capi
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I wanted to add PyUnicode_AsUTF8Safe() to the limited C API in a separated PR. Maybe it's more convenient to do it in the PR, since there is a lot of code generated by Argument Clinic impacted by these changes.
I suppose that it's too late to change it, same rationale than changing PyUnicode_AsUTF8(). Moreover, |
613243e
to
96b2b8b
Compare
I rebased the PR on the main branch to get the test_asyncio fix. I also squashed commits. |
Wait until the WG can discuss this. I agree that I wouldn't mind naming it |
If "Safe" is too generic, what about "AsUTF8NoNul"? I propose "Nul" instead of "Null" which looks like "NULL pointer". Or maybe "AsUTF8NoNullChars"? "Z" looks too short to me, it's not easy to guess its intent. |
Alternative short name: |
Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8Safe() function should be used instead.
96b2b8b
to
193c18b
Compare
@serhiy-storchaka @erlend-aasland: I renamed the function to PyUnicode_AsUTF8NoNUL(). Are you fine with this name? |
I didn't follow the discussion. My first impression is that I don't understand what that API is supposed to do, based on the name only. |
I am fine with added the embedded null character check in If people not fine with this, ask them with what are they fine. |
I don't know why we need a new public function. For a new function, if you feel like it needs a strlen check, that's fine. I don't think it's a sensible way to deal with C strings (everyone who works with C strings should know about the importance of NUL), but for new APIs I don't really care either way. However:
|
I close my issue: #111089 (comment) |
Revert PyUnicode_AsUTF8() change: it no longer rejects embedded null characters: the PyUnicode_AsUTF8NoNUL() function should be used instead.
📚 Documentation preview 📚: https://cpython-previews--111688.org.readthedocs.build/