Embedded null characters can lead to bugs or even security vulnerabilities #111656
Comments
Just curious, what action is proposed by this issue?
Since this came up specifically about
Even in this list of "things that were fixed", you're not showing actual security issues, although I'm sure many of the fixes were good for security and other reasons. I just want to see concrete, real problems that the PyUnicode_AsUTF8 change fixes.
Several functions in my list of fixed functions (examples) use UTF-8, so they likely called PyUnicode_AsUTF8() before being fixed. My point is that rejecting embedded null characters in PyUnicode_AsUTF8() would indirectly fix all of these examples. Here are some examples of Python 3.12 functions that call PyUnicode_AsUTF8() directly without checking for embedded null characters. I doubt that developers using PyUnicode_AsUTF8() are aware of the embedded null issue or took it into account when writing the code. For most of these functions, truncation ranges from bad (surprising or wrong behavior) to very bad (a security issue).
I didn't check functions that call PyUnicode_AsUTF8() indirectly; the exhaustive list is probably much longer. Have fun with null characters.
Audit events are related to security, right?

```python
import sys

def hook(*args):
    print(args)

sys.addaudithook(hook)
sys.audit("event\0ignored")
```

Output:
Are the inputs of these APIs created dynamically?
I'm sure there are legitimate use cases for deliberately truncating a string at the first null character. The question is: what behavior do most users expect when it comes to null characters? Do developers think about null characters when they write C code? Are most developers aware of this issue? If we add a hypothetical
Should the C API be "safe" by default, or should developers have to opt in to safety? Apparently, it's still an open question. I thought it had already been answered in the past, since many functions have already been modified to reject embedded null characters. See the list at: #111089 (comment)
My
You can create a module with an arbitrary name and filename:
I didn't check every single example to see how easy it is to play with embedded null characters. Maybe not all examples can be "exploited". I just gave multiple examples to show how diverse the issue is and how many functions are affected.
Java took a different approach to this issue: DataOutputStream.writeUTF() encodes the null character as 2 bytes (0xC0 0x80) instead of 1 byte, so its output does not conform to the UTF-8 standard ("modified UTF-8"). DataInput.readUTF() can read this format.
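To illustrate the Java approach, here is a hedged Python sketch of only the NUL handling of "modified UTF-8". The function names are made up for this example, and the sketch deliberately ignores modified UTF-8's other difference (supplementary characters written as surrogate pairs), so it is not a full implementation.

```python
def encode_nul_as_two_bytes(text: str) -> bytes:
    """Encode text as UTF-8, then rewrite each U+0000 as 0xC0 0x80.

    Simplified sketch: only the NUL part of Java's modified UTF-8.
    """
    # In standard UTF-8, the byte 0x00 only ever encodes U+0000,
    # so a plain byte replacement is safe.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_from_two_bytes(data: bytes) -> str:
    """Inverse of encode_nul_as_two_bytes()."""
    # 0xC0 never appears in valid standard UTF-8, so 0xC0 0x80 can only
    # be the overlong NUL written by the encoder above.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


encoded = encode_nul_as_two_bytes("event\0ignored")
print(encoded)                  # b'event\xc0\x80ignored'
print(b"\x00" in encoded)       # False: no byte for C string APIs to stop at
print(decode_nul_from_two_bytes(encoded) == "event\0ignored")  # True
```

The trade-off is the one mentioned above: NUL-terminated C code no longer truncates, but the bytes are no longer strictly valid UTF-8.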
A more recent Python vulnerability related to the NUL byte: CVE-2023-40587
Victor, since this issue seems to be a response to me asking what actual security issue you're fixing by changing
It's directly related to issue gh-111089, where I changed PyUnicode_AsUTF8() to reject embedded null characters. I'm closing this issue; I explained why in #111089 (comment). In short,
I just modified PyUnicode_AsUTF8() in the C API to raise an exception if a string contains an embedded null character, to reduce the risk of security vulnerabilities. Callers of PyUnicode_AsUTF8() expect a null-terminated string. If the UTF-8 encoded string contains an embedded null byte, the caller is likely to truncate it without knowing that more bytes follow "the first" null byte.
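The difference between silent truncation and rejection can be simulated in pure Python. This is only a sketch: both function names are hypothetical, and they model what a C caller does with the returned `const char *`; they do not call PyUnicode_AsUTF8() itself.

```python
def utf8_as_seen_by_c(text: str) -> bytes:
    """What a C caller sees if it treats the UTF-8 buffer as a
    NUL-terminated string (for example, by calling strlen() on it)."""
    data = text.encode("utf-8")
    # Everything after the first NUL byte is silently dropped.
    return data.split(b"\x00", 1)[0]


def utf8_strict(text: str) -> bytes:
    """Reject embedded NUL characters instead of truncating."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("embedded null character")
    return data


print(utf8_as_seen_by_c("event\0ignored"))  # b'event'
try:
    utf8_strict("event\0ignored")
except ValueError as exc:
    print("rejected:", exc)                 # rejected: embedded null character
```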
See: https://owasp.org/www-community/attacks/Embedding_Null_Code
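The OWASP attack boils down to a validation step and a use step disagreeing about where the string ends. A minimal Python sketch, where the function names and the upload file name are hypothetical and the C side is simulated with split() rather than real C code:

```python
def passes_extension_check(filename: str) -> bool:
    # Hypothetical validation performed on the full Python string.
    return filename.endswith(".jpg")


def name_used_by_c_code(filename: str) -> str:
    # Hypothetical C layer that stops reading at the first NUL.
    return filename.split("\0", 1)[0]


upload = "shell.php\0.jpg"
print(passes_extension_check(upload))  # True
print(name_used_by_c_code(upload))     # shell.php
```

The check approves a ".jpg" file, but the code that actually uses the name sees "shell.php".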
It's not only a security issue; it can also simply be seen as a bug: unwanted behavior.
Previous issues:
http.server request handling (<=3.10) #103223

Discussions:
Example with Python 3.12:
Output:
The "truncated string" part is silently ignored!

Multiple functions were modified in the past to prevent this problem. Examples:
PyBytes_AsStringAndSize(str, NULL)
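At the Python level, the effect of such fixes is easy to observe: path-based functions, for example, raise ValueError instead of silently using a truncated path. A quick check (the file name is arbitrary, and the exact error message may differ between Python versions):

```python
import os

for func in (os.stat, open):
    try:
        func("file.txt\0ignored")
    except ValueError as exc:
        # The path is rejected up front instead of being
        # truncated to "file.txt".
        print(f"{func.__name__}: ValueError: {exc}")
```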
There are exceptions which accept embedded null bytes/characters: