Embedded null characters can lead to bugs or even security vulnerabilities #111656

vstinner · 2023-11-02T17:10:38Z

I just modified PyUnicode_AsUTF8() of the C API to raise an exception if a string contains an embedded null character to reduce the risk of security vulnerabilities. PyUnicode_AsUTF8() caller expects a string terminated by a null byte. If the UTF-8 encoded string contains embedded null byte, the caller is likely to truncate the string without knowing that there are more bytes after "the first" null byte.

See: https://owasp.org/www-community/attacks/Embedding_Null_Code

It's not only about security issue, it can also just be seen as a bug: unwanted behavior.

Previous issues:

Discussions:

2014: https://mail.python.org/archives/list/python-dev@python.org/thread/MZDL7FZZMRSW5MTIHLSA6ANNMCV7EEZN/

Example with Python 3.12:

import ctypes

libc = ctypes.cdll.LoadLibrary('libc.so.6')
printf = libc.printf
PyUnicode_AsUTF8 = ctypes.pythonapi.PyUnicode_AsUTF8
PyUnicode_AsUTF8.argtypes = (ctypes.py_object,)
PyUnicode_AsUTF8.restype = ctypes.c_char_p

my_string = "World\0truncated string"
printf(b"Hello %s\n", PyUnicode_AsUTF8(my_string))

Output:

Hello World

The truncated string part is silently ignored!

Multiple functions were modified in the past to prevent this problem. Examples:

_dbm.open(): check filename
_gdbm.open(): check filename
PyBytes_AsStringAndSize(str, NULL)
grp.getgrnam(): check name
pwd.getpwnam(): check name
_locale.strxfrm(): check argument
path_converter() of the os module: basically any filename and path
PyUnicode_AsWideCharString()
os.putenv()
_posixsubprocess.fork_exec(): executable_list
_struct.Struct: check format
_tkinter SetVar() and varname_converter()
_winapi.CreateProcess() getenvironment()
PyUnicode_EncodeLocale()
PyUnicode_EncodeFSDefault()
unicode_decode_locale()
PyUnicode_FSConverter()
PyUnicode_DecodeLocale()
PyUnicode_DecodeLocaleAndSize()
PyUnicode_FSDecoder()
PyUnicode_AsUTF8() -- recently modified
_Py_stat(): check path
getargs.c: 's', 'y' and 'z' formats

There are exceptions which accept embedded null bytes/characters:

socket: AF_UNIX socket name

The text was updated successfully, but these errors were encountered:

terryjreedy · 2023-11-02T19:14:50Z

Just curious, what action is proposed by this issue?

Yhg1s · 2023-11-02T20:54:27Z

Since this came up specifically about PyUnicode_AsUTF8, do you have any examples of problems caused by that function in particular? I feel bad for hammering on this, but I have asked that very question what feels like ten times now, and I just get vague "embedded NULs can cause problems" as answer.

Even in this list of "things that were fixed", you're not showing actual security issues, although I'm sure a lot of the fixes were good for both security reasons and others. I just want to see concrete, actual, real problems the PyUnicode_AsUTF8 change is fixing.

vstinner · 2023-11-03T00:14:52Z

Multiple functions of my list of fixed functions (examples) use UTF-8 and so likely called PyUnicode_AsUTF8() before being fixed. My point is that blocking embedded null characters in PyUnicode_AsUTF8() would fix indirectly all of these examples.

Some examples of Python 3.12 functions using directly PyUnicode_AsUTF8() without checking for embedded null characters. I doubt that developers using PyUnicode_AsUTF8() are aware of the embedded null issue and took it in account when writing the code. For most of these functions, truncating sounds between bad (surprising or wrong behavior) or very bad (security issue).

Modules/getpath.c getpath_warn() truncates the message
_hmac.new(key, msg, digestmod) truncates digestmod -- related to security
os.confstr(name), os.sysconf(name): truncate name
socket.getaddrinfo(): truncates port string
syslog.openlog(ident): truncate ident
timemodule.c gettmarg() truncates tm_zone (timezone string)
_xxsubinterpretersmodule.c _copy_raw_string() truncates the string -- that sounds like a bad bug :-(
PyModule_FromDefAndSpec2(): truncate spec.name
PyModule_GetName() truncates the module name
PyModule_GetFilename() truncates the module filename
PyModule_GetAttr() truncates name in the tp->tp_getattr code path
sys.audit() truncates auditEvent -- this sound bad

I didn't check for functions calling indirectly PyUnicode_AsUTF8(). The exhaustive list should be way longer.

Have fun with null characters.

$ ./python -c 'import _imp, types; spec=types.SimpleNamespace(name="sys\0ignored"); _imp.create_builtin(spec)'
python: Python/import.c:939: hashtable_key_from_2_strings: Assertion `strlen(key) == size - 1' failed.
Abandon (core dumped)

$ ./python -c 'import _imp, types; spec=types.SimpleNamespace(name="sys\0ignored", origin=""); _imp.create_dynamic(spec)'
python: Python/import.c:939: hashtable_key_from_2_strings: Assertion `strlen(key) == size - 1' failed.

$ ./python -c 'import _imp, types; name="zipimport\0ignored"; print(_imp.find_frozen(name))'
(None, False, 'zipimport')

$ ./python -c 'import hmac; print(hmac.new(b"data", digestmod="md5\0ignored").hexdigest())'
b60acef36887ece81fffd558f7a62378
$ ./python -c 'import hmac; print(hmac.new(b"data", digestmod="md5").hexdigest())'
b60acef36887ece81fffd558f7a62378

$ ./python -c 'import os; print(os.sysconf("SC_NPROCESSORS_ONLN\0ignored"))'
12

$ ./python -c 'import socket; print(socket.getaddrinfo("google.com", "80\0ignored"))'
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('142.250.179.78', 80)), ...]

$ ./python -c 'import syslog; syslog.openlog("pythontest\0ignored"); syslog.syslog("test")'
$ journalctl -r|head -n1
nov. 03 01:10:41 mona pythontest[149433]: test

Audit are related to security, right?

import sys
def hook(*args):
    print(args)
sys.addaudithook(hook)
sys.audit("event\0ignored")

Output:

('event', ())

corona10 · 2023-11-03T00:20:18Z

PyModule_FromDefAndSpec2(): truncate spec.name
PyModule_GetName() truncates the module name
PyModule_GetFilename() truncates the module filename
PyModule_GetAttr() truncates name in the tp->tp_getattr code path

The inputs of these APIs are dynamically created?
Most of them are not fixed inputs?

vstinner · 2023-11-03T00:21:20Z

I'm sure that there are legit use cases to truncate a string at the first null character on purpose. The question is what is the behavior expected by the majority of users when it comes to null characters? Do developers think ahead of null characters when they write C code? Is the majority of developers aware of this issue?

If we add an hypothetical PyUnicode_AsUTF8Safe() function which raises an exception if a string contains a null character: should existing C extensions move towards this new function, at least for new code? Is it ok that existing C extensions (which are not updated) will continue truncating?

Should the C API be "safe" by default, or should developers opt-in to be safe? Apparently, it's still an open question. I thought that the question was already answered in the past since many functions have been already modified to reject embedded null characters. See the list at: #111089 (comment)

vstinner · 2023-11-03T00:28:24Z

The inputs of these APIs are dynamically created?

My imp.create_builtin() example above calls PyModule_FromDefAndSpec2() with an arbitrary string.

You can create a module with arbitrary name and filename: import types; mod = types.ModuleType("sys\0ignored"); mod.__file__ = "safe_path\0unsafe".

I didn't check every single example to see how easy it is to play with embedded null characters. Maybe it's not possible to "exploit" all examples. I just gave multiple examples to show the diversity of the issue and see that many functions are impacted.

vstinner · 2023-11-03T00:45:40Z

Java took a different way for this issue: DataOutputStream.writeUTF() encodes the null character as 2 bytes instead of 1 byte, and so doesn't respect the standard. DataInput.readUTF8() can read this format.

vstinner · 2023-11-03T00:51:23Z

More recent Python vulnerability related to NUL byte: CVE-2023-40587

Yhg1s · 2023-11-03T08:45:07Z

Victor, since this issue seems to be a response to me asking what actual security issue you're fixing by changing PyUnicode_AsUTF8()'s semantics, let me try to make sure I understand what you're saying here: "No, there are no known security issues caused by PyUnicode_AsUTF8()'s current behaviour." Is that right?

vstinner · 2023-11-03T11:04:54Z

@terryjreedy:

Just curious, what action is proposed by this issue?

It's directly related to gh-111089 issue where I changed PyUnicode_AsUTF8() to reject embedded null characters.

I close this issue. I explained why in #111089 (comment). In short, PyUnicode_AsUTF8() is "too broad" or "too generic" to be treated as "security sensitive".

vstinner added the type-security A security issue label Nov 2, 2023

vstinner mentioned this issue Nov 2, 2023

gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL #111091

Merged

vstinner closed this as completed Nov 3, 2023

This was referenced Nov 3, 2023

gh-111089: Add PyUnicode_AsUTF8NoNUL() function #111688

Closed

[C API] Change PyUnicode_AsUTF8() to return NULL on embedded null characters #111089

Closed

encukou mentioned this issue Nov 8, 2023

Soft-deprecate PyUnicode_AsUTF8 capi-workgroup/api-evolution#39

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Embedded null characters can lead to bugs or even security vulnerabilities #111656

Embedded null characters can lead to bugs or even security vulnerabilities #111656

vstinner commented Nov 2, 2023

terryjreedy commented Nov 2, 2023

Yhg1s commented Nov 2, 2023 •

edited

Loading

vstinner commented Nov 3, 2023

corona10 commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

Yhg1s commented Nov 3, 2023

vstinner commented Nov 3, 2023

Embedded null characters can lead to bugs or even security vulnerabilities #111656

Embedded null characters can lead to bugs or even security vulnerabilities #111656

Comments

vstinner commented Nov 2, 2023

terryjreedy commented Nov 2, 2023

Yhg1s commented Nov 2, 2023 • edited Loading

vstinner commented Nov 3, 2023

corona10 commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

vstinner commented Nov 3, 2023

Yhg1s commented Nov 3, 2023

vstinner commented Nov 3, 2023

Yhg1s commented Nov 2, 2023 •

edited

Loading