Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.normpath truncates input on null bytes in 3.11, but not 3.10 #106242

Closed
chrisjbillington opened this issue Jun 29, 2023 · 9 comments
Closed
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@chrisjbillington
Copy link

chrisjbillington commented Jun 29, 2023

Bug report

Looks like posix._path_normpath has slightly different behaviour to the python implementation of normpath defined in posixpath, as such os.path.normpath behaves differently on Python 3.11 (where posix._path_normpath is used if it exists) vs 3.10 on posix systems:

Python 3.10:

>>> import os.path
>>> os.path.normpath('hello\x00world')
'hello\x00world'
>>> os.path.normpath('\x00hello')
'\x00hello'

Python 3.11:

>>> import os.path
>>> os.path.normpath('hello\x00world')
'hello'
>>> os.path.normpath('\x00hello')
'.'

Obviously filepaths shouldn't have nulls in them, but the above means invalid input to a program could result in the wrong files or directories being used, rather than an error about embedded nulls once the filepaths are actually used for a system call. And I'm guessing the inconsistency between Python3.10 and 3.11, or between the Python and C implementations of normpath was not intended in any case.

Your environment

CPython 3.11.3, running on Arch Linux

Python 3.11.3 (main, Jun 5 2023, 09:32:32) [GCC 13.1.1 20230429] on linux

Linked PRs

@chrisjbillington chrisjbillington added the type-bug An unexpected behavior, bug, or error label Jun 29, 2023
@eryksun
Copy link
Contributor

eryksun commented Jun 29, 2023

I think this is a matter of documentation. I don't recall whether Steve Dower (@zooba), the author of the C implementation of normpath(), has commented before in regard to this behavior change.

os.path.normpath() is allowed to modify a pathname if doing so doesn't change how the operating system would parse it, with the exception that path parsing is allowed to differ if a symlink is removed by normalizing a ".." component (e.g. "symlink/.." -> ".')1. The documentation doesn't mention embedded null characters, but all supported platforms do use null-terminated strings.

Since the ValueError that gets raised in some cases if a pathname contains embedded null characters is from Python itself, not the operating system, then normpath() shouldn't be obligated to preserve embedded null characters. But the behavior change sneaked in under the radar. It should be documented in "Doc/library/os.path.rst". For example, add a version-changed notification.

   .. versionchanged:: 3.11
      The normalized result no longer retains null characters.  The result is
      based on a substring of the input pathname from the beginning up to but
      not including the first null.

Footnotes

  1. Another exceptional case, which unfortunately is undocumented, is the removal of a leading dot component (i.e. os.curdir) of a relative path, such as "./spam" -> "spam". This can change how the operating system handles the path in a search context. The pathname "./filename" is always resolved relative to the current working directory. On the other hand in a search context, "filename" without a leading "." component is resolved against the set of directories to be searched, which is typically from the PATH environment variable. On Windows, even "dirname/filename" is resolved by searching, in contrast to "./dirname/filename".

@eryksun eryksun added stdlib Python modules in the Lib dir extension-modules C modules in the Modules dir 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes labels Jun 29, 2023
@zooba
Copy link
Member

zooba commented Jun 29, 2023

It's probably easiest just to make _path_normpath not truncate. Looking at the current code:

    PyObject *result = PyUnicode_FromWideChar(_Py_normpath(buffer, len), -1);

normpath is actually fine, but it needs to return the new length so that we can pass it to FromWideChar. _Py_normpath has the last character when it returns, so can easily calculate the size. It can be passed back through an (optional) pointer to a Py_ssize_t. _Py_normpath is exported (not sure why?), so we really ought to add a new name (_Py_normpathAndSize seems appropriate).

I think that fix can be backported. As noted, the path is invalid anyway, but probably best to preserve behaviour rather than double-down on changing it.

@chrisjbillington
Copy link
Author

chrisjbillington commented Jun 30, 2023

Thanks!

I think maintaining the 3.10 behaviour makes sense, but that if one wanted to double down on a change instead, then a ValueError would be preferred to truncation.

FWIW I notice on Windows, we get truncation on Python 3.11, no truncation on Python3.10, and a ValueError on Python 3.8 (didn't test 3.9). So there is some precedent for a ValueError

My use case is that I'm passing filepaths over a socket to another process, and since null bytes can't appear in filepaths, I'm using nulls as the delimiter. I was converting the paths to absolute paths with os.path.abspath() and then checking for nulls before writing them to the socket. Nulls would result in truncation and my code would write invalid filepaths to the socket instead of raising its own ValueError upon finding the null. I'm now checking for nulls before calling abspath.

@eryksun
Copy link
Contributor

eryksun commented Jun 30, 2023

Raising ValueError is the last thing I'd want normpath() to do for embedded null characters. It wouldn't be consistently handled across all functions in os.path without introducing new behavior.

FWIW I notice on Windows, we get truncation on Python 3.11, no truncation on Python3.10, and a ValueError on Python 3.8 (didn't test 3.9). So there is some precedent for a ValueError

On Windows, os.path.normpath() in Python 3.8 doesn't raise ValueError for embedded null characters. Which input pathnames did you test in which version (e.g. CPython 3.8.10)?

os.path.abspath() is partially based on nt._getfullpathname() on Windows. The latter does raise ValueError if a pathname has embedded null characters because WinAPI GetFullPathNameW() takes a null-terminated string. Prior to 3.11, in this case abspath() has to fall back on an inferior implementation. In 3.11, abspath() calls normpath() first, which avoids having to use the fallback implementation. For example, the fallback implementation lacks support for per-drive working directories.

Python 3.10:

>>> os.getcwd()
'C:\\Windows'
>>> os.path.abspath('E:spam')
'E:\\Temp\\spam'
>>> os.path.abspath('E:spam\0')
'E:spam\x00'

The result "E:spam\0" is wrong because it's a relative path that depends on the working directory on drive "E:". The result should never be a relative path. That's a bug in the fallback implementation, but fixing it isn't on the top of anyone's to-do list.

Python 3.11:

>>> os.getcwd()
'C:\\Windows'
>>> os.path.abspath('E:spam')
'E:\\Temp\\spam'
>>> os.path.abspath('E:spam\0')
'E:\\Temp\\spam'

The new behavior of normpath() is better in this case, though such a weird edge case isn't a strong argument in its favor.

My use case is that I'm passing filepaths over a socket to another process, and since null bytes can't appear in filepaths, I'm using nulls as the delimiter. I was converting the paths to absolute paths with os.path.abspath() and then checking for nulls before writing them to the socket. Nulls would result in truncation and my code would write invalid filepaths to the socket instead of raising its own ValueError upon finding the null. I'm now checking for nulls before calling abspath.

Your workaround is how I would have implemented it from the start. abspath() can't add null characters to the result, so it's more efficient to check the path beforehand.

@chrisjbillington
Copy link
Author

On Windows, os.path.normpath() in Python 3.8 doesn't raise ValueError for embedded null characters. Which input pathnames did you test in which version (e.g. CPython 3.8.10)?

Apologies, I'm an idiot - for some reason my fingers typed realpath() for some of my tests. The path checked was b'asd\x00asd'. This is CPython 3.8.10 from the Windows app store, and CPythons 3.10.11 and 3.11.4 downloaded from Python.org, all 64-bit.

Being more careful with my typing:

I see realpath() raising ValueError for embedded nulls on Python 3.8 and 3.10, and truncating on Python 3.11.

I see abspath() and normpath() preserving nulls on Python 3.8 and 3.10, and truncating on 3.11.

I can see that as you said, on Python 3.8 and 3.10, abspath() is catching the ValueError from _getfullpathname() and using the fallback, which preserves nulls. In 3.11 the use of normpath() before calling _getfullpathname() truncates the nulls and so there is no ValueError. Makes sense.

Similarly, with realpath, the ValueError is coming from a call to _getfinalpathname(), but an earlier call to normpath() truncates the nulls on Python 3.11, and so the result is truncation rather than a ValueError.

So it's realpath() where there is precedent for a ValueError. But fair enough if that was bad and raising more ValueErrors would be the more chaotic option.

@eryksun
Copy link
Contributor

eryksun commented Jul 1, 2023

The behavior of os.path.realpath() with embedded null characters looks like an oversight that was introduced on Windows in Python 3.8 and on POSIX in Python 3.10. Prior to 3.8, realpath() on Windows was an alias for abspath(). The implementation in 3.8+ doesn't handle a ValueError from nt._getfinalpathname(). Prior to 3.10, realpath() on POSIX used islink(), which handles a ValueError if it's raised by os.lstat() by returning False. In 3.10+, the new strict parameter of realpath() was implemented on POSIX by calling os.lstat() directly, but ValueError is no longer handled in non-strict mode. These behavior changes crept in because there are no regression tests for realpath() that check pathnames that contain null characters.

zooba added a commit to zooba/cpython that referenced this issue Aug 15, 2023
zooba added a commit to zooba/cpython that referenced this issue Aug 15, 2023
zooba added a commit to zooba/cpython that referenced this issue Aug 15, 2023
ambv pushed a commit that referenced this issue Aug 15, 2023
zooba added a commit to zooba/cpython that referenced this issue Aug 16, 2023
Yhg1s pushed a commit that referenced this issue Aug 16, 2023
…#107981)

* gh-106242: Fix path truncation in os.path.normpath (GH-106816)
* gh-106242: Minor fixup to avoid compiler warnings

---------

Co-authored-by: Finn Womack <flan313@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@gpshead gpshead added the type-security A security issue label Aug 21, 2023
@gpshead
Copy link
Member

gpshead commented Aug 21, 2023

is there anything left to be done here? additional testing and possible other API corner case fixes based on Eryksun's comment?

zooba added a commit to zooba/cpython that referenced this issue Aug 21, 2023
@zooba
Copy link
Member

zooba commented Aug 21, 2023

Yes, I think ntpath.realpath needs some error handling updates to be consistent with abspath. Since this fix is already changing its error behaviour (back), we may as well take the opportunity to change it to match documentation.

zooba added a commit that referenced this issue Aug 22, 2023
…here are embedded nulls (GH-108248)

* gh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls

* Update 2023-08-22-00-36-57.gh-issue-106242.q24ITw.rst

mention Windows and the former incorrect ValueError.

---------

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 22, 2023
…when there are embedded nulls (pythonGH-108248)

* pythongh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls

* Update 2023-08-22-00-36-57.gh-issue-106242.q24ITw.rst

mention Windows and the former incorrect ValueError.

---------

(cherry picked from commit de33b5c)

Co-authored-by: Steve Dower <steve.dower@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 22, 2023
…when there are embedded nulls (pythonGH-108248)

* pythongh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls

* Update 2023-08-22-00-36-57.gh-issue-106242.q24ITw.rst

mention Windows and the former incorrect ValueError.

---------

(cherry picked from commit de33b5c)

Co-authored-by: Steve Dower <steve.dower@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Yhg1s pushed a commit that referenced this issue Aug 22, 2023
… when there are embedded nulls (GH-108248) (#108251)

gh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls (GH-108248)

* gh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls

* Update 2023-08-22-00-36-57.gh-issue-106242.q24ITw.rst

mention Windows and the former incorrect ValueError.

---------

(cherry picked from commit de33b5c)

Co-authored-by: Steve Dower <steve.dower@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
zooba added a commit that referenced this issue Aug 22, 2023
… when there are embedded nulls (GH-108248)

gh-106242: Make ntpath.realpath errors consistent with abspath when there are embedded nulls (GH-108248)

---------

(cherry picked from commit de33b5c)

Co-authored-by: Steve Dower <steve.dower@python.org>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@zooba zooba closed this as completed Aug 22, 2023
@samueloph
Copy link

CVE-2023-41105 was assigned to this.

There's also CVE-2023-40587 which is a CVE against pyramid, but it's about triggering this same issue.

I wasn't involved in these assignments, posting here so it's logged in the GitHub issue.

(CVE-2023-41105 is already tracked at https://github.com/psf/advisory-database)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

No branches or pull requests

5 participants