-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add C implementation of os.path.splitroot() #102511
Comments
I'd recommend making this a |
Sounds good. We can use Note that existing use of |
I'm going to attempt to implement this, though my C is rather basic so it will take me some time. Rough plan:
Let me know if that doesn't sound right! |
Here's an overview of what I did to experiment with this idea. I started by adding a new PyAPI_FUNC(wchar_t *) _Py_skiproot(wchar_t *path, Py_ssize_t size,
wchar_t **root); I moved the existing root-skipping implementation from I retained the old My initial comparison shows that, when splitting "//server/share/spam/eggs", the new builtin |
My C isn't good enough for this, so someone else please feel free to take a stab at it! |
Reopening and assigning to myself because I'm holding up the backport until we have confidence in the change. My note from the PR: I want to hold this for a bit until we're confident in the change in later versions. Virtually every time we touch this kind of code, we introduce new failures or vulnerabilities, so let's keep it restricted to prerelease versions for now. I'd say after 3.13.0b3 is released we'll reconsider, but that's because I'm hopeful we'll get a decent amount of people trying out those betas. If it seems like nobody is using them, we'll push this further. |
(cherry picked from commit 4765e1f) Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
FWIW, we're seeing this affect pip's test suite starting 3.13.0b2 -- based on the changelog, #118263 was the thing that switched stuff over to using the C implementation, but I'm making a comment here since it seems relevant to the C implementation being adopted overall (IIUC). The specific behaviour change we're seeing is that UNC paths are normalised differently.
On Windows with 3.13.0b2, we're seeing failures like the following in CI:
This passes on 3.13.0b1, at least based on the last run yesterday (I don't have a windows dev machine handy). This might be a difference in behaviour between the Python implementation and C implementation. |
This pull request wasn't back ported to 3.12, but the bug also occurs there: Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from os.path import abspath, normpath
>>> from urllib.parse import urljoin
>>> from urllib.request import pathname2url
>>> path_to_url = lambda path: urljoin("file:", pathname2url(normpath(abspath(path))))
>>> path_to_url(r"\\unc\as\path")
'file:////unc/as/path' |
@pradyunsg, it's caused by #113563: import urllib.parse
from urllib.parse import _coerce_args, urljoin, uses_netloc
def urlunsplit(components):
scheme, netloc, url, query, fragment, _coerce_result = (_coerce_args(*components))
if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
if url and url[:1] != '/': url = '/' + url
url = '//' + (netloc or '') + url
if scheme:
url = scheme + ':' + url
if query:
url = url + '?' + query
if fragment:
url = url + '#' + fragment
return _coerce_result(url)
print(urljoin("file:", "////unc/as/path")) # file:////unc/as/path
urllib.parse.urlunsplit = urlunsplit
print(urljoin("file:", "////unc/as/path")) # file://unc/as/path |
Yes, in #113563, >>> r = urllib.parse.urlsplit('file:////unc/as/path')
>>> urllib.parse.urlunsplit(r)
'file:////unc/as/path'
>>> r = urllib.parse.urlsplit('file://unc/as/path')
>>> urllib.parse.urlunsplit(r)
'file://unc/as/path' Previously, this didn't work correctly for 4-slash paths: >>> r = urllib.parse.urlsplit('file:////unc/as/path')
>>> urllib.parse.urlunsplit(r)
'file://unc/as/path'
>>> r = urllib.parse.urlsplit('file://unc/as/path')
>>> urllib.parse.urlunsplit(r)
'file://unc/as/path' |
Eryk, is >>> from nturl2path import pathname2url
>>> pathname2url('//unc/as/path')
'//unc/as/path'
>>> pathname2url(r'\\unc\as\path')
'////unc/as/path' |
@nineteendo, I couldn't find an issue related to this. The standard library doesn't use Also, currently nothing else in the standard library uses the |
…m `path` back to `p` (pythonGH-124097) (cherry picked from commit 3b6bfa7) Co-authored-by: sobolevn <mail@sobolevn.me>
Closing the issue as completed, as the OP feature request seems to have been implemented. |
Technically there's still #119394. |
Feature or enhancement
Speed up
os.path.splitroot()
by implementing it in C.Pitch
Per @eryksun:
Previous discussion
Linked PRs
os.path.splitroot()
#118089os.path.normpath()
for UNC paths on Windows. #119394os.path.splitroot
param name frompath
back top
#124097os.path.splitroot
param name frompath
back top
(GH-124097) #124919The text was updated successfully, but these errors were encountered: