gh-101000: Add os.path.splitroot() #101002

Merged
merged 31 commits into from
Jan 27, 2023
Changes from 6 commits
Commits
21c0ba9
gh-101000: Add os.path.splitroot()
barneygale Jan 12, 2023
836b85d
Use splitroot() from pathlib
barneygale Jan 12, 2023
bc2d1f9
Use splitroot() from posixpath
barneygale Jan 12, 2023
ecdc40d
Use splitroot() from ntpath
barneygale Jan 12, 2023
6592b27
Optimizations
barneygale Jan 12, 2023
78f4227
Correct and expand examples in splitroot() docstring
barneygale Jan 13, 2023
9726ca4
Update Lib/ntpath.py
barneygale Jan 13, 2023
7a6613c
Use splitroot() from pathlib.PurePath.with_name()
barneygale Jan 14, 2023
26a8dba
Reduce ntpath.normpath() diff noise
barneygale Jan 15, 2023
0c237d4
Simplify ntpath.commonpath() now that 'isabs' is unused.
barneygale Jan 15, 2023
11ed3eb
Reduce posixpath.normpath() diff noise
barneygale Jan 15, 2023
2c9eed8
Improve documentation
barneygale Jan 15, 2023
8299e96
Add whatsnew entry.
barneygale Jan 15, 2023
27ffe37
Simplify ntpath.splitroot() slightly
barneygale Jan 15, 2023
9beff2a
Apply suggestions from code review
barneygale Jan 16, 2023
bacdee1
Update Doc/library/os.path.rst
barneygale Jan 16, 2023
4ebe545
Note that drive may be empty on Windows
barneygale Jan 16, 2023
2927afe
Re-order drive example
barneygale Jan 16, 2023
b0aa73e
Update Doc/library/os.path.rst
barneygale Jan 16, 2023
19777d6
Adjust docstring examples
barneygale Jan 18, 2023
32e212e
Apply suggestions from code review
barneygale Jan 19, 2023
37cded3
Update Doc/library/os.path.rst
barneygale Jan 19, 2023
5a8dfce
Change example username in docs to 'Sam'
barneygale Jan 19, 2023
0e75a55
Adjust first paragraph to use prose
barneygale Jan 19, 2023
3663237
Update Doc/library/os.path.rst
barneygale Jan 22, 2023
053729d
Add tests for bytes (POSIX only) and path-like objects (both platforms)
barneygale Jan 22, 2023
694f093
Add tests for mixed path separators (Windows only)
barneygale Jan 22, 2023
e99e3cd
Remove errant newline.
barneygale Jan 22, 2023
f618a00
Move most test cases from `test_splitdrive` to `test_splitroot`
barneygale Jan 22, 2023
1c522c9
Mention pathlib performance improvement in news entry.
barneygale Jan 22, 2023
df17269
Merge branch 'main' into gh-101000-splitroot
AlexWaygood Jan 22, 2023
20 changes: 20 additions & 0 deletions Doc/library/os.path.rst
@@ -488,6 +488,26 @@ the :mod:`glob` module.)
       Accepts a :term:`path-like object`.


+.. function:: splitroot(path)
+
+   Split the pathname *path* into a triad ``(drive, root, tail)`` where:
+
+   1. *drive* is an optional mount point, exactly like :func:`splitdrive`;
+   2. *root* is an optional sequence of separators following the drive; and
+   3. *tail* is anything after the root.
+
+   On Posix, *drive* is always empty. The *root* may be empty (relative path),
+   a single forward slash (absolute path), or two forward slashes
+   (implementation-defined per the POSIX standard).
+
+   On Windows, *drive* may be a UNC sharepoint or a traditional DOS drive. The
+   *root* may be empty, a forward slash, or a backward slash.
+
+   In all cases, ``drive + root + tail`` will be the same as *path*.
+
+   .. versionadded:: 3.12
+
+
 .. function:: splitext(path)

    Split the pathname *path* into a pair ``(root, ext)`` such that ``root + ext ==
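The documented triad can be approximated on top of the long-standing `splitdrive()`. The following is a minimal sketch for Windows-style `str` paths only; the helper name `windows_triad_from_splitdrive` is hypothetical and is not the code added by this PR.

```python
import ntpath


def windows_triad_from_splitdrive(path: str) -> tuple[str, str, str]:
    """Approximate (drive, root, tail) for a Windows-style str path.

    Hypothetical helper for illustration only; not part of the PR.
    """
    drive, rest = ntpath.splitdrive(path)
    # The root is at most one separator directly after the drive.
    root = rest[:1] if rest[:1] in ("/", "\\") else ""
    tail = rest[len(root):]
    return drive, root, tail


for example in ("C:/Users/Sam", "C:Users", "//server/share/dir", "Windows/notepad"):
    drive, root, tail = windows_triad_from_splitdrive(example)
    # Documented invariant: the three parts concatenate back to the input.
    assert drive + root + tail == example
    print(repr(example), "->", (drive, root, tail))
```

Because the parts are plain slices of the input, the `drive + root + tail == path` invariant holds by construction.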
126 changes: 73 additions & 53 deletions Lib/ntpath.py
@@ -24,7 +24,7 @@
 from genericpath import *


-__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
+__all__ = ["normcase","isabs","join","splitdrive","splitroot","split","splitext",
            "basename","dirname","commonprefix","getsize","getmtime",
            "getatime","getctime", "islink","exists","lexists","isdir","isfile",
            "ismount", "expanduser","expandvars","normpath","abspath",
@@ -117,19 +117,21 @@ def join(path, *paths):
     try:
         if not paths:
             path[:0] + sep  #23780: Ensure compatible data type even if p is null.
-        result_drive, result_path = splitdrive(path)
+        result_drive, result_root, result_path = splitroot(path)
         for p in map(os.fspath, paths):
-            p_drive, p_path = splitdrive(p)
-            if p_path and p_path[0] in seps:
+            p_drive, p_root, p_path = splitroot(p)
+            if p_root:
                 # Second path is absolute
                 if p_drive or not result_drive:
                     result_drive = p_drive
+                result_root = p_root
                 result_path = p_path
                 continue
             elif p_drive and p_drive != result_drive:
                 if p_drive.lower() != result_drive.lower():
                     # Different drives => ignore the first path entirely
                     result_drive = p_drive
+                    result_root = p_root
                     result_path = p_path
                     continue
                 # Same drive in different case
@@ -139,10 +141,10 @@ def join(path, *paths):
                 result_path = result_path + sep
             result_path = result_path + p_path
         ## add separator between UNC and non-absolute path
-        if (result_path and result_path[0] not in seps and
+        if (result_path and not result_root and
             result_drive and result_drive[-1:] != colon):
             return result_drive + sep + result_path
-        return result_drive + result_path
+        return result_drive + result_root + result_path
     except (TypeError, AttributeError, BytesWarning):
         genericpath._check_arg_types('join', path, *paths)
         raise
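The three branches touched in `join()` correspond to behaviour that can be observed through the public `ntpath.join()` on any platform. A small illustration, assuming only the standard library:

```python
import ntpath

# A rooted component without a drive keeps the drive of the earlier path.
print(ntpath.join("C:\\Users", "\\Windows"))

# A component on a different drive discards everything before it.
print(ntpath.join("C:\\Users", "D:Temp"))

# A separator is inserted between a UNC drive and a relative component.
print(ntpath.join("//server/share", "docs"))
```

These results should be the same before and after the change; the hunks only replace the `p_path[0] in seps` checks with the explicit `root` value.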
@@ -169,35 +171,61 @@ def splitdrive(p):

     Paths cannot contain both a drive letter and a UNC path.

     """
+    drive, root, tail = splitroot(p)
+    return drive, root + tail
+
+
+def splitroot(p):
+    """Split a pathname into drive, root and tail. The drive is defined
+    exactly as in splitdrive(). On Windows, the root may be a single path
+    separator or an empty string. The tail contains anything after the root.
+    For example:
+
+        splitroot('//server/share/') == ('//server/share', '/', '')
+        splitroot('C:/Users/Barney') == ('C:', '/', 'Users/Barney')
+        splitroot('C:///spam///ham') == ('C:', '/', '//spam///ham')
+        splitroot('Windows/notepad') == ('', '', 'Windows/notepad')
+    """
     p = os.fspath(p)
-    if len(p) >= 2:
-        if isinstance(p, bytes):
-            sep = b'\\'
-            altsep = b'/'
-            colon = b':'
-            unc_prefix = b'\\\\?\\UNC\\'
-        else:
-            sep = '\\'
-            altsep = '/'
-            colon = ':'
-            unc_prefix = '\\\\?\\UNC\\'
-        normp = p.replace(altsep, sep)
-        if normp[0:2] == sep * 2:
+    if isinstance(p, bytes):
+        sep = b'\\'
+        altsep = b'/'
+        colon = b':'
+        unc_prefix = b'\\\\?\\UNC\\'
+        empty = b''
+    else:
+        sep = '\\'
+        altsep = '/'
+        colon = ':'
+        unc_prefix = '\\\\?\\UNC\\'
+        empty = ''
+    normp = p.replace(altsep, sep)
+    if normp[:1] == sep:
+        if normp[1:2] == sep:
             # UNC drives, e.g. \\server\share or \\?\UNC\server\share
             # Device drives, e.g. \\.\device or \\?\device
             start = 8 if normp[:8].upper() == unc_prefix else 2
             index = normp.find(sep, start)
             if index == -1:
-                return p, p[:0]
+                return p, empty, empty
             index2 = normp.find(sep, index + 1)
             if index2 == -1:
-                return p, p[:0]
-            return p[:index2], p[index2:]
-        if normp[1:2] == colon:
-            # Drive-letter drives, e.g. X:
-            return p[:2], p[2:]
-    return p[:0], p
+                return p, empty, empty
+            return p[:index2], p[index2:index2 + 1], p[index2 + 1:]
+        else:
+            # Relative path with root, e.g. \Windows
+            return empty, p[:1], p[1:]
+    elif normp[1:2] == colon:
+        if normp[2:3] == sep:
+            # Absolute drive-letter path, e.g. X:\Windows
+            return p[:2], p[2:3], p[3:]
+        else:
+            # Relative path with drive, e.g. X:Windows
+            return p[:2], empty, p[2:]
+    else:
+        # Relative path, e.g. Windows
+        return empty, empty, p


 # Split a path in head (everything up to the last '/') and tail (the
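To make the branch structure of the new Windows `splitroot()` easier to follow, here is a `str`-only restatement that runs on any Python 3.9+ interpreter. The names `splitroot_sketch` and `splitdrive_sketch` are hypothetical; the bytes handling and `os.fspath()` call from the diff are deliberately left out.

```python
def splitroot_sketch(p: str) -> tuple[str, str, str]:
    """str-only restatement of the Windows three-way split (illustrative)."""
    sep, altsep, colon = "\\", "/", ":"
    unc_prefix = "\\\\?\\UNC\\"
    normp = p.replace(altsep, sep)
    if normp[:1] == sep:
        if normp[1:2] == sep:
            # UNC or device drive: the drive spans two path components.
            start = 8 if normp[:8].upper() == unc_prefix else 2
            index = normp.find(sep, start)
            if index == -1:
                return p, "", ""
            index2 = normp.find(sep, index + 1)
            if index2 == -1:
                return p, "", ""
            return p[:index2], p[index2:index2 + 1], p[index2 + 1:]
        # Rooted path without a drive, e.g. \Windows
        return "", p[:1], p[1:]
    if normp[1:2] == colon:
        if normp[2:3] == sep:
            # Absolute drive-letter path, e.g. X:\Windows
            return p[:2], p[2:3], p[3:]
        # Drive-relative path, e.g. X:Windows
        return p[:2], "", p[2:]
    # Plain relative path, e.g. Windows
    return "", "", p


def splitdrive_sketch(p: str) -> tuple[str, str]:
    """splitdrive() expressed through the three-way split, as in the diff."""
    drive, root, tail = splitroot_sketch(p)
    return drive, root + tail


print(splitroot_sketch("C:///spam///ham"))
print(splitdrive_sketch("C:/Users/Barney"))
```

Every return value is built from slices of the original `p` rather than `normp`, which is why mixed separators survive the split unchanged.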
@@ -212,15 +240,13 @@ def split(p):
     Either part may be empty."""
     p = os.fspath(p)
     seps = _get_bothseps(p)
-    d, p = splitdrive(p)
+    d, r, p = splitroot(p)
     # set i to index beyond p's last slash
     i = len(p)
     while i and p[i-1] not in seps:
         i -= 1
     head, tail = p[:i], p[i:]  # now tail has no slashes
-    # remove trailing slashes from head, unless it's all slashes
-    head = head.rstrip(seps) or head
-    return d + head, tail
+    return d + r + head.rstrip(seps), tail


 # Split a path in root and extension.
@@ -311,10 +337,10 @@ def ismount(path):
     path = os.fspath(path)
     seps = _get_bothseps(path)
     path = abspath(path)
-    root, rest = splitdrive(path)
-    if root and root[0] in seps:
-        return (not rest) or (rest in seps)
-    if rest and rest in seps:
+    drive, root, rest = splitroot(path)
+    if drive and drive[0] in seps:
+        return not rest
+    if root and not rest:
         return True

     if _getvolumepathname:
@@ -525,14 +551,9 @@ def normpath(path):
             curdir = '.'
             pardir = '..'
         path = path.replace(altsep, sep)
-        prefix, path = splitdrive(path)
-
-        # collapse initial backslashes
-        if path.startswith(sep):
-            prefix += sep
-            path = path.lstrip(sep)
+        drive, root, path = splitroot(path)

-        comps = path.split(sep)
+        comps = path.lstrip(sep).split(sep)
         i = 0
         while i < len(comps):
             if not comps[i] or comps[i] == curdir:
@@ -541,16 +562,16 @@ def normpath(path):
                 if i > 0 and comps[i-1] != pardir:
                     del comps[i-1:i+1]
                     i -= 1
-                elif i == 0 and prefix.endswith(sep):
+                elif i == 0 and root:
                     del comps[i]
                 else:
                     i += 1
             else:
                 i += 1
         # If the path is now empty, substitute '.'
-        if not prefix and not comps:
+        if not drive and not root and not comps:
             comps.append(curdir)
-        return prefix + sep.join(comps)
+        return drive + root + sep.join(comps)

 else:
     def normpath(path):
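The `root` checks in the `normpath()` hunks decide whether a leading `..` can be dropped. A short illustration through the public `ntpath.normpath()`, assuming the pure-Python fallback shown in the diff (on Windows a native implementation may be used instead):

```python
import ntpath

# ".." directly under a root has nowhere to go, so it is dropped.
print(ntpath.normpath("C:\\..\\Windows"))

# Without a root, a leading ".." has to be preserved.
print(ntpath.normpath("C:..\\Windows"))

# With no drive, no root and no components left, "." is substituted.
print(ntpath.normpath("a\\.."))
```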
@@ -765,8 +786,8 @@ def relpath(path, start=None):
     try:
         start_abs = abspath(normpath(start))
         path_abs = abspath(normpath(path))
-        start_drive, start_rest = splitdrive(start_abs)
-        path_drive, path_rest = splitdrive(path_abs)
+        start_drive, _, start_rest = splitroot(start_abs)
+        path_drive, _, path_rest = splitroot(path_abs)
         if normcase(start_drive) != normcase(path_drive):
             raise ValueError("path is on mount %r, start on mount %r" % (
                 path_drive, start_drive))
@@ -816,21 +837,21 @@ def commonpath(paths):
         curdir = '.'

     try:
-        drivesplits = [splitdrive(p.replace(altsep, sep).lower()) for p in paths]
-        split_paths = [p.split(sep) for d, p in drivesplits]
+        drivesplits = [splitroot(p.replace(altsep, sep).lower()) for p in paths]
+        split_paths = [p.split(sep) for d, r, p in drivesplits]

         try:
-            isabs, = set(p[:1] == sep for d, p in drivesplits)
+            isabs, = set(r for d, r, p in drivesplits)
         except ValueError:
             raise ValueError("Can't mix absolute and relative paths") from None

         # Check that all drive letters or UNC paths match. The check is made only
         # now otherwise type errors for mixing strings and bytes would not be
         # caught.
-        if len(set(d for d, p in drivesplits)) != 1:
+        if len(set(d for d, r, p in drivesplits)) != 1:
             raise ValueError("Paths don't have the same drive")

-        drive, path = splitdrive(paths[0].replace(altsep, sep))
+        drive, root, path = splitroot(paths[0].replace(altsep, sep))
         common = path.split(sep)
         common = [c for c in common if c and c != curdir]

@@ -844,8 +865,7 @@ def commonpath(paths):
         else:
             common = common[:len(s1)]

-        prefix = drive + sep if isabs else drive
-        return prefix + sep.join(common)
+        return drive + root + sep.join(common)
     except (TypeError, AttributeError):
         genericpath._check_arg_types('commonpath', *paths)
         raise
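In `commonpath()` the set of roots now plays the part of the old `isabs` flag: if the inputs disagree on the root, unpacking the one-element set fails and a `ValueError` is raised. A brief illustration through the public API, assuming only the standard library:

```python
import ntpath

# Same drive and same root: the shared leading components are returned.
print(ntpath.commonpath(["C:\\Users\\Sam", "C:\\Users\\Alex"]))

# One rooted path and one relative path cannot be combined.
try:
    ntpath.commonpath(["C:\\Users", "Users"])
except ValueError as exc:
    print("ValueError:", exc)
```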
18 changes: 4 additions & 14 deletions Lib/pathlib.py
@@ -271,19 +271,6 @@ def __reduce__(self):
         # when pickling related paths.
         return (self.__class__, tuple(self._parts))

-    @classmethod
-    def _split_root(cls, part):
-        sep = cls._flavour.sep
-        rel = cls._flavour.splitdrive(part)[1].lstrip(sep)
-        anchor = part.removesuffix(rel)
-        if anchor:
-            anchor = cls._flavour.normpath(anchor)
-        drv, root = cls._flavour.splitdrive(anchor)
-        if drv.startswith(sep):
-            # UNC paths always have a root.
-            root = sep
-        return drv, root, rel
-
     @classmethod
     def _parse_parts(cls, parts):
         if not parts:
@@ -293,7 +280,10 @@ def _parse_parts(cls, parts):
         path = cls._flavour.join(*parts)
         if altsep:
             path = path.replace(altsep, sep)
-        drv, root, rel = cls._split_root(path)
+        drv, root, rel = cls._flavour.splitroot(path)
+        if drv.startswith(sep):
+            # pathlib assumes that UNC paths always have a root.
+            root = sep
         unfiltered_parsed = [drv + root] + rel.split(sep)
         parsed = [sys.intern(x) for x in unfiltered_parsed if x and x != '.']
         return drv, root, parsed
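The special case kept in `_parse_parts()` is visible through the public pathlib API: a UNC path reports a root even when the input string has no separator after the share name. A short illustration, assuming only the standard library:

```python
from pathlib import PureWindowsPath

unc = PureWindowsPath("//server/share")
# The share is the drive, and pathlib supplies the root.
print(repr(unc.drive), repr(unc.root), repr(unc.anchor))

rel = PureWindowsPath("C:Users")
# A drive-relative path has a drive but no root.
print(repr(rel.drive), repr(rel.root))
```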
48 changes: 35 additions & 13 deletions Lib/posixpath.py
@@ -28,7 +28,7 @@
 import genericpath
 from genericpath import *

-__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
+__all__ = ["normcase","isabs","join","splitdrive","splitroot","split","splitext",
            "basename","dirname","commonprefix","getsize","getmtime",
            "getatime","getctime","islink","exists","lexists","isdir","isfile",
            "ismount", "expanduser","expandvars","normpath","abspath",
@@ -135,6 +135,37 @@ def splitdrive(p):
     return p[:0], p


+def splitroot(p):
+    """Split a pathname into drive, root and tail. On Posix, drive is always
+    empty; the root may be empty, a single slash, or two slashes. The tail
+    contains anything after the root. For example:
+
+        splitroot('foo/bar') == ('', '', 'foo/bar')
+        splitroot('/foo/bar') == ('', '/', 'foo/bar')
+        splitroot('//foo/bar') == ('', '//', 'foo/bar')
+        splitroot('///foo/bar') == ('', '/', '//foo/bar')
+    """
+    p = os.fspath(p)
+    if isinstance(p, bytes):
+        sep = b'/'
+        empty = b''
+    else:
+        sep = '/'
+        empty = ''
+    if p[:1] != sep:
+        # Relative path, e.g.: 'foo'
+        return empty, empty, p
+    elif p[1:2] != sep:
+        # Absolute path, e.g.: '/foo'
+        return empty, p[:1], p[1:]
+    elif p[2:3] != sep:
+        # Implementation defined per POSIX standard, e.g.: '//foo'
+        return empty, p[:2], p[2:]
+    else:
+        # Absolute path with extraneous slashes, e.g.: '///foo', '////foo', etc.
+        return empty, p[:1], p[1:]
+
+
 # Return the tail (basename) part of a path, same as split(path)[1].

 def basename(p):
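The POSIX rules can also be phrased as "count the leading slashes". The sketch below is an alternative formulation, not the PR's code; the name `posix_splitroot_sketch` is hypothetical. It accepts `str`, `bytes` and path-like input, which is the range of inputs the diff handles via `os.fspath()`.

```python
import os
from pathlib import PurePosixPath


def posix_splitroot_sketch(p):
    """Alternative statement of the POSIX (drive, root, tail) rules (illustrative)."""
    p = os.fspath(p)
    sep, empty = (b"/", b"") if isinstance(p, bytes) else ("/", "")
    # Count leading separators, but only up to three: the rules need no more.
    leading = 0
    while leading < 3 and p[leading:leading + 1] == sep:
        leading += 1
    # Exactly two leading slashes form their own root; three or more act as one.
    root_len = 2 if leading == 2 else min(leading, 1)
    return empty, p[:root_len], p[root_len:]


for example in ("foo/bar", "/foo/bar", "//foo/bar", "///foo/bar",
                b"//foo/bar", PurePosixPath("/usr/bin")):
    print(repr(example), "->", posix_splitroot_sketch(example))
```

Slicing with `p[i:i + 1]` rather than indexing keeps the comparison valid for `bytes`, where `p[i]` would yield an integer.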
@@ -372,27 +403,18 @@ def normpath(path):
             dotdot = '..'
         if path == empty:
             return dot
-        initial_slashes = path.startswith(sep)
-        # POSIX allows one or two initial slashes, but treats three or more
-        # as single slash.
-        # (see https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13)
-        if (initial_slashes and
-            path.startswith(sep*2) and not path.startswith(sep*3)):
-            initial_slashes = 2
+        _, root, path = splitroot(path)
         comps = path.split(sep)
         new_comps = []
         for comp in comps:
             if comp in (empty, dot):
                 continue
-            if (comp != dotdot or (not initial_slashes and not new_comps) or
+            if (comp != dotdot or (not root and not new_comps) or
                  (new_comps and new_comps[-1] == dotdot)):
                 new_comps.append(comp)
             elif new_comps:
                 new_comps.pop()
-        comps = new_comps
-        path = sep.join(comps)
-        if initial_slashes:
-            path = sep*initial_slashes + path
+        path = root + sep.join(new_comps)
         return path or dot

 else:
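The slash-counting that `normpath()` used to do inline is now delegated to `splitroot()`, and the observable results should be unchanged. A quick check through the public `posixpath.normpath()`, assuming only the standard library:

```python
import posixpath

for path in ("/foo/../bar", "//foo/./bar", "///foo//bar", "../foo", "/.."):
    # One or two leading slashes are kept; three or more collapse to one.
    print(repr(path), "->", repr(posixpath.normpath(path)))
```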