Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-102511: Add C implementation of os.path.splitroot() #118089

Merged
merged 29 commits into from
Apr 25, 2024

Conversation

nineteendo
Copy link
Contributor

@nineteendo nineteendo commented Apr 19, 2024

Benchmark

ntpath.py

PS C:\Users\wanne\cpython> main\python -m timeit -s "import os" "os.path.splitroot('//server/share/spam/eggs')"; speedup-os.path.splitroot\python -m timeit -s "import os" "os.path.splitroot('//server/share/spam/eggs')"
200000 loops, best of 5: 1.14 usec per loop # before
500000 loops, best of 5: 433 nsec per loop # after
# -> 2.63x faster

@nineteendo nineteendo marked this pull request as ready for review April 19, 2024 12:03
@nineteendo nineteendo changed the title gh-102511: Add C implementation of ntpath.splitroot() gh-102511: Add C implementation of os.path.splitroot() Apr 20, 2024
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great looking change! As Erlend pointed out, a couple of robustness things to check, and there's a potential behaviour change that's worth validating and describing in Doc\whatsnew\3.13.rst and possibly in other parts of the documentation.

Python/fileutils.c Show resolved Hide resolved
Python/fileutils.c Outdated Show resolved Hide resolved
Python/fileutils.c Outdated Show resolved Hide resolved
Lib/importlib/_bootstrap_external.py Outdated Show resolved Hide resolved
Modules/posixmodule.c Show resolved Hide resolved
nineteendo and others added 4 commits April 22, 2024 17:09
…e-102511.qDEB66.rst

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Modules/posixmodule.c Outdated Show resolved Hide resolved
@nineteendo
Copy link
Contributor Author

Oh, come on, why must argclinic raise an error here? That isn't useful:

ValueError: _path_splitroot_ex: embedded null character in p

@zooba
Copy link
Member

zooba commented Apr 22, 2024

_path_splitroot_ex: embedded null character in p

Oh right, I forgot about this behaviour of argument clinic 😞 Might be a good reason to go to unicode after all.

Alternatively, you could handle the ValueError in the Python wrapper, split at the null, and then append it back onto the last element? Strictly speaking we're in the "invalid path --> invalid output" place again, as we don't actually support embedded nulls in paths at all, but there's no reason we need to go ahead and raise the error this early.

@eryksun
Copy link
Contributor

eryksun commented Apr 24, 2024

@eryksun, sorry for the ping, but could you compare this with your implementation?

My implementation returned a pointer to the rest of the path, plus an optional out parameter pointer to the root, and _path_splitroot_ex() used pointer arithmetic. You took a more direct approach, with required out parameters for the drive size and root size. I think the performance should be about the same from what I see in your implementation, but I was seeing a 4x improvement on Windows, and you're only seeing 2.78x improvement. I'll check the performance of your PR versus the current main.

@nineteendo
Copy link
Contributor Author

It might just be my laptop or because I'm comparing with 3.12. (there doesn't seem to be a way to have 2 builds at the same time)
Otherwise we need to check what's different from your implementation. (if you haven't lost it, of course)

@nineteendo
Copy link
Contributor Author

OK, how's the build going?

@eryksun
Copy link
Contributor

eryksun commented Apr 24, 2024

It might just be my laptop or because I'm comparing with 3.12. (there doesn't seem to be a way to have 2 builds at the same time)

You can use worktrees. Anyway, I still have my implementation, but there's no need to delve further into the comparison because the reason is obvious. I was testing the builtin function directly, not a wrapper function that called os.fspath() before calling the builtin. My plan was to switch to using path_t internally instead of a wrapper, but I never implemented that. Here's the performance I get for your PR with and without the wrapper.

> .\python_work -m timeit -s "import os" "os.path.splitroot('//server/share/spam/eggs')"
1000000 loops, best of 5: 270 nsec per loop

> .\python_work -m timeit -s "import nt" "nt._path_splitroot_ex('//server/share/spam/eggs')"
2000000 loops, best of 5: 153 nsec per loop

It's 1.76 times faster without the wrapper.

@nineteendo
Copy link
Contributor Author

nineteendo commented Apr 24, 2024

You can use worktrees.

Could you explain how I can use them?

I was testing the builtin function directly, not a wrapper function

Yeah, that matches the performance when I used path_t before. But I switched back to using a Python wrapper because NULL bytes and surrogates weren't handled properly. The code was quite ugly too with having to convert the input path to bytes on Unix.

I tried to work around the first issue by modifying path_converter, but I couldn't get surrogates to work, so I abandoned that idea and went for a simpler implementation.

@eryksun
Copy link
Contributor

eryksun commented Apr 24, 2024

Here's a brief overview of worktrees in the developer's guide:

https://devguide.python.org/getting-started/git-boot-camp/#git-worktree

I tried to work around the first issue by modifying path_converter, but I couldn't get surrogates to work, so I abandoned that idea and went for a simpler implementation.

In another issue, the implementation of path_t could be generalized to support fields to configure the converter to use wide regardless of platform, to allow null characters, to allow arbitrary length paths (e.g. no 32767 length limit on Windows), and a new field such as bytes_input to determine whether a path result has to be converted back to bytes. The option to always use a wide-character path is a generalization of the current behavior on Windows. Argument Clinic would be extended to support the new options. The implementations of _path_splitroot_ex(), _path_normpath(), and _path_abspath() (if adopted) would benefit, and also the _path_is*() helpers on Windows.

Modules/posixmodule.c Outdated Show resolved Hide resolved
Co-authored-by: Eryk Sun <eryksun@gmail.com>
@nineteendo
Copy link
Contributor Author

Let's leave path_t for a followup pull request, such that it can be implemented without a Python wrapper.

Copy link
Contributor

@eryksun eryksun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, nineteendo.

@zooba zooba merged commit 10bb90e into python:main Apr 25, 2024
36 checks passed
@zooba
Copy link
Member

zooba commented Apr 25, 2024

That's quorum, thanks @nineteendo for bearing with us, and congrats on landing the patch!

@nineteendo nineteendo deleted the speedup-os.path.splitroot branch April 25, 2024 09:09
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot iOS ARM64 Simulator 3.x has failed when building commit 10bb90e.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/1380/builds/87) and take a look at the build logs.
  4. Check if the failure is related to this commit (10bb90e) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/1380/builds/87

Failed tests:

  • test_os

Failed subtests:

  • test_fpathconf - test.test_os.TestInvalidFD.test_fpathconf

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/46780DAB-3371-45CA-A1E6-5178548A2DAA/data/Containers/Bundle/Application/2E63ACC7-4F0C-40C2-B867-2CAE5EAFD03D/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2369, in test_fpathconf
    self.check(os.pathconf, "PC_NAME_MAX")
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/46780DAB-3371-45CA-A1E6-5178548A2DAA/data/Containers/Bundle/Application/2E63ACC7-4F0C-40C2-B867-2CAE5EAFD03D/iOSTestbed.app/python/lib/python3.13/test/test_os.py", line 2293, in check
    f(os_helper.make_bad_fd(), *args, **kwargs)
    ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unrecognized configuration name

@nineteendo
Copy link
Contributor Author

Hmm, it didn't fail in the previous commit: https://buildbot.python.org/all/#/builders/1380/builds/86
Is this random?

@zooba
Copy link
Member

zooba commented Apr 25, 2024

This seems unrelated, and if it's only iOS, then we can ignore it. That buildbot has only just started running, and will be flushing out random issues for a little while.

@nineteendo
Copy link
Contributor Author

Eryk, can we backport this to 3.12 to fix the bug with ntpath.normpath()?

@bedevere-app
Copy link

bedevere-app bot commented May 22, 2024

GH-119394 is a backport of this pull request to the 3.12 branch.

@gpshead
Copy link
Member

gpshead commented May 31, 2024

Please add a unittest coverage of both the Python and C versions so that we ensure their behavior is the same.

@eryksun
Copy link
Contributor

eryksun commented May 31, 2024

@gpshead, they both get tested as long as tests run on at least one POSIX system and one Windows system. The Python fallback implementation is only defined if the C version can't be imported. Thus the fallback for ntpath.splitroot() is only defined on POSIX, and the fallback for posixpath.splitroot() is only defined on Windows.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also changes first parameter name. It is now:

  • p for python version
  • path for C version

@hugovk
Copy link
Member

hugovk commented Sep 15, 2024

Is this intentional? Does it need addressing before 3.13.0 in two weeks?

@sobolevn
Copy link
Member

sobolevn commented Sep 15, 2024

I would prefer to address this, because:

  • It can break a user's code that used: os.path.splitroot(p="...")
  • It changes the typeshed typings, basically it makes the first param implicitly pos-only, because you cannot be sure which name it has: p or path

@sobolevn
Copy link
Member

I will open a PR in a moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants