Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

PEP 519: Path protocol support #110

Closed
henryiii opened this issue Jul 31, 2018 · 15 comments
Closed

PEP 519: Path protocol support #110

henryiii opened this issue Jul 31, 2018 · 15 comments

Comments

@henryiii
Copy link
Member

henryiii commented Jul 31, 2018

Uproot currently does not support pathlib, so something like this will fail:

from pathlib import Path
import uproot
p = Path('/data/file.root')
f = uproot.open(p) # Error

The path and similar functions should use name = os.fspath(name) if os has fspath (depends slightly on what sort of objects uproot accepts here). (This is the correct solution for Python 3.6+, and will also support other path objects in libraries like Plumbum; anything that provides a __fspath__() method. It won't fix the pathlib backport on Py2, or Python < 3.6.)

See https://www.python.org/dev/peps/pep-0519 .

@jpivarski
Copy link
Member

Something new for me to learn about! I'll be doing an OS upgrade soon, which would update my personal Python from 3.5 to 3.6, where I'll learn about that language feature and add support for it in uproot. If delayed, I'll just use Miniconda for testing, which I think is 3.6. is this urgent? (It should be easy to do a pull request, which Travis would test against 3.6...)

@henryiii
Copy link
Member Author

No, not urgent; it can be bypassed by the user by converting to strings; it just something I'd like to avoid having to teach students to do in the future.

I could do a pull request but I'm not sure what other objects you might expect here. Can you pass in streams, for example?

@jpivarski
Copy link
Member

It takes a string or an iterable of strings and expands each by glob pattern (if not a URL) into a list of strings. I'm not sure what pathlib does, but it's probably a strongly typed version of that.

Filename handling happens in two places: uproot.open and uproot.iterate, and I'm pretty sure they both call the same internal function.

jpivarski added a commit that referenced this issue Aug 7, 2018
….open, uproot.iterate, uproot.numentries, and uproot.lazyarrays
@jpivarski
Copy link
Member

jpivarski commented Aug 7, 2018

I think I fixed it. The following methods should now recognize a pathlib.Path object when given:

  • uproot.open
  • uproot.iterate
  • uproot.numentries
  • uproot.lazyarrays

Travis is verifying that I can still accept plain strings and that everything still works, all the way back to Python 2.6 (yes, I'm still supporting that). Can you please stress-test this with Python 3? Verify that it accepts pathlib Paths in all the ways that you expect— I might have missed some cases because I'm unfamiliar with the library— and I'll pull-request this update into the mainstream branch. Thanks!

(Oh, I'm not asking you to test each of the four functions above: I'm asking you to check special features of pathlib, in case there are any I was unaware of. uproot.open and uproot.iterate are the only two independent cases; the last three all share the same filename-recognizing function.)

@henryiii
Copy link
Member Author

henryiii commented Aug 7, 2018

Great, thanks! I'll give it a test within a day or so, ping me if I don't respond in that timeframe!

@henryiii
Copy link
Member Author

henryiii commented Aug 7, 2018

PS: This should also work with Pathlib2, for example, in Python 2.6+; I'd recommend that you not check for the exact name (and there are other path libraries). Can I suggest the folllowing change:

- if path.__class__.__module__ == "pathlib":
-          import pathlib
-          if isinstance(path, pathlib.Path):
-              path = str(path)
+ if hasattr(path, '__fspath__'):
+     path = path.__fspath__()

@jpivarski
Copy link
Member

The Python 3 pathlib has no __fspath__(), so I shouldn't substitute it entirely. Should I check both?

By the way, I check path.__class__.__module__ == "pathlib" because I can't import it (it might not exist because I might be in Python 2) and this string check is exactly as safe as import pathlib when it does exist— that also checks for the module by name!

jpivarski added a commit that referenced this issue Aug 7, 2018
@jpivarski
Copy link
Member

Tell me what you think of the above change. We can test for path.__fspath__ for pathlib libraries that put a __fspath__ method on the path object and then afterward check for a class in module "pathlib" that is a subclass of pathlib.Path.

@jpivarski
Copy link
Member

At least it's not wrong. I'm closing this now so that I have an easier time refactoring for uproot 3 (that's why I was handling all of these issues, to clear the board).

@henryiii
Copy link
Member Author

henryiii commented Aug 7, 2018

I would use elif to save one check, but otherwise that looks fine. pathlib does have __fspath__, but not on older Python 3's (Pre 3.6, I think).

@jpivarski
Copy link
Member

Because I wasn't sure, I didn't want to force the two checks to be mutually exclusive. Perhaps the output of __fspath__ can never be a pathlib.Path; using elif would always make that assumption. I guess you're saying it's a safe assumption; I can change that later.

Also, I decided that having an __fspath__ should trump being a pathlib.Path, since there could be third-party libraries satisfying that interface.

I can confirm that pre Python 3.6 (mine is 3.4), pathlib.Path doesn't have a __fspath__.

My reason for pushing this through is to not delay the uproot 3 refactoring. I'd really like to hear your feedback about my e-mail to Scikit-HEP Admins (you're on that, right?), since it involves hepvector. (If you're not on it, I'll forward you the e-mail.)

@henryiii
Copy link
Member Author

henryiii commented Aug 7, 2018

The output of __fspath__ is Union[str, bytes], so it cannot be a Path, so that saves a check.

Before Python 3.6 users pretty much were always required to wrap Path in str to use it, since it doesn't even work in the builtin functions like open() without adding str. That's why that PEP was written, and that's why I wasn't really that worried about Python < 3.6 pathlib; the behavior would at least be consistent. After 3.6, though, it is assumed that you don't have to add the str() anymore, since the __fspath__() method was added and supported throughout the standard library.

I'm on the email list, just haven't had time to write a proper response. Overall, it looks great, and I like the modularization of functionality. I'll reply tommorrow!

@henryiii
Copy link
Member Author

henryiii commented Aug 9, 2018

FYI: This is the way matplotlib is going to do it: https://github.com/matplotlib/matplotlib/pull/11307/files

@jpivarski
Copy link
Member

I'm reopening the issue to remind myself to change the code this way. For the record, I'll be changing

if hasattr(path, "__fspath__"):
    path = path.__fspath__()
if path.__class__.__module__ == "pathlib":
    import pathlib
    if isinstance(path, pathlib.Path):
         path = str(path)

(in both places) to

if isinstance(fname, getattr(os, "PathLike", ())):
    fname = os.fspath(fname)
elif hasattr(path, "__fspath__"):
    path = path.__fspath__()
elif path.__class__.__module__ == "pathlib":
    import pathlib
    if isinstance(path, pathlib.Path):
         path = str(path)

I've tested Pythons 3.4 (Ubuntu 14.04) and 3.6 (Miniconda): 3.4 has no os.PathLike or os.fspath, so it would miss the first block. 3.4's pathlib.Path also lacks a __fspath__ method, so it would miss the second block as well. The third block only works for libraries named pathlib with a pathlib.Path superclass, which might not include the Pathlib2 you mentioned. This should maximize the coverage matrix of Python versions and standard vs. third party libraries enough to say that "uproot supports pathlib" and any exceptions can be reported as bugs and dealt with as additional cases.

Given the way things are going, I'll probably sneak this change in with other uproot 3.0 refactorings, without going through the usual branch-merge-deploy cycle.

@henryiii
Copy link
Member Author

henryiii commented Aug 9, 2018

By the way, for the record, check 1 and 2 are identical on Python 3.6+, PathLike is an Interface that simply requires that one method be implemented, __fspath__(). However, pre-Python 3.6, they will be different if you are using a separate path library like pathlib2, plumbum, etc.; so I don't think it hurts to have it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants