-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: pathlib -> universal_pathlib to enable cloud (and other filesystem) support #1074
Conversation
- make use of universal_pathlib to enable cloud/remote paths in pybids - remove uri prefix when passing to validator Co-authored-by: KarlHanEdn <tianming@ualberta.ca>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1074 +/- ##
=======================================
Coverage 89.80% 89.80%
=======================================
Files 63 63
Lines 7177 7179 +2
Branches 1374 1374
=======================================
+ Hits 6445 6447 +2
Misses 532 532
Partials 200 200 ☔ View full report in Codecov by Sentry. |
I'll be honest, this makes me kind of nervous. PyBIDS is full of moving parts. That said, the tests passing on pre-3.12 Python is surprising and suggests the authors really know what they're doing. @adelavega thoughts? |
This is pretty neat! I share some of Chris's concerns though. I wonder if we could hedge, by only using |
IDK how they did it, but on Python 3.8: Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pathlib import Path
In [2]: from upath import UPath
In [3]: isinstance(UPath('/tmp'), Path)
Out[3]: True
In [4]: Path
Out[4]: pathlib.Path
In [5]: Path('/tmp')
Out[5]: PosixPath('/tmp')
In [6]: UPath('/tmp')
Out[6]: PosixUPath('/tmp') I'm kind of inclined to merge, release and then roll back if bug reports start coming in. |
I'm happy with that. It seems like an easy win
…On Thu, Jul 25, 2024, 2:59 PM Chris Markiewicz ***@***.***> wrote:
IDK how they did it, but on Python 3.8:
Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00) Type 'copyright', 'credits' or 'license' for more informationIPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pathlib import Path
In [2]: from upath import UPath
In [3]: isinstance(UPath('/tmp'), Path)Out[3]: True
In [4]: PathOut[4]: pathlib.Path
In [5]: Path('/tmp')Out[5]: PosixPath('/tmp')
In [6]: UPath('/tmp')Out[6]: PosixUPath('/tmp')
I'm kind of inclined to merge, release and then roll back if bug reports
start coming in.
—
Reply to this email directly, view it on GitHub
<#1074 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAVFLMA7INA653ENLTHQFOLZOFKKJAVCNFSM6AAAAABLKYTBSCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJRGI4TKNBYGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Version 0.17.0 (July 29, 2024) Feature release in the 0.17.0 series. This release includes experimental support for MRS (BEP22) datasets and `universal_pathlib <https://github.com/fsspec/universal_pathlib>`__. * ENH: pathlib -> universal_pathlib to enable cloud (and other filesystem) support (#1074) * ENH: Add MRS entities, path patterns (#1075) * STY: No parentheses after assert (#1065) * MNT: Bump bids-examples from `47c4da1` to `401f4cf` (#1073) * MNT: Fix typo newly found by codespell (#1066)
This PR makes a relatively small change to the code that enables pybids to parse any filesystem backend that is supported by universal_pathlib (which is a subset of those supported by fsspec), by dropping in UPath instead of Path.
The list of filesystems currently supported (from https://github.com/fsspec/universal_pathlib) is:
file:
Local filesystemmemory:
Ephemeral filesystem in RAMaz:
,adl:
,abfs:
andabfss:
Azure Storage (requiresadlfs
)data:
RFC 2397 style data URLs (requiresfsspec>=2023.12.2
)github:
GitHub repository filesystemhttp:
andhttps:
HTTP(S)-based filesystemhdfs:
Hadoop distributed filesystemgs:
andgcs:
Google Cloud Storage (requiresgcsfs
)s3:
ands3a:
AWS S3 (requiress3fs
to be installed)webdav
,webdav+http:
andwebdav+https:
WebDAV-based filesystem on top ofHTTP(S) (requires
webdav4[fsspec]
)I realize there has been a previous spin-off to enable S3 support (https://github.com/nrdg/cloud_bids_layout) but that has been inactive for ~4 years, and the proposed solution here would do the job more seamlessly, by simply being able to consume paths that include a uri prefix to designate the filesystem (e.g.
s3://
orgcs://
). There was also an issue raised to request S3 support a while back that became stale and was closed #45.E.g. one specific use-cases this PR could enable is applying pybids directly to openneuro datasets using the s3 uri:
layout = BIDSLayout('s3://openneuro.org/ds005360')
Filenames from the layout could then be consumed by any tool that support fsspec (pandas, dask, etc). Note: it looks like nibabel supports loading image from URLs, but does not yet support fsspec (nipy/nibabel#1029), so use-cases for nifti data might be limited currently.
My personal use-case is for BIDS microscopy data with OME-Zarr datasets, to allow BIDS apps to consume them directly from the cloud (e.g. using Dask or Zarr libraries).
Re: this PR, I believe it shouldn't change how pybids works if a prefix is not used (eg the tests currently pass), but someone more familiar with the pybids internals would have to confirm.
Keen to hear what others think!