Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: pathlib -> universal_pathlib to enable cloud (and other filesystem) support #1074

Merged
merged 2 commits into from
Jul 26, 2024

Conversation

akhanf
Copy link
Contributor

@akhanf akhanf commented Jul 23, 2024

This PR makes a relatively small change to the code that enables pybids to parse any filesystem backend that is supported by universal_pathlib (which is a subset of those supported by fsspec), by dropping in UPath instead of Path.

The list of filesystems currently supported (from https://github.com/fsspec/universal_pathlib) is:

  • file: Local filesystem
  • memory: Ephemeral filesystem in RAM
  • az:, adl:, abfs: and abfss: Azure Storage (requires adlfs)
  • data: RFC 2397 style data URLs (requires fsspec>=2023.12.2)
  • github: GitHub repository filesystem
  • http: and https: HTTP(S)-based filesystem
  • hdfs: Hadoop distributed filesystem
  • gs: and gcs: Google Cloud Storage (requires gcsfs)
  • s3: and s3a: AWS S3 (requires s3fs to be installed)
  • webdav, webdav+http: and webdav+https: WebDAV-based filesystem on top of
    HTTP(S) (requires webdav4[fsspec])

I realize there has been a previous spin-off to enable S3 support (https://github.com/nrdg/cloud_bids_layout) but that has been inactive for ~4 years, and the proposed solution here would do the job more seamlessly, by simply being able to consume paths that include a uri prefix to designate the filesystem (e.g. s3:// or gcs://). There was also an issue raised to request S3 support a while back that became stale and was closed #45.

E.g. one specific use-cases this PR could enable is applying pybids directly to openneuro datasets using the s3 uri:
layout = BIDSLayout('s3://openneuro.org/ds005360')

Filenames from the layout could then be consumed by any tool that support fsspec (pandas, dask, etc). Note: it looks like nibabel supports loading image from URLs, but does not yet support fsspec (nipy/nibabel#1029), so use-cases for nifti data might be limited currently.

My personal use-case is for BIDS microscopy data with OME-Zarr datasets, to allow BIDS apps to consume them directly from the cloud (e.g. using Dask or Zarr libraries).

Re: this PR, I believe it shouldn't change how pybids works if a prefix is not used (eg the tests currently pass), but someone more familiar with the pybids internals would have to confirm.

Keen to hear what others think!

 - make use of universal_pathlib to enable cloud/remote paths in pybids
 - remove uri prefix when passing to validator
Co-authored-by: KarlHanEdn <tianming@ualberta.ca>
Copy link

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.80%. Comparing base (d442c8b) to head (80b9391).
Report is 5 commits behind head on master.

Files Patch % Lines
bids/layout/models.py 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1074   +/-   ##
=======================================
  Coverage   89.80%   89.80%           
=======================================
  Files          63       63           
  Lines        7177     7179    +2     
  Branches     1374     1374           
=======================================
+ Hits         6445     6447    +2     
  Misses        532      532           
  Partials      200      200           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@effigies
Copy link
Collaborator

I'll be honest, this makes me kind of nervous. PyBIDS is full of moving parts.

That said, the tests passing on pre-3.12 Python is surprising and suggests the authors really know what they're doing.

@adelavega thoughts?

pyproject.toml Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@adelavega
Copy link
Collaborator

This is pretty neat! I share some of Chris's concerns though.

I wonder if we could hedge, by only using UPath for non-standard paths. Thus ensuring legacy behavior for 99% of users but enabling new paths. Maybe that's overkill though, not sure.

@effigies
Copy link
Collaborator

effigies commented Jul 25, 2024

IDK how they did it, but on Python 3.8:

Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.3 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from pathlib import Path

In [2]: from upath import UPath

In [3]: isinstance(UPath('/tmp'), Path)
Out[3]: True

In [4]: Path
Out[4]: pathlib.Path

In [5]: Path('/tmp')
Out[5]: PosixPath('/tmp')

In [6]: UPath('/tmp')
Out[6]: PosixUPath('/tmp')

I'm kind of inclined to merge, release and then roll back if bug reports start coming in.

@adelavega
Copy link
Collaborator

adelavega commented Jul 25, 2024 via email

@effigies effigies merged commit 47ddf00 into bids-standard:master Jul 26, 2024
24 checks passed
@effigies effigies mentioned this pull request Jul 26, 2024
effigies added a commit that referenced this pull request Jul 29, 2024
Version 0.17.0 (July 29, 2024)

Feature release in the 0.17.0 series.

This release includes experimental support for MRS (BEP22) datasets
and `universal_pathlib <https://github.com/fsspec/universal_pathlib>`__.

* ENH: pathlib -> universal_pathlib to enable cloud (and other filesystem) support (#1074)
* ENH: Add MRS entities, path patterns (#1075)
* STY: No parentheses after assert (#1065)
* MNT: Bump bids-examples from `47c4da1` to `401f4cf` (#1073)
* MNT: Fix typo newly found by codespell (#1066)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add S3 support
3 participants