NotADirectoryError: MultiplexedPath only supports directories #287

Closed
ReagTea opened this issue Aug 9, 2023 · 4 comments · Fixed by #290

ReagTea commented Aug 9, 2023

Hi,

I'm trying to get a resource file from a namespace zipped package with importlib_resources. I know it works, at least in a specific use case of mine, but in another setting I can't make it work anymore, whatever I try :'(

I have this kind of folder structure in my pkg, which I zip with, e.g. zip -r -0 namespace.zip namespace/, for testing purposes:

namespace/
namespace/pkg1/
namespace/pkg1/subpkg/
namespace/pkg1/subpkg/__init__.py
namespace/pkg1/subpkg/module.py
# ... multiple other pkgs and subpkgs
# Just data in `data`, no modules. It's a shared data folder for all pkgs in the namespace package
namespace/data/
namespace/data/configs/
namespace/data/configs/config1.json

I work with Spark and in the case the namespace package is pip installed and imported, I'm able to use importlib-metadata and zipfile to zip the pkgs I need and send them to Spark workers with addPyFile (which essentially copies the zip wherever on the nodes and adds it to sys.path, so zipimport mechanism is involved).

But in another use case, I need to send the zip directly, still relying on adding the path to sys.path and later using importlib_resources.files (or importlib.resources.files for that matter, which gives the same error) to retrieve say config1.json.

In the first case I can do imr.files('namespace.data.configs').joinpath('config1.json') without problem, but in the second case I cannot and always get this error:

>>> sys.path.insert(1, str(Path('namespace.zip').resolve())) # essentially what `addPyFile` is doing
>>> imr.files("gcorrelator.data")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/_common.py", line 46, in wrapper
    return func(anchor)
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/_common.py", line 56, in files
    return from_package(resolve(anchor))
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/_common.py", line 113, in from_package
    reader = spec.loader.get_resource_reader(spec.name)
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/_compat.py", line 79, in get_resource_reader
    _namespace_reader(self.spec)
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/_compat.py", line 56, in _namespace_reader
    return readers.NamespaceReader(spec.submodule_search_locations)
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/readers.py", line 133, in __init__
    self.path = MultiplexedPath(*list(namespace_path))
  File "/home/user/.conda/envs/ml_py310/lib/python3.10/site-packages/importlib_resources/readers.py", line 70, in __init__
    raise NotADirectoryError('MultiplexedPath only supports directories')
NotADirectoryError: MultiplexedPath only supports directories

I've still no clue what's going wrong. I can get imr.files("namespace.pkg1") however, as before, but I don't know why I can't get to namespace/data.

setup: py310, importlib_resources 6.0.1

Thanks !

Author

ReagTea commented Aug 11, 2023

update: Ok, so it seems importlib.resources.files (or importlib_resources.files) works fine when the package is pip installed, but fails when it's zipimported, with the folder structure as is.

But adding an __init__.py in the namespace/data folder seems to do the trick when using zipimport and adding zip to sys.path. Don't know why, but I'll use that for now.
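A minimal sketch of that workaround, assuming the stdlib importlib.resources on a recent Python 3. The package name demo_ns and the JSON content are hypothetical stand-ins, and the zip is built in a temporary directory rather than sent through addPyFile. Directory entries are written explicitly so that zipimport can treat demo_ns as a namespace package.

```python
import importlib
import importlib.resources
import json
import sys
import tempfile
import zipfile
from pathlib import Path


def build_zip(target: Path) -> None:
    """Write a zip where demo_ns/data is a regular package (it has __init__.py)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as zf:
        # Explicit directory entries, similar to what `zip -r` produces.
        for directory in ("demo_ns/", "demo_ns/data/", "demo_ns/data/configs/"):
            zf.writestr(directory, "")
        # The workaround: an empty __init__.py inside the data folder.
        zf.writestr("demo_ns/data/__init__.py", "")
        # Hypothetical resource content, for illustration only.
        zf.writestr("demo_ns/data/configs/config1.json", '{"answer": 42}')


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo_ns.zip"
    build_zip(archive)

    # Putting the zip on sys.path lets zipimport find the packages.
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    try:
        resource = importlib.resources.files("demo_ns.data") / "configs" / "config1.json"
        config = json.loads(resource.read_text(encoding="utf-8"))
    finally:
        sys.path.remove(str(archive))

print(config)
```

Because demo_ns.data is a regular package here, its loader is the zip importer itself, so the lookup does not go through the namespace reader that raises the error above.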

Member

jaraco commented Sep 10, 2023

This issue seems related to a similar issue reported in CPython. In this case, however, you legitimately have a namespace package and thus probably want it multiplexed.

It's similar in that the use of importlib_resources.files('gcorrelator.data') is probably not correct. That is, you've already acknowledged that data is not a module but just a directory for data. The parameter to files is meant to be a module or package, so probably the best usage here is to use files('gcorrelator').joinpath('data'). The fact that files('gcorrelator.data') works in some cases is probably an artifact of the behavior that directories are namespace packages by default. That also might explain why adding an __init__.py to namespace/data, making it a non-namespace package, may be working around any issues.
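As a rough illustration of that usage, the sketch below anchors files() on an importable package and then traverses into a plain data directory with joinpath. It makes several assumptions: the package demo_anchor_pkg is hypothetical, it is a regular package on disk (with an __init__.py) rather than a namespace package in a zip, and the stdlib importlib.resources is used in place of importlib_resources.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: a package with a data-only subdirectory (no modules).
    configs = root / "demo_anchor_pkg" / "data" / "configs"
    configs.mkdir(parents=True)
    (root / "demo_anchor_pkg" / "__init__.py").write_text("")
    (configs / "config1.json").write_text('{"name": "config1"}')

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        # Anchor on the package, not on the data directory.
        anchor = importlib.resources.files("demo_anchor_pkg")
        # Walk into the data directory as a resource path, not as a module path.
        resource = anchor.joinpath("data").joinpath("configs").joinpath("config1.json")
        content = resource.read_text(encoding="utf-8")
    finally:
        sys.path.remove(str(root))

print(content)
```

The data directory is never imported in this form, so it does not matter whether the import system would treat it as a namespace package.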

That said, it's not obvious to me there's not a bug here. I think there very well may be. Let me see if I can repro the issue.

Member

jaraco commented Sep 10, 2023

Yes, I can

 draft @ mkdir -p namespace/pkg1/subpkg
 draft @ touch namespace/pkg1/subpkg/__init__.py
 draft @ mkdir -p namespace/data/configs/config1.json
 draft @ zip -r -0 namespace.zip namespace/
  adding: namespace/ (stored 0%)
  adding: namespace/pkg1/ (stored 0%)
  adding: namespace/pkg1/subpkg/ (stored 0%)
  adding: namespace/pkg1/subpkg/__init__.py (stored 0%)
  adding: namespace/data/ (stored 0%)
  adding: namespace/data/configs/ (stored 0%)
  adding: namespace/data/configs/config1.json/ (stored 0%)
 draft @ rm -r namespace
 draft @ env PYTHONPATH=namespace.zip pip-run importlib_resources -- -i -c 'import importlib_resources as imr'
>>> imr.files('namespace.data')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/_common.py", line 46, in wrapper
    return func(anchor)
           ^^^^^^^^^^^^
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/_common.py", line 56, in files
    return from_package(resolve(anchor))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/_common.py", line 113, in from_package
    reader = spec.loader.get_resource_reader(spec.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/_compat.py", line 79, in get_resource_reader
    _namespace_reader(self.spec)
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/_compat.py", line 56, in _namespace_reader
    return readers.NamespaceReader(spec.submodule_search_locations)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/readers.py", line 133, in __init__
    self.path = MultiplexedPath(*list(namespace_path))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/sx/n5gkrgfx6zd91ymxr2sr9wvw00n8zm/T/pip-run-qe1iolnw/importlib_resources/readers.py", line 70, in __init__
    raise NotADirectoryError('MultiplexedPath only supports directories')
NotADirectoryError: MultiplexedPath only supports directories

Breaking in to debug, I can see that MultiplexedPath is ending up with a PosixPath that doesn't exist:

(Pdb) l
 65             self._paths = list(map(pathlib.Path, remove_duplicates(paths)))
 66             if not self._paths:
 67                 message = 'MultiplexedPath must contain at least one path'
 68                 raise FileNotFoundError(message)
 69             if not all(path.is_dir() for path in self._paths):
 70  ->             raise NotADirectoryError('MultiplexedPath only supports directories')
 71  
 72         def iterdir(self):
 73             children = (child for path in self._paths for child in path.iterdir())
 74             by_name = operator.attrgetter('name')
 75             groups = itertools.groupby(sorted(children, key=by_name), key=by_name)
(Pdb) self._paths
[PosixPath('/Users/jaraco/draft/namespace.zip/namespace/data')]

That is, it does exist, but not as a PosixPath. I think the issue is visible in that listing - namely the casting of paths to pathlib.Path on line 65. That casting doesn't make sense if one of the paths is a zipfile.
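The mismatch can be seen in isolation with a small sketch, assuming only the standard library. It compares a pathlib.Path that points inside a zip archive with a zipfile.Path for the same location. The archive is built in a temporary directory and its contents are placeholders.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "namespace.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("namespace/data/", "")
        zf.writestr("namespace/data/configs/config1.json", "{}")

    # What a cast to pathlib.Path produces: a filesystem path that runs
    # *through* the zip file, which the OS cannot treat as a directory.
    as_filesystem_path = Path(f"{archive}/namespace/data")
    # The same location addressed as a path inside the archive.
    as_zip_path = zipfile.Path(archive, "namespace/data/")

    seen_by_pathlib = as_filesystem_path.is_dir()
    seen_by_zipfile = as_zip_path.is_dir()
    children = sorted(child.name for child in as_zip_path.iterdir())

print(seen_by_pathlib, seen_by_zipfile, children)
```

The first check fails for the same reason the all(path.is_dir() ...) test on line 69 of the listing fails, while the zip-aware path sees the directory and its children.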

I changed the invocation to simply imr.files('namespace') and the error still occurs, so the issue isn't due to presence or approach of data at all.

@jaraco jaraco self-assigned this Sep 12, 2023
Member

jaraco commented Sep 12, 2023

Looking through the test suite, there's no coverage of namespace packages in zip files, so I'll address that first (and expect it to capture the failed expectation).

jaraco added a commit that referenced this issue Sep 18, 2023
jaraco added a commit that referenced this issue Sep 19, 2023
clrpackages pushed a commit to clearlinux-pkgs/pypi-importlib_resources that referenced this issue Sep 26, 2023
….0.1 to version 6.1.0

Jason R. Coombs (19):
      Add API docs. Closes #245.
      Pin against sphinx 7.2.5 as workaround for sphinx/sphinx-doc#11662. Closes jaraco/skeleton#88.
      Allow GITHUB_* settings to pass through to tests.
      Remove spinner disablement. If it's not already fixed upstream, that's where it should be fixed.
      Clean up 'color' environment variables.
      Add diff-cover check to Github Actions CI. Closes jaraco/skeleton#90.
      Add descriptions to the tox environments. Closes jaraco/skeleton#91.
      Add FORCE_COLOR to the TOX_OVERRIDE for GHA. Requires tox 4.11.1. Closes jaraco/skeleton#89.
      Replace static zip fixtures with dynamically generated zip fixtures built from the same modules as found on the file system.
      Separate 'disk' concern of namespace tests.
      Prefer ``pass_env`` in tox config. Preferred failure mode for tox-dev/tox#3127 and closes jaraco/skeleton#92.
      In zip namespace fixtures, explicitly generate the directory entries implied by children. Workaround for python/cpython#59110.
      Add xfail tests for namespace packages in a zip, capturing missed expectation reported in python/importlib_resources#287.
      Update MultiplexedPath to expect Traversable and add a compatibility shim with deprecation warning.
      Update tests for MultiplexedPath to pass traversables, addressing some deprecation warnings.
      Update changelog
      When constructing a MultiplexedPath, resolve submodule_search_locations to Traversable objects. Closes python/importlib_resources#287.
      Honor backslashes in inner paths as found in submodule_search_locations.
      Finalize