[fs] support hfs.ls on a bucket #14176

Merged 14 commits, Feb 6, 2024

Commits on Jan 22, 2024

  1. [fs] support hfs.ls on a bucket

    Teaches `hfs.ls('gs://bucket/')` to list the files and directories at the top level of the bucket.
    
    On `main`, that command fails because this line of `_ls_no_glob` raises:
    
    ```python3
    maybe_sb_and_t, maybe_contents = await asyncio.gather(
        self._size_bytes_and_time_modified_or_none(path), ls_as_dir()
    )
    ```
    
    In particular, `statfile` raises a cloud-specific, esoteric error about a malformed URL or empty
    object names:
    
    ```python3
    async def _size_bytes_and_time_modified_or_none(self, path: str) -> Optional[Tuple[int, float]]:
        try:
            # Hadoop semantics: creation time is used if the object has no notion of last modification time.
            file_status = await self.afs.statfile(path)
            return (await file_status.size(), file_status.time_modified().timestamp())
        except FileNotFoundError:
            return None
    ```
    
    I decided to add a sub-class of `FileNotFoundError` which is self-describing: `IsABucketError`.
    
    I changed most methods to raise that error when given a bucket URL. The two interesting cases:
    
    1. `isdir`. This raises an error but I could also see this returning `True`. A bucket is like a
       directory whose path/name is empty.
    
    2. `isfile`. This returns `False`, but I could also see this raising an error. Returning `False`
       just seems convenient: we know the bucket is not a file, so we should say so.
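
A minimal sketch of the behavior described above, not the code from this PR. `is_bucket_url`, the module-level `isfile`/`isdir` functions, and the in-memory `OBJECTS` set are hypothetical stand-ins for the real file system classes. Only the name `IsABucketError` and its `FileNotFoundError` base come from the description.

```python
class IsABucketError(FileNotFoundError):
    """A bare bucket URL was given where an object path is required."""


# Hypothetical object store contents, used only for illustration.
OBJECTS = {'gs://bucket/a.txt', 'gs://bucket/dir/b.txt'}


def is_bucket_url(url: str) -> bool:
    # Hypothetical helper: 'gs://bucket' and 'gs://bucket/' both name the bucket itself.
    _, _, rest = url.partition('://')
    bucket, _, path = rest.partition('/')
    return bucket != '' and path == ''


def isfile(url: str) -> bool:
    # A bucket is never a file, so answer False instead of raising.
    if is_bucket_url(url):
        return False
    return url in OBJECTS


def isdir(url: str) -> bool:
    # For a bucket URL this sketch raises, matching the choice described above.
    if is_bucket_url(url):
        raise IsABucketError(url)
    prefix = url.rstrip('/') + '/'
    return any(name.startswith(prefix) for name in OBJECTS)
```

Because `IsABucketError` subclasses `FileNotFoundError`, any existing `except FileNotFoundError:` handler, such as the one in `_size_bytes_and_time_modified_or_none`, also catches it.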
    
    ---
    
    Apparently `hfs.ls` had no current tests because the globbing system doesn't work with Azure
    https:// URLs. I fixed it to use `AsyncFSURL.with_new_path_component` which is resilient to Azure
    https weirdness. However, I had to change `with_new_path_component` to treat an empty path in a
    special way. I wanted this to hold:
    
    ```python3
    actual = str(afs.parse_url('gs://bucket').with_new_path_component('bar'))
    expected = 'gs://bucket/bar'
    assert actual == expected
    ```
    
    But `with_new_path_component` interacts badly with `GoogleAsyncFSURL.__str__` to return this:
    
    ```
    'gs://bucket//bar'
    ```
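
A minimal sketch of the empty-path special case, assuming a URL object that stores the bucket and a path without a leading slash. `SketchURL` and its `parse` method are hypothetical. The real `GoogleAsyncFSURL` and `with_new_path_component` may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchURL:
    bucket: str
    path: str  # no leading slash; '' means the bucket root

    @staticmethod
    def parse(url: str) -> 'SketchURL':
        scheme, sep, rest = url.partition('://')
        if scheme != 'gs' or sep == '':
            raise ValueError(f'not a gs:// URL: {url}')
        bucket, _, path = rest.partition('/')
        return SketchURL(bucket, path)

    def with_new_path_component(self, component: str) -> 'SketchURL':
        # Without this special case the new path would be '/bar', and
        # __str__ would render 'gs://bucket//bar'.
        if self.path == '':
            return SketchURL(self.bucket, component)
        return SketchURL(self.bucket, self.path + '/' + component)

    def __str__(self) -> str:
        return f'gs://{self.bucket}/{self.path}'


actual = str(SketchURL.parse('gs://bucket').with_new_path_component('bar'))
assert actual == 'gs://bucket/bar'
```
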
    Dan King committed Jan 22, 2024
    Commit 7543069
  2. add and use with_new_path_components

    Dan King committed Jan 22, 2024
    Commit 5fa7f85
  3. repr for FSURL

    Dan King committed Jan 22, 2024
    Commit 4052520
  4. handle bucket listing

    Dan King committed Jan 22, 2024
    Commit d37da3f
  5. add AsyncFSURL.with_root_path

    Dan King committed Jan 22, 2024
    Commit b5cb06f
  6. Add repr to LocalAsyncFSURL

    Dan King committed Jan 22, 2024
    Commit 3bf27b9
  7. use with_root_path in router_fs

    Dan King committed Jan 22, 2024
    Commit 2ace87b

Commits on Feb 2, 2024

  1. use exit stacks and context managers for RouterFS

    Dan King committed Feb 2, 2024
    Commit 75bf0b9
  2. use error_if_bucket=True

    Dan King committed Feb 2, 2024
    Commit b24b709
  3. bucket not path

    Dan King committed Feb 2, 2024
    Commit c94fa65
  4. fix override: error_if_bucket is keyword only

    Dan King committed Feb 2, 2024
    Commit 61c0a2c
  5. gcs parse-url

    Dan King committed Feb 2, 2024
    Commit 9ab8302

Commits on Feb 5, 2024

  1. also error if we try to create a bucket

    Dan King committed Feb 5, 2024
    Commit 19e49b4
  2. need to ensure something exists in that bucket

    Dan King committed Feb 5, 2024
    Commit c459a33