Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sketch of a base class for ArchiveOperations #372

Closed
wants to merge 12 commits into from

Conversation

mih
Copy link
Member

@mih mih commented May 24, 2023

Related to #184 but different in scope and aims.

@mih mih force-pushed the archives-the-3rd branch from 274ff17 to b410ea6 Compare May 24, 2023 09:11
@codecov
Copy link

codecov bot commented May 24, 2023

Codecov Report

Patch coverage: 95.06% and project coverage change: +0.05 🎉

Comparison is base (85250d5) 92.27% compared to head (487cc58) 92.32%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
+ Coverage   92.27%   92.32%   +0.05%     
==========================================
  Files          98      107       +9     
  Lines        7820     8005     +185     
==========================================
+ Hits         7216     7391     +175     
- Misses        604      614      +10     
Impacted Files Coverage Δ
datalad_next/iter_collections/tarfile.py 100.00% <ø> (ø)
datalad_next/archive_operations/__init__.py 75.86% <75.86%> (ø)
datalad_next/archive_operations/tarfile.py 95.00% <95.00%> (ø)
datalad_next/archive_operations/zipfile.py 95.12% <95.12%> (ø)
datalad_next/archive_operations/tests/common.py 100.00% <100.00%> (ø)
...perations/tests/test_tarfile_archive_operations.py 100.00% <100.00%> (ø)
...perations/tests/test_zipfile_archive_operations.py 100.00% <100.00%> (ø)
...lad_next/commands/tests/test_ls_file_collection.py 100.00% <100.00%> (ø)
...atalad_next/iter_collections/tests/test_iterdir.py 100.00% <100.00%> (ø)
...atalad_next/iter_collections/tests/test_itertar.py 100.00% <100.00%> (ø)
... and 3 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

This commit adds a zipfile collection iterator
including some basic tests.

This is a preparation for the implementation of
zip archive operations
This commit adds a first version of
ZipArchiveOperations, i.e. an implementation
of archive-operations on ZIP files.

TODO: tests
This commit changes comments to reflect the
current functionality of the code. In addition
it reduces the cognitive complexity of
iter_zip, as measured by "code-climate".
This commit changes a few more comments
that were lying about the code.
This commit adds tests for archive operations.
In addition it refactors the iter_collection tests
and improves the sample_<collection/archive>
fixtures.
datalad_next/archive_operations/__init__.py Outdated Show resolved Hide resolved
)

from . import ArchiveOperations
from ..config import ConfigManager
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The must not be relative importing across datalad_next subpackages. Only within subpackage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

from __future__ import annotations

from .. import ArchiveOperations
from ...iter_collections.tests.utils import TestCollection
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be absolute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


from .common import check_basic_functionality
from ..tarfile import TarArchiveOperations
from ...iter_collections.tests.utils import (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be absolute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


from .common import check_basic_functionality
from ..zipfile import ZipArchiveOperations
from ...iter_collections.tests.utils import (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be absolute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

)

from . import ArchiveOperations
from ..config import ConfigManager
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be absolute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

christian-monch and others added 2 commits May 25, 2023 08:26
Co-authored-by: Michael Hanke <michael.hanke@gmail.com>
This commit converts relative import across
datalad_next subpackages into absolute imports

@contextmanager
@abstractmethod
def open(self, item: Any) -> IO:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely needs to become

Suggested change
def open(self, item: Any) -> IO:
def open(self, item: PurePosixPath) -> IO:

to set concrete expectations. The archivist/datalad-archives use case will always come with such paths. I am not sure of particular archive formats use non-POSIX paths internally, but knowing the type of input path should help make any transformations that are necessary.

Are there any archives that use non-path item/member identification schemes?

raise NotImplementedError

@abstractmethod
def __iter__(self) -> Generator[FileSystemItem, None, None]:
Copy link
Member Author

@mih mih May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that this is a good choice, but maybe it is. We are operating under the assumption that all iterator will produce that. I have no good feeling whether that is valid.

Maybe being too strict is actually better -- need to think about this more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share a "slightly weird" feeling. It is better than TarfileItem. I think we have to answer a general question:

  • Should the yield-type of ArchiveOperations.__iter__ and of iter_<x> be in the same class hierarchy?

For example:

# The name below is stupid and just used to illustrate the point 
@dataclass
class CollectionOrArchiveItem:
    name: PurePosixPath




@dataclass
class FilesystemItem(CollectionOrArchiveItem):
    type: FileSystemItemType
    size: int
    mtime: float | None = None
    mode: int | None = None
    uid: int | None = None
    gid: int | None = None
    link_target: PurePath | None = None

Pro: it would make sense because both are members of some of set that distinguishes its elements by their name (although the elements usually have additional properties, the names are the unique distinction criteria). So there could be a common base-class for those types that basically contains a name.

Con: there are distinctions between the names in archives and names in collections. Names in archives are specified by the archive specs and should be identical across platforms. Names in collections might not be identical across platforms, i.e. Windows-dirs vs. Unix-dirs, unless they are represented as strings or a sequence of bytes (which we are currently doing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I arrived at similar conclusions. Please check #350 and specifically 63d418e for my approach from 1-2 weeks ago.

This was referenced May 26, 2023
@mih
Copy link
Member Author

mih commented May 29, 2023

I am closing this PR now. The actual "Sketch of a base class for ArchiveOperations" has already been merged into main with #384. Other aspects of the diff here have been proposed at separate PR(s):

The remaining pieces also need to go down this path.

@mih mih closed this May 29, 2023
@mih mih deleted the archives-the-3rd branch December 18, 2023 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants