-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sketch of a base class for ArchiveOperations
#372
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #372 +/- ##
==========================================
+ Coverage 92.27% 92.32% +0.05%
==========================================
Files 98 107 +9
Lines 7820 8005 +185
==========================================
+ Hits 7216 7391 +175
- Misses 604 614 +10
☔ View full report in Codecov by Sentry. |
This commit adds a zipfile collection iterator including some basic tests. This is a preparation for the implementation of zip archive operations
This commit adds a first version of ZipArchiveOperations, i.e. an implementation of archive-operations on ZIP files. TODO: tests
This commit changes comments to reflect the current functionality of the code. In addition it reduces the cognitive complexity of iter_zip, as measured by "code-climate".
This commit changes a few more comments that were lying about the code.
This commit adds tests for archive operations. In addition it refactors the iter_collection tests and improves the sample_<collection/archive> fixtures.
) | ||
|
||
from . import ArchiveOperations | ||
from ..config import ConfigManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The must not be relative importing across datalad_next
subpackages. Only within subpackage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
from __future__ import annotations | ||
|
||
from .. import ArchiveOperations | ||
from ...iter_collections.tests.utils import TestCollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
from .common import check_basic_functionality | ||
from ..tarfile import TarArchiveOperations | ||
from ...iter_collections.tests.utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
from .common import check_basic_functionality | ||
from ..zipfile import ZipArchiveOperations | ||
from ...iter_collections.tests.utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
) | ||
|
||
from . import ArchiveOperations | ||
from ..config import ConfigManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Michael Hanke <michael.hanke@gmail.com>
This commit converts relative import across datalad_next subpackages into absolute imports
|
||
@contextmanager | ||
@abstractmethod | ||
def open(self, item: Any) -> IO: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely needs to become
def open(self, item: Any) -> IO: | |
def open(self, item: PurePosixPath) -> IO: |
to set concrete expectations. The archivist
/datalad-archives
use case will always come with such paths. I am not sure of particular archive formats use non-POSIX paths internally, but knowing the type of input path should help make any transformations that are necessary.
Are there any archives that use non-path item/member identification schemes?
raise NotImplementedError | ||
|
||
@abstractmethod | ||
def __iter__(self) -> Generator[FileSystemItem, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that this is a good choice, but maybe it is. We are operating under the assumption that all iterator will produce that. I have no good feeling whether that is valid.
Maybe being too strict is actually better -- need to think about this more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share a "slightly weird" feeling. It is better than TarfileItem
. I think we have to answer a general question:
- Should the yield-type of
ArchiveOperations.__iter__
and ofiter_<x>
be in the same class hierarchy?
For example:
# The name below is stupid and just used to illustrate the point
@dataclass
class CollectionOrArchiveItem:
name: PurePosixPath
@dataclass
class FilesystemItem(CollectionOrArchiveItem):
type: FileSystemItemType
size: int
mtime: float | None = None
mode: int | None = None
uid: int | None = None
gid: int | None = None
link_target: PurePath | None = None
Pro: it would make sense because both are members of some of set that distinguishes its elements by their name (although the elements usually have additional properties, the names are the unique distinction criteria). So there could be a common base-class for those types that basically contains a name.
Con: there are distinctions between the names in archives and names in collections. Names in archives are specified by the archive specs and should be identical across platforms. Names in collections might not be identical across platforms, i.e. Windows-dirs vs. Unix-dirs, unless they are represented as strings or a sequence of bytes (which we are currently doing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am closing this PR now. The actual "Sketch of a base class for ArchiveOperations" has already been merged into The remaining pieces also need to go down this path. |
Related to #184 but different in scope and aims.