Double check and document (if needed) `.name` conventions for `iter_collections` #554

mih · 2023-12-06T09:20:23Z

I cannot put my finger on it, but I feel the nature of the value of .name for a reported item is non-uniform.

I am not aware of a documentation for the convention, if how iterators should behave.

The .name should always be a PurePath|Path objects.

However, should it carry the base path (key argument to the iterator) in it? Or not?

It would make sense to me to never have it. Rational:

There is no reason while a (file) collection should be on a file system. It could have a very non-Path base address and no hierarchical organization inside (think object store: some URL, plus some object ID).

We need to go through and verify that we did not overfit to a filesystem use case in an implementation.

The text was updated successfully, but these errors were encountered:

mih · 2024-01-05T11:00:23Z

#581 brought up a related issue. It makes sense to have the name attribute be as cheap as possible, and only provide access to more complex/expensive types via lazily evaluated properties.

Previously, the `name` attribute was some kind of `PurePath` instance. The precise nature varied across item types and corresponding iterators. This led to two problems: - Performance (see datalad#581): the creation of a Path instance is not particularly cheap. If a consumer does not need such an instance, and unconditional creation is a waste of resources - Abstraction (see datalad#430): the Path representation of a path-like name does not always match the exact semantics of the original name (it might have a trailing slash that has a purpose, etc) Both problems are now addressed by relaxing the type of `.name` of a `PathBasedItem`. It can now be anything. Consumers that require an actual Path instance can use the `.path` attribute. An analog access is also implemented for `.link_target` (which now remains literal), that is accompanied by `.link_target_path`, where it was necessary. This change in approach removed the need for the `_ZipFileDirPath` workaround that was used to re-establish compatibility of a Path-based `.name` with the conventions in a `ZipFile` collection/container. With the `.name` attribute retaining its native semantics, the workaround is no longer needed as was removed. In order to document the (lack of) implemented homogeneous conventions for path-based items, the docstrings of all iterators have been amended with information on the nature of the `.name` attribute yielded by them. The corresponding data classes have received docstrings for the newly added properties for Path access. These properties uniformly use the `cached_property` decorator. For the lifetime of an item, the nature of such a path should never change, and caching it automatically addresses the significant creation cost on accessing a path representation repeatedly. Closes datalad#554 Closes datalad#581

mih mentioned this issue Jan 5, 2024

Implement iter_gittree() and support it in ls_file_collections #580

Merged

1 task

mih mentioned this issue Jan 5, 2024

Change semantics of PathBasedItem names #583

Merged

mih closed this as completed in #583 Jan 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Double check and document (if needed) `.name` conventions for `iter_collections` #554

Double check and document (if needed) `.name` conventions for `iter_collections` #554

mih commented Dec 6, 2023

mih commented Jan 5, 2024

Double check and document (if needed) .name conventions for iter_collections #554

Double check and document (if needed) .name conventions for iter_collections #554

Comments

mih commented Dec 6, 2023

mih commented Jan 5, 2024

Double check and document (if needed) `.name` conventions for `iter_collections` #554

Double check and document (if needed) `.name` conventions for `iter_collections` #554