-
Notifications
You must be signed in to change notification settings - Fork 41
Name permanence #116
Comments
The reason fixing this is complicated is that Implementing a similar solution with I'm trying to think about whether that design might cause any other consistency problems though... |
@shoyer wondering if you have an opinions on this? Do you think this name behaviour is a subtle but important feature of xarray or a unimportant detail? |
I do think this behavior is important. I would definitely consider alternative data structures for internal storage. |
Okay I think a design like this would work, where Notice that you could technically have an entire tree of only from typing import TypeVar, Generic, Optional, OrderedDict, Dict, Mapping, Hashable, Any, Union
from xarray.core.variable import Variable
from xarray.core.utils import Frozen
Tree = TypeVar("Tree", bound="TreeNode")
class TreeNode(Generic[Tree]):
"""Unnamed tree node"""
_parent: Optional[Tree]
_children: OrderedDict[str, Tree]
@property
def children(self) -> Mapping[str, Tree]:
return Frozen(self._children)
class NamedNodeMixin:
"""
Enables a node to know its own name.
Implements path-like relationships to other nodes in its tree.
"""
_name: Optional[str]
@property
def name(self) -> Union[str, None]:
"""The name of this node."""
return self._name
@property
def path(self) -> str:
...
def relative_to(self: NamedNodeMixin, other: NamedNodeMixin) -> str:
...
class DataNode(TreeNode, Generic[Tree]):
"""Unnamed tree node which also stores data."""
# TODO this could potentially inherit from Dataset?
_parent: Optional[DataNode]
_children: OrderedDict[str, DataNode]
_variables: Dict[Hashable, Variable]
...
class DataTree(DataNode, NamedNodeMixin, Generic[Tree]):
_name: Optional[str]
_parent: Optional[DataTree]
_children: OrderedDict[str, DataNode]
_variables: Dict[Hashable, Variable]
...
@property
def children(self) -> Mapping[str, DataTree]:
children_as_datatrees = {
name: DataTree._construct_from_datanode(
name=name, datanode=child,
)
for name, child in self._children
}
return Frozen(children_as_datatrees)
@classmethod
def _construct_direct(
cls,
name: str | None = None,
parent: DataTree | None = None,
children: OrderedDict[str, DataNode] = None,
variables: dict[Any, Variable] = None,
) -> DataTree:
"""Shortcut around __init__ for internal use when we want to skip costly validation."""
...
@classmethod
def _construct_from_datanode(cls, name: str, datanode: DataNode) -> DataTree:
return DataTree._construct_direct(
name=name,
parent=datanode._parent,
children=datanode._children,
variables=datanode._variables,
)
def _construct_datatree(self, name: str) -> DataTree:
"""Used when a single child node is requested"""
child_datanode = self._children[name]
return DataTree._construct_from_datanode(name=name, datanode=child_datanode) |
Alternatively Stephan suggested today that this might be achieved just by ensuring that group insertion occurs by making a shallow copy instead. |
Xarray Datasets don't alter the names of the objects they store:
After #41 (and #115 ensures it) then
DataTree
objects behave similarly for data variables they storeHowever, currently
DataTree
objects do alter the name of childDataTree
nodes that they store.I noticed this in #115, but fixing it might be a bit complex.
The text was updated successfully, but these errors were encountered: