Consider indicating inherited coordinates & dimensions in DataTree repr #9463
Comments
Consider the following DataTree with inherited coordinates:

tree = DataTree.from_dict({
    '/': Dataset(coords={'x': [1]}),
    '/first_child': None,
    '/second_child': Dataset({'foo': ('x', [0])}),
})

Here is the current repr of the root and child nodes:
I think these could be improved by removing inherited coordinates/dimensions, but only if they are already shown on their parent in the same repr. In cases where we are printing a child node only, we should also indicate inherited coordinates/dimensions (because they determine what is valid on the child node). These could look something like:
I think I generally like this suggestion, but the "Inherited dimensions" gets messy if you have a child node with some dimensions inherited and some not. Since dimensions are not in fact inherited (coordinate variables are inherited, which may have new dimensions), I think we should display "Dimensions: (x: 1)" regardless of whether or not the dimension came from an inherited coordinate or a local coordinate.
Dimensions do get inherited. Consider:
My understanding from #9457 (comment) was that inheritance of dimensions during parent-child alignment is an internal implementation detail, and your example above is literally the only possible time that dimension will ever be displayed to the user. For all other intents and purposes I would have said that dimensions are not inherited - they are built from inheritable variables.
I would consider the invalid Dataset object (to make the error message) an internal implementation detail, but the inherited dimension should be considered part of the data model of the child DataTree.
Maybe for developers, but I'm still not convinced that users should have to care about this distinction. After all, we do not allow users any public way to create a Dataset that has a dimension with no corresponding variable, so from an API perspective I would argue that dimensions are not part of the public data model.

To be concrete, if we have a child with both "inherited" and non-inherited dimensions like this

tree = DataTree.from_dict({
    '/': Dataset(coords={'x': [1]}),
    '/child': Dataset({'foo': ('y', [0])}),
})

your suggestion would display the dimensions by separating them into inherited and non-inherited (I've chosen to display them adjacent to the lists of coordinate variables):

In [9]: tree['child']
Out[9]:
<xarray.DataTree 'child'>
Group: /child
Inherited Dimensions: (x: 1)
Inherited Coordinates:
* x (x) int64 8B 1
Dimensions: (y: 1)
Dimensions without coordinates: y
Data variables:
foo (y) int64 8B 0

whereas mine would simply list them all under a common Dimensions line:

In [9]: tree['child']
Out[9]:
<xarray.DataTree 'child'>
Group: /child
Dimensions: (x: 1, y: 1)
Inherited Coordinates:
* x (x) int64 8B 1
Dimensions without coordinates: y
Data variables:
foo (y) int64 8B 0

I think one way the latter is nice is that you can still see all the accessible dimensions on the entire child dataset on one line.

EDIT: If I had 100 inherited coordinate variables, the former layout means I have to scroll down a long way to find out the dimensions of the node's data variables. But if I put everything at the top like this (where the node could have additional non-inherited coordinates in general),

In [9]: tree['child']
Out[9]:
<xarray.DataTree 'child'>
Group: /child
Inherited Dimensions: (x: 1)
Dimensions: (y: 1)
Inherited Coordinates:
* x (x) int64 8B 1
...
Coordinates:
...
Dimensions without coordinates: y
Data variables:
foo (y) int64 8B 0
...

now
I agree, users don't need to understand the distinction between dimensions & inherited dimensions. From a user perspective, neither can be edited directly. In that case, do we always display inherited dimensions as part of
Good question. I think that if our general approach above is motivated by a principle of "only show dimensions immediately adjacent to any variables that actually depend on those dimensions", then for consistency we should only show inherited dimensions if not already shown on a parent.
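The display rule discussed above can be sketched with a toy model. This is only an illustration, not xarray's implementation: the `local_coords` mapping and the helpers `ancestor_paths`, `inherited_coords`, and `repr_lines` are hypothetical names, and the sketch assumes that a name defined locally on a node is never reported as inherited.

```python
from __future__ import annotations


def ancestor_paths(path: str) -> list[str]:
    """Return ancestor paths from the root down to the direct parent."""
    if path == "/":
        return []
    parts = path.strip("/").split("/")[:-1]
    ancestors = ["/"]
    current = ""
    for part in parts:
        current += "/" + part
        ancestors.append(current)
    return ancestors


def inherited_coords(path: str, local_coords: dict[str, list[str]]) -> list[str]:
    """Coordinate names visible on `path` that are defined only on ancestors."""
    local = set(local_coords.get(path, []))
    inherited: list[str] = []
    for ancestor in ancestor_paths(path):
        for name in local_coords.get(ancestor, []):
            if name not in local and name not in inherited:
                inherited.append(name)
    return inherited


def repr_lines(
    path: str, local_coords: dict[str, list[str]], parent_shown: bool
) -> list[str]:
    """Build a toy repr; list inherited names only if the parent is not shown."""
    lines = [f"Group: {path}"]
    inherited = inherited_coords(path, local_coords)
    if inherited and not parent_shown:
        lines.append("Inherited coordinates: " + ", ".join(inherited))
    local = local_coords.get(path, [])
    if local:
        lines.append("Coordinates: " + ", ".join(local))
    return lines


# Hypothetical tree: 'x' is defined on the root, 'y' on the child.
toy_tree = {"/": ["x"], "/child": ["y"]}

# Child printed on its own: the inherited coordinate is listed and labelled.
print("\n".join(repr_lines("/child", toy_tree, parent_shown=False)))

# Child printed beneath its parent: 'x' already appeared on the root, so it is omitted.
print("\n".join(repr_lines("/child", toy_tree, parent_shown=True)))
```

The `parent_shown` flag stands in for "this node is being rendered as part of a larger repr that already includes its parent"; how a real repr would track that is left open here.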
* Update DataTree repr to indicate inheritance

  Fixes pydata#9463
* fix whitespace
* add more repr tests, fix failure
* fix failure on windows
* fix repr for inherited dimensions
What is your issue?
Showing these in the repr makes it harder to understand the DataTree at a glance. In particular, it makes it impossible to immediately see if coordinates are duplicated or not, e.g., consider these two cases, which have the same repr but would serialize to Zarr differently:
The simplest way to indicate inherited coordinates/dimensions would be to not display them at all. But maybe there is a different way we could indicate such dimensions/variables (less prominently).
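The ambiguity described above can be illustrated with a small toy model. It is a sketch under stated assumptions rather than xarray code: each tree is a plain dict from node path to the coordinate names stored locally on that node, only a single level of nesting is handled, and `merged_coords` is a hypothetical helper that mimics a repr listing inherited and local coordinates together without distinguishing them.

```python
from __future__ import annotations


def merged_coords(path: str, local_coords: dict[str, list[str]]) -> list[str]:
    """Coordinates visible on a node: the root's plus its own, with no labels.

    Toy helper: only handles the root and its direct children.
    """
    nodes = [path] if path == "/" else ["/", path]
    names: list[str] = []
    for node in nodes:
        for name in local_coords.get(node, []):
            if name not in names:
                names.append(name)
    return names


# Case 1: the child inherits 'x' from the root and stores nothing itself.
inherits_x = {"/": ["x"], "/child": []}

# Case 2: the child stores its own duplicate copy of 'x'.
duplicates_x = {"/": ["x"], "/child": ["x"]}

# An unlabelled listing looks identical for both trees ...
same_listing = merged_coords("/child", inherits_x) == merged_coords("/child", duplicates_x)

# ... even though what is stored on the child node differs.
same_storage = inherits_x["/child"] == duplicates_x["/child"]

print(same_listing, same_storage)
```

A repr that either omits or separately labels inherited names would let a reader tell these two trees apart at a glance.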