-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suppor dimensional metadata #62
base: main
Are you sure you want to change the base?
Conversation
This seems fine to me? @bkamins any concerns? |
I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments. |
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
Besides the use cases in the original post I have images that have metadata per dimension in the form of name, per dim machine acquisition information, and additional metadata along axes (similar to colmetadata). |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #62 +/- ##
==========================================
+ Coverage 96.36% 96.77% +0.41%
==========================================
Files 1 1
Lines 55 62 +7
==========================================
+ Hits 53 60 +7
Misses 2 2 ☔ View full report in Codecov by Sentry. |
@quinnj - all looks good. OK to merge? (I would make a release next) |
Anything needed on my end? |
Needs a second approval (which might require some changes, but most likely not) before being merged and released. |
@nalimilan - do you have an opinion on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
The column metadata tests use |
dimmetadatasupport(T::Type, dim::Int) | ||
|
||
Return a `NamedTuple{(:read, :write), Tuple{Bool, Bool}}` indicating whether | ||
values of type `T` support metadata correspond to dimension `dim`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this substantial point: colmetadatasupport
takes a single argument, i.e. it asks whether column metadata is supported at all, without considering a particular column. Shouldn't we do the same here? Once a type supports dimension metadata, it can return nothing
for dimensions that do not have any metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is could be OK as is proposed here. colmetadatasupport
takes one argument, because there is exactly one dimension always.
But what @nalimilan is also OK - it depends on what @Tokazama thinks is better (i.e. do we imagine that only selected dimensions would have metadata?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "there is exactly one dimension always"? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I get it now, you mean that columns are the second dimension so colmetadatasupport(T)
is similar to dimmetadatasupport(T, 2)
.
I still think there's a problem though: dimmetadata(x)
returns a tuple of metadata for each dimension. But how can you check whether you can call dimmetadata(x)
without getting an error? For that you would need to be able to call dimmetadatasupport(T)
.
Question on the intended use: is this PR is about storing information about a dimension itself (e.g. its name or unit), or about its elements (e.g. row names for dimension 1), or both? If the second option, I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2. Also, how does this relate to AxisArrays, NamedArrays, NamedDims, etc.? |
I assumed this is completely detached. I.e. |
The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata. |
@bkamins Sure. But e.g. for a
@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right? |
Exactly! I figured the alternative was having an additional metadata set of methods (e.g., |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
src/DataAPI.jl
Outdated
function dimmetadatakeys(x, dim::Int) | ||
dimmetadatasupport(typeof(x), dim).read || return () | ||
throw(MethodError(dimmetadatakeys, (x, dim))) | ||
end | ||
function dimmetadatakeys(x) | ||
Dict(dim => dimmetadatakeys(x, dim) for dim in 1:ndims(x)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the discussions offhand, but for colmetadatakeys
we decided not to provide any fallback definition, in particular not to return ()
when metadata is not supported. Better be consistent for dimmetadatakeys
, otherwise the API will be confusing (or, again, also adapt colmetadatakeys
if necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I had the fall back is because the docs still say "If x
does not support column metadata return ()
". Should the docs for both of these be changed then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sentence is in the paragraph regarding the case when col
is omitted. When it's passed, the docstring says that we throw an error. We cannot change this for colmetadatakeys
anyway as it would be breaking, so better make dimmetadatakeys
consistent with that.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@nalimilan, is this okay now? |
dimmetadatasupport(T::Type, dim::Int) | ||
|
||
Return a `NamedTuple{(:read, :write), Tuple{Bool, Bool}}` indicating whether | ||
values of type `T` support metadata correspond to dimension `dim`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I get it now, you mean that columns are the second dimension so colmetadatasupport(T)
is similar to dimmetadatasupport(T, 2)
.
I still think there's a problem though: dimmetadata(x)
returns a tuple of metadata for each dimension. But how can you check whether you can call dimmetadata(x)
without getting an error? For that you would need to be able to call dimmetadatasupport(T)
.
if DataAPI.dimmetadatasupport(TestMeta, dim).read | ||
return keys(x.dims[dim]) | ||
else | ||
return () | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring says we should throw an error when dim
doesn't exist. Can you test this behavior?
@bkamins We should probably also change colmetadatakeys
?
if DataAPI.dimmetadatasupport(TestMeta, dim).read | |
return keys(x.dims[dim]) | |
else | |
return () | |
end | |
return keys(x.dims[dim]) |
|
||
If `dim` is not passed return an iterator of `dim => dimmetadatakeys(x, dim)` | ||
pairs for all dimensions that have metadata. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do the same as for colmetadatakeys
. Can you update the implementation accordingly?
If `x` does not support dimension metadata return `()`. | |
src/DataAPI.jl
Outdated
function dimmetadatakeys(x, dim::Int) | ||
dimmetadatasupport(typeof(x), dim).read || return () | ||
throw(MethodError(dimmetadatakeys, (x, dim))) | ||
end | ||
function dimmetadatakeys(x) | ||
Dict(dim => dimmetadatakeys(x, dim) for dim in 1:ndims(x)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sentence is in the paragraph regarding the case when col
is omitted. When it's passed, the docstring says that we throw an error. We cannot change this for colmetadatakeys
anyway as it would be breaking, so better make dimmetadatakeys
consistent with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, looks like I missed your last push. I've made suggestions to ensure docstrings have the same structure as existing ones. There's also one decision to make about dimmetadatasupport
(see discussion).
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
The core metadata methods (
metadata
,metadata!
, etc.) provide an intuitive method of communicating that a key-value pair provides some information about the instance of the object it is attached to. The column metadata methods similarly provide an intuitive method of communicating the a key-value pair provides some information about the column of a particular object. However, data with multiple dimensions (array, mesh, coordinates, etc.) often have metadata stored per dimension.This PR provides support for metadata corresponding to specific dimensions. New method implemented here follow the name pattern of column metadata methods (
colmetadata
todimmetadata
).