Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppor dimensional metadata #62

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Conversation

Tokazama
Copy link

@Tokazama Tokazama commented Jul 7, 2023

The core metadata methods (metadata, metadata!, etc.) provide an intuitive method of communicating that a key-value pair provides some information about the instance of the object it is attached to. The column metadata methods similarly provide an intuitive method of communicating the a key-value pair provides some information about the column of a particular object. However, data with multiple dimensions (array, mesh, coordinates, etc.) often have metadata stored per dimension.

This PR provides support for metadata corresponding to specific dimensions. New method implemented here follow the name pattern of column metadata methods (colmetadata to dimmetadata).

@quinnj
Copy link
Member

quinnj commented Jul 10, 2023

This seems fine to me? @bkamins any concerns?

test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Jul 10, 2023

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
@Tokazama
Copy link
Author

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

Besides the use cases in the original post I have images that have metadata per dimension in the form of name, per dim machine acquisition information, and additional metadata along axes (similar to colmetadata).

@codecov
Copy link

codecov bot commented Jul 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b131356) 96.36% compared to head (e1b08c5) 96.77%.
Report is 1 commits behind head on main.

❗ Current head e1b08c5 differs from pull request most recent head 6a0af43. Consider uploading reports for the commit 6a0af43 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
+ Coverage   96.36%   96.77%   +0.41%     
==========================================
  Files           1        1              
  Lines          55       62       +7     
==========================================
+ Hits           53       60       +7     
  Misses          2        2              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Tokazama Tokazama requested a review from bkamins July 13, 2023 20:28
@bkamins
Copy link
Member

bkamins commented Jul 15, 2023

@quinnj - all looks good. OK to merge? (I would make a release next)

@Tokazama
Copy link
Author

Anything needed on my end?

@bkamins
Copy link
Member

bkamins commented Jul 18, 2023

Needs a second approval (which might require some changes, but most likely not) before being merged and released.

@bkamins
Copy link
Member

bkamins commented Aug 5, 2023

@nalimilan - do you have an opinion on this PR?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
@Tokazama
Copy link
Author

Tokazama commented Aug 7, 2023

Why not write 1 instead of using a variable? That sounds clearer to me.

The column metadata tests use :col as the argument for which column is being accessed. I thought using the variable dim instead of just 1 would make it easier to quickly read and understand what exactly was being tested.

Comment on lines +519 to +522
dimmetadatasupport(T::Type, dim::Int)

Return a `NamedTuple{(:read, :write), Tuple{Bool, Bool}}` indicating whether
values of type `T` support metadata correspond to dimension `dim`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this substantial point: colmetadatasupport takes a single argument, i.e. it asks whether column metadata is supported at all, without considering a particular column. Shouldn't we do the same here? Once a type supports dimension metadata, it can return nothing for dimensions that do not have any metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is could be OK as is proposed here. colmetadatasupport takes one argument, because there is exactly one dimension always.
But what @nalimilan is also OK - it depends on what @Tokazama thinks is better (i.e. do we imagine that only selected dimensions would have metadata?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "there is exactly one dimension always"? :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I get it now, you mean that columns are the second dimension so colmetadatasupport(T) is similar to dimmetadatasupport(T, 2).

I still think there's a problem though: dimmetadata(x) returns a tuple of metadata for each dimension. But how can you check whether you can call dimmetadata(x) without getting an error? For that you would need to be able to call dimmetadatasupport(T).

src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member

Question on the intended use: is this PR is about storing information about a dimension itself (e.g. its name or unit), or about its elements (e.g. row names for dimension 1), or both? If the second option, I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

Also, how does this relate to AxisArrays, NamedArrays, NamedDims, etc.?

@bkamins
Copy link
Member

bkamins commented Aug 14, 2023

I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@Tokazama
Copy link
Author

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@nalimilan
Copy link
Member

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

@Tokazama
Copy link
Author

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

Exactly! I figured the alternative was having an additional metadata set of methods (e.g., axesmetadata) which seemed a bit over the top since one can still accomplish similar functionality with the dim methods here. The overlap with column metadata could also be seen as redundant but tables have consistently proven to be very unique since each column can be conceptualized as its own entity.

src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated
Comment on lines 589 to 595
function dimmetadatakeys(x, dim::Int)
dimmetadatasupport(typeof(x), dim).read || return ()
throw(MethodError(dimmetadatakeys, (x, dim)))
end
function dimmetadatakeys(x)
Dict(dim => dimmetadatakeys(x, dim) for dim in 1:ndims(x))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the discussions offhand, but for colmetadatakeys we decided not to provide any fallback definition, in particular not to return () when metadata is not supported. Better be consistent for dimmetadatakeys, otherwise the API will be confusing (or, again, also adapt colmetadatakeys if necessary).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I had the fall back is because the docs still say "If x does not support column metadata return ()". Should the docs for both of these be changed then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sentence is in the paragraph regarding the case when col is omitted. When it's passed, the docstring says that we throw an error. We cannot change this for colmetadatakeys anyway as it would be breaking, so better make dimmetadatakeys consistent with that.

src/DataAPI.jl Outdated Show resolved Hide resolved
Tokazama and others added 3 commits August 16, 2023 10:25
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@Tokazama
Copy link
Author

@nalimilan, is this okay now?

src/DataAPI.jl Outdated Show resolved Hide resolved
Comment on lines +519 to +522
dimmetadatasupport(T::Type, dim::Int)

Return a `NamedTuple{(:read, :write), Tuple{Bool, Bool}}` indicating whether
values of type `T` support metadata correspond to dimension `dim`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I get it now, you mean that columns are the second dimension so colmetadatasupport(T) is similar to dimmetadatasupport(T, 2).

I still think there's a problem though: dimmetadata(x) returns a tuple of metadata for each dimension. But how can you check whether you can call dimmetadata(x) without getting an error? For that you would need to be able to call dimmetadatasupport(T).

src/DataAPI.jl Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
src/DataAPI.jl Outdated Show resolved Hide resolved
Comment on lines +117 to +121
if DataAPI.dimmetadatasupport(TestMeta, dim).read
return keys(x.dims[dim])
else
return ()
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says we should throw an error when dim doesn't exist. Can you test this behavior?
@bkamins We should probably also change colmetadatakeys?

Suggested change
if DataAPI.dimmetadatasupport(TestMeta, dim).read
return keys(x.dims[dim])
else
return ()
end
return keys(x.dims[dim])

test/runtests.jl Show resolved Hide resolved
src/DataAPI.jl Show resolved Hide resolved

If `dim` is not passed return an iterator of `dim => dimmetadatakeys(x, dim)`
pairs for all dimensions that have metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the same as for colmetadatakeys. Can you update the implementation accordingly?

Suggested change
If `x` does not support dimension metadata return `()`.

src/DataAPI.jl Outdated
Comment on lines 589 to 595
function dimmetadatakeys(x, dim::Int)
dimmetadatasupport(typeof(x), dim).read || return ()
throw(MethodError(dimmetadatakeys, (x, dim)))
end
function dimmetadatakeys(x)
Dict(dim => dimmetadatakeys(x, dim) for dim in 1:ndims(x))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sentence is in the paragraph regarding the case when col is omitted. When it's passed, the docstring says that we throw an error. We cannot change this for colmetadatakeys anyway as it would be breaking, so better make dimmetadatakeys consistent with that.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, looks like I missed your last push. I've made suggestions to ensure docstrings have the same structure as existing ones. There's also one decision to make about dimmetadatasupport (see discussion).

Tokazama and others added 15 commits December 24, 2023 14:27
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants