Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDF5.Dataset as an AbstractArray #930

Open
mkitti opened this issue May 4, 2022 · 7 comments
Open

HDF5.Dataset as an AbstractArray #930

mkitti opened this issue May 4, 2022 · 7 comments

Comments

@mkitti
Copy link
Member

mkitti commented May 4, 2022

Should HDF5.Dataset be an AbstractArray?

Alternatively, should we build a wrapper?

julia> struct DatasetArrayWrapper{T, N} <: AbstractArray{T,N}
           parent::HDF5.Dataset
       end

julia> DatasetArrayWrapper(parent::HDF5.Dataset) = DatasetArrayWrapper{eltype(parent), ndims(parent)}(parent)
DatasetArrayWrapper

julia> Base.size(w::DatasetArrayWrapper) = size(w.parent)

julia> Base.getindex(w::DatasetArrayWrapper, I...) = getindex(w.parent, I...)

julia> Base.getindex(w::DatasetArrayWrapper, I::Vararg{Int}) = getindex(w.parent, I...)

We almost have the entire AbstractArray interface implemented for HDF5.Dataset:
https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-array

@musm
Copy link
Member

musm commented May 18, 2022

How does the wrapper benefit over a direct subtyping of Dataset <: AbstractArray

Logically, makes sense that it would be a subtype. I think the original rationale might have been to limit it's interface.
Are there any potential unintended side effects of this change?

@mkitti
Copy link
Member Author

mkitti commented May 20, 2022

We probably should implement AbstractDiskArray.jl. Perhaps stealing this, but querying more of the information rather than caching it like this implementation: https://github.com/AStupidBear/HDF5Utils.jl/blob/master/src/diskarrays.jl

@musm
Copy link
Member

musm commented May 20, 2022

Do you think it would be best to make it a direct subtype:

HDF5.Dataset <: AbstractDiskArray 

vs. what HDF5Utils is doing where

HDF5DiskArray <: AbstractDiskArray

and then defining

function HDF5DiskArray(dset::HDF5Dataset)
...

Which I think they are only doing since they don't own the HDF5.Dataset type.

@mkitti
Copy link
Member Author

mkitti commented May 20, 2022

Yes, we should look into making it a direct subtype. This is a 0.17 topic though.

In contrast to the HDF5Utils.jl implementation, I think our implementation should be lazy.

Another issue if we make it a direct subtype is that we will need to parameterize it. We will need to consider how to mitigate the compilation latency in this case.

@rafaqz
Copy link
Contributor

rafaqz commented Jun 7, 2022

Please do subtype AbstractDiskArray. You can also just use the macros to apply the interface to you own types without subtyping, but subtyping is simpler for dispatch.

Myself and @meggart have wanted this to happen for quite a while, so if you have any problems feel free to ping.

@mkitti mkitti added this to the 0.17.0 milestone Jun 29, 2023
@mkitti mkitti modified the milestones: 0.17.0, 0.18.0 Aug 30, 2023
@mkitti
Copy link
Member Author

mkitti commented Aug 30, 2023

I'm going to let this slip to 0.18 since 0.17 already contains other breaking changes.

@meggart
Copy link

meggart commented Jan 12, 2024

Just linking my original implementation suggestion for a HDF5DiskArray: #615

In case it helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants