Indexing for arbitrary elements of dimensions #107

Open
Artur-man opened this issue Aug 26, 2024 · 7 comments

Comments

@Artur-man

Artur-man commented Aug 26, 2024

User story

Hey there,

Is it possible to access arbitrary elements along each dimension (as Rarr does with its index argument) instead of using slices? Is this already implemented, or is it not available at the moment?

zarr.array <- pizzarr::zarr_open(store = "data/mat.zarr")
mat <- array(1:350, c(10, 5, 7))
zarr.array$create_dataset("assay", data = mat, shape = dim(mat))
zarr.array$get_item("assay")$get_item(list(slice(1,6,2), slice(1, 2), slice(1, 1)))$data
, , 1

     [,1] [,2]
[1,]    1   11
[2,]    3   13
[3,]    5   15

It is possible to access a single element.

zarr.array$get_item("assay")$get_item(c(1, 2, 1))$data
, , 1

     [,1]
[1,]   72

But with multiple elements, it doesn't work.

zarr.array$get_item("assay")$get_item(c(1:2, 2, 1))$data
Error in check_selection_length(selection, shape) : TooManyIndicesError
zarr.array$get_item("assay")$get_item(list(c(1,6), 2, 1))$data
Error in if (is.na(stop)) { : the condition has length > 1
@keller-mark
Owner

$get_item(c(1:2, 2, 1))

Indexing with numeric vectors is difficult because c() flattens its arguments into a single vector by default, whereas list() keeps them separate:

> c(1:2, 2, 1)
[1] 1 2 2 1
> list(1:2, 2, 1)
[[1]]
[1] 1 2

[[2]]
[1] 2

[[3]]
[1] 1

Perhaps you could do something fancy with rlang (https://rlang.r-lib.org/reference/topic-defuse.html) to prevent the flattening, or to intercept the arguments before they are flattened.
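To illustrate the flattening point above: one base-R way to keep each dimension's indices grouped is to collect them through `...` into a list. The helper name `capture_selection` is hypothetical and not part of pizzarr; this is only a sketch of the idea, not how pizzarr handles its arguments.

```r
# Hypothetical helper (not part of pizzarr): collect one argument per dimension.
capture_selection <- function(...) {
  list(...)
}

flat <- c(1:2, 2, 1)                      # flattened into one vector of length 4
grouped <- capture_selection(1:2, 2, 1)   # one list element per dimension

length(flat)
length(grouped)
lengths(grouped)   # number of indices requested for each dimension
```

Because the grouping survives, a function written this way can tell that the first dimension received two indices while the others received one each.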

The vector vs. list issue aside, there is this outstanding need to support integer indexing: #43

However at the moment, you could turn lists of integers into lists of slices in order to work around this:

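# Convert one per-dimension index vector into a slice:
# length 1 -> a single element, length 2 -> (start, stop),
# length 3 -> (start, stop, step).
to_slice <- function(i) {
  if(length(i) == 1) {
    return(slice(i, i))
  }
  if(length(i) == 2) {
    return(slice(i[1], i[2]))
  }
  if(length(i) == 3) {
    return(slice(i[1], i[2], i[3]))
  }
  stop("Received indexing vector with too many elements")
}
# Here `x` is a list with one index vector per dimension
# and `z` is the array being indexed.
selection <- z$get_item(lapply(x, to_slice))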
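Note that `to_slice` reads a length-2 vector as (start, stop), so `c(1, 6)` selects the range 1 to 6 rather than elements 1 and 6. For truly arbitrary elements, one possible stopgap is to read the bounding range of each dimension with a slice and then subset the result in memory. The sketch below uses an in-memory array as a stand-in for the Zarr read so that it runs without pizzarr; `bounding_ranges` is a hypothetical helper, and the commented `get_item` call assumes that `slice(start, stop)` is one-based and inclusive, as the outputs in the first post suggest.

```r
# Sketch only: `mat` stands in for data that would be read from a Zarr store.
mat <- array(1:350, c(10, 5, 7))

# Hypothetical helper (not part of pizzarr): for each dimension, compute the
# bounding range of the requested indices and their positions inside it.
bounding_ranges <- function(idx) {
  lapply(idx, function(i) {
    list(start = min(i), stop = max(i), rel = i - min(i) + 1)
  })
}

idx <- list(c(1, 6), 2, 1)
b <- bounding_ranges(idx)

# With pizzarr, the block would presumably be read with one slice per dimension:
#   block <- z$get_item(lapply(b, function(d) slice(d$start, d$stop)))$data
# Here the same block is taken from the in-memory array instead.
ranges <- lapply(b, function(d) seq(d$start, d$stop))
block <- do.call(`[`, c(list(mat), ranges, list(drop = FALSE)))

# Pick the requested elements out of the block.
rel <- lapply(b, function(d) d$rel)
result <- do.call(`[`, c(list(block), rel, list(drop = FALSE)))
```

This reads more data than strictly needed when the requested indices are far apart, so it is a workaround rather than a replacement for real integer-array indexing.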

@keller-mark
Owner

keller-mark commented Aug 26, 2024

We also have this bracket indexing function which may be relevant:

`[` = function(...) {

z[2, 5]

Example in test here: https://github.com/keller-mark/pizzarr/blob/main/tests/testthat/test-s3.R#L47

@Artur-man
Author

This is the part that has to be updated and extended, right? I will attempt it if you guys haven't planned to yet.

pizzarr/R/indexing.R

Lines 88 to 100 in f84355d

iter = function() {
  # TODO: use generator/yield features from async package
  dim_chunk_index <- floor(self$dim_sel / self$dim_chunk_len)
  dim_offset <- dim_chunk_index * self$dim_chunk_len
  dim_chunk_sel <- self$dim_sel - dim_offset
  dim_out_sel <- NA
  return(list(
    ChunkDimProjection$new(
      dim_chunk_index,
      dim_chunk_sel,
      dim_out_sel
    )
  ))
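The arithmetic in the quoted snippet maps one index to a chunk number and a position inside that chunk. As a rough sketch of how the same arithmetic might extend to several indices at once, the hypothetical function below (not pizzarr code) applies it to a vector, assuming that `dim_sel` is zero-based, which the `floor()` division suggests.

```r
# Hypothetical helper, not pizzarr code: the same arithmetic as the quoted
# iter() body, applied to a vector of zero-based indices.
project_indices <- function(dim_sel, dim_chunk_len) {
  dim_chunk_index <- floor(dim_sel / dim_chunk_len)
  dim_offset <- dim_chunk_index * dim_chunk_len
  data.frame(
    dim_sel = dim_sel,
    chunk_index = dim_chunk_index,
    chunk_sel = dim_sel - dim_offset
  )
}

# Zero-based indices 0, 5 and 6 along a dimension stored in chunks of length 2.
proj <- project_indices(c(0, 5, 6), dim_chunk_len = 2)

# Group the within-chunk positions by the chunk they fall into.
by_chunk <- split(proj$chunk_sel, proj$chunk_index)
```

Grouping by chunk means each chunk only needs to be visited once, however many of the requested indices fall inside it.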

@Artur-man
Author

Artur-man commented Aug 30, 2024

I like the fact that this repo is functionally an R replica of the original zarr-python implementation. I was able to implement IntArrayDimIndexer and OrthogonalIndexer classes so that get_item accepts orthogonal selections. There are still a few bugs I need to take care of, but otherwise the DelayedArray assumption of random index access is satisfied.

Here is more info on our DelayedArray extension:
https://github.com/BIMSBbioinfo/ZarrArray

Here are some examples:

# write
zarr.array <- pizzarr::zarr_open(store = "data/mat_example.zarr", mode = "w")
mat_test <- matrix(1:100, nrow = 10)
zarr.array$create_dataset("assay", data = mat_test, shape = dim(mat_test), chunks = c(2,2))
# read
zarr.array <- pizzarr::zarr_open(store = "data/mat_example.zarr", mode = "r")
a <- zarr.array$get_item("assay")
a[c(1,6,7),c(2,8,9)]$data
     [,1] [,2] [,3]
[1,]   11   71   81
[2,]   16   76   86
[3,]   17   77   87
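For readers unfamiliar with the term, the output above is an orthogonal selection: every requested row is combined with every requested column. The base-R sketch below reproduces it on the in-memory matrix and contrasts it with pointwise (coordinate) selection; it does not use pizzarr, and the variable names are only illustrative.

```r
mat_test <- matrix(1:100, nrow = 10)
rows <- c(1, 6, 7)
cols <- c(2, 8, 9)

# Orthogonal selection: all combinations of rows and cols, a 3 x 3 result.
orthogonal <- mat_test[rows, cols]

# Pointwise selection: only the pairs (1, 2), (6, 8) and (7, 9).
pointwise <- mat_test[cbind(rows, cols)]
```

Here `orthogonal` matches the matrix printed above, while `pointwise` contains only its diagonal.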

Would you guys like a PR on this once everything is tidy?

@dblodgett-usgs
Collaborator

@keller-mark has the final say, but I'd be happy to get the contribution!

@keller-mark
Owner

I agree with @dblodgett-usgs, the contribution is welcome! Compatibility with DelayedArray would be great!

@Artur-man
Author

Awesome guys, thanks for the quick response, I will let you know!
