Support Arrow schema evolution with enumeration on existing column #591

Merged
merged 6 commits into from
Sep 25, 2023
Changes from 2 commits
4 changes: 4 additions & 0 deletions .github/workflows/ci.yaml
@@ -35,6 +35,10 @@ jobs:
        run: cat $HOME/work/TileDB-R/TileDB-R/tiledb.Rcheck/00install.out
        if: failure()

      - name: Show test log
        run: cat $HOME/work/TileDB-R/TileDB-R/tiledb.Rcheck/00check.out
        if: failure()

      #- name: Coverage
      #  if: ${{ matrix.os == 'ubuntu-latest' }}
      #  run: ./.github/r-ci.sh coverage
2 changes: 1 addition & 1 deletion NEWS.md
@@ -4,7 +4,7 @@

## Improvements

-* Array schema evolution has been extended to support enumerations (#590)
+* Array schema evolution has been extended to support enumerations (#590, #591)


# tiledb 0.21.0
16 changes: 16 additions & 0 deletions R/TileDBArray.R
@@ -923,6 +923,22 @@ setMethod("[", "tiledb_array",
if (use_arrow) {
  rl <- libtiledb_to_arrow(abptr, qryptr, dictionaries)
  at <- .as_arrow_table(rl)

  ## special case: after schema evolution the factor codes could have been
  ## offset twice, so shift them back by one
  for (n in colnames(at)) {
    v <- at[[n]]$as_vector()
    lvls <- levels(v)
    if (inherits(v, "factor")) {
      vec <- as.integer(v)
      if (min(vec, na.rm=TRUE) == 2 && max(vec, na.rm=TRUE) == length(lvls) + 1) {
        vec <- vec - 1L
        attr(vec, "levels") <- attr(v, "levels")
        class(vec) <- class(v)
        at[[n]] <- vec
      }
    }
  }

  ## if dictionaries are to be injected at the R level, this does it
  #for (n in names(dictionaries)) {
  # if (!is.null(dictionaries[[n]])) {
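
For illustration, the correction in this hunk can be reproduced without TileDB or Arrow. The sketch below is a minimal base-R stand-in, not the package code: `fix_shifted_codes` and the sample levels are hypothetical, and the shifted input is built by hand to mimic codes that run from 2 to `length(levels) + 1`.

```r
## Hypothetical helper mirroring the check in the diff: if the integer codes
## run from 2 to length(levels) + 1, shift them back by one.
fix_shifted_codes <- function(codes, lvls) {
  if (min(codes, na.rm = TRUE) == 2L &&
      max(codes, na.rm = TRUE) == length(lvls) + 1L) {
    codes <- codes - 1L
  }
  ## rebuild a factor from the (possibly corrected) one-based codes;
  ## an NA code stays NA
  factor(lvls[codes], levels = lvls)
}

lvls <- c("red", "green", "blue")        # made-up enumeration values
shifted <- c(2L, 3L, 4L, 4L, NA, 2L)     # codes off by one, with a missing value
res <- fix_shifted_codes(shifted, lvls)
print(res)
```

Note that a check of this shape only fires when both the lowest and the highest level occur in the data; a shifted column that lacks either one would pass through unchanged.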
13 changes: 11 additions & 2 deletions inst/tinytest/test_arrayschemaevolution.R
@@ -56,9 +56,18 @@ attr <- tiledb_attribute_set_enumeration_name(attr, "frobo")
ase <- tiledb_array_schema_evolution_add_attribute(ase, attr)
tiledb_array_schema_evolution_array_evolve(ase, uri)

-## check
-arr <- tiledb_array(uri, return_as="data.table")
+## check as data.frame
+arr <- tiledb_array(uri, return_as="data.frame")
res <- arr[]
expect_true(is.factor(res$val))
expect_equal(levels(res$val), enums)
expect_equal(as.integer(res$val), c(1:5,5:1))
+
+## check as arrow
+if (!requireNamespace("arrow", quietly=TRUE)) exit_file("No 'arrow' package.")
+arr <- tiledb_array(uri, return_as="arrow")
+res <- arr[]
+v <- res[["val"]]$as_vector()
+expect_true(is.factor(v))
+expect_equal(levels(v), enums)
+expect_equal(as.integer(v), c(1:5,5:1))
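
The assertions in this test compare the levels and the integer codes separately, once per return type. A minimal base-R sketch of the same pattern follows; it assumes made-up values for `enums` (the real ones are defined outside the shown hunk) and uses `stopifnot` in place of tinytest's `expect_*` functions.

```r
enums <- c("a", "b", "c", "d", "e")      # assumed values, not from the test file
val <- factor(enums[c(1:5, 5:1)], levels = enums)

## same three checks as in the test: type, levels, one-based codes
stopifnot(is.factor(val))
stopifnot(identical(levels(val), enums))
stopifnot(identical(as.integer(val), c(1:5, 5:1)))
```

Checking the codes as well as the levels is what would catch an off-by-one shift, since shifted codes can still carry the correct `levels` attribute.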
2 changes: 2 additions & 0 deletions inst/tinytest/test_tiledbarray.R
@@ -1442,6 +1442,8 @@ if (v[["major"]] == 2L && v[["minor"]] %in% c(4L, 10L, 11L, 12L, 14L)) exit_file
## CI issues at GitHub for r-release on Windows Server 2019
if (getRversion() < "4.3.0" && Sys.info()[["sysname"]] == "Windows") exit_file("Skip remainder for R 4.2.* on Windows")

if (Sys.info()[["sysname"]] == "Darwin") exit_file("Skip remainder on macOS")

## check for incomplete status on unsuccessful query -- this no longer fails following some changes made
#set_allocation_size_preference(128) # too low for penguins to query fully
#array <- tiledb_array(uri, return_as="data.frame", query_layout="ROW_MAJOR")