Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++/python] Fixes for dense arrays and core 2.27/dev [WIP] #3244

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

johnkerl
Copy link
Member

@johnkerl johnkerl commented Oct 25, 2024

Issue and/or context: As tracked on issue #2407 / [sc-51048].

Changes:

Some findings from TileDB-Inc/centralized-tiledb-nightlies#25. There is more to do though.

Notes for Reviewer:

This PR is a work in progress. It is not ready for review.

Python read-OOM repro

#!/usr/bin/env python

import tiledbsoma
import pyarrow as pa
import sys
import os, shutil

# Any dense 2D array with small current domain but huge (~ 2**63) core domain
uri = 'dense-dnda-227'

print("RD-DNDA URI", uri)

dnda = tiledbsoma.open(uri)
# This next line will slowly OOM -- watch htop
o = dnda.read()

R write-fail repro

$ cat wr-lldb.r
#!/usr/bin/env Rscript

library(tiledbsoma)
library(arrow)

uri <- "data-dnda-r"
element_type = arrow::float32()
shape = c(10, 20)

cat("CR-DNDA URI     ", uri, "\n")
cat("CR-DNDA SHAPE   ", shape, "\n")

if (dir.exists(uri)) unlink(uri, recursive=TRUE)
dnda <- tiledbsoma::SOMADenseNDArrayCreate(uri, type=element_type, shape=shape)

cat("WR-DNDA URI", uri, "\n")
dnda$write(matrix(1:12, nrow=3,ncol=4))

dnda$close()
$ ./wr-lldb.r
CR-DNDA URI      data-dnda-r
CR-DNDA SHAPE    10 20
WR-DNDA URI data-dnda-r
Error: WriterBase: Buffer sizes check failed; Invalid number of cells given for attribute 'soma_data' (12 != 18446744073709551615)
Execution halted

@johnkerl johnkerl marked this pull request as draft October 25, 2024 22:08
@johnkerl johnkerl changed the title [c++/python] Fixes for dense arrays and core 2.27 [WIP] [c++/python] Fixes for dense arrays and core 2.27/dev [WIP] Oct 25, 2024
@johnkerl
Copy link
Member Author

johnkerl commented Oct 27, 2024

Investigation so far shows that Python dense-array write calls C++ set_array_data as well as set_dim_points, while R dense-array write calls only the former and not the latter.

cc @mojaveazure and your observation the other day that the dense-array writer doesn't handle coords. Furthermore, as presently coded, it can only write the full core domain of the array. And now that with the new-shape feature flag enabled, the core domain (soma maxdomain) is huge and the core current domain (soma domain) is small.

Here is a repro that fails even with the new-shape feature flag disabled, proving that this bug is older than the new-shape mod:

#!/usr/bin/env Rscript

library(tiledbsoma)
library(arrow)

uri <- "data-dnda-r"
element_type = arrow::float32()
shape = c(10, 20)

cat("CR-DNDA URI     ", uri, "\n")
cat("CR-DNDA SHAPE   ", shape, "\n")

if (dir.exists(uri)) unlink(uri, recursive=TRUE)

dnda <- tiledbsoma::SOMADenseNDArrayCreate(uri, type=element_type, shape=shape)

cat("WR-DNDA URI", uri, "\n")

dnda$write(matrix(1:12, nrow=3,ncol=4))

dnda$close()
CR-DNDA URI      data-dnda-r 
CR-DNDA SHAPE    10 20 
WR-DNDA URI data-dnda-r 
Error: WriterBase: Buffer sizes check failed; Invalid number of cells given for attribute 'soma_data' (12 != 200)
Execution halted

Also fails with

...
m <- matrix(1:12, nrow=3,ncol=4)
coords <- list( 4:6, 7:10)
dnda$write(m, coords)
...

Same error message.

@johnkerl johnkerl force-pushed the kerl/dense-227-fixes branch 3 times, most recently from 0c00c0f to a1e5b1c Compare October 27, 2024 06:13
@johnkerl
Copy link
Member Author

The typeguard-related failures are new and weird. I created #3245 to check.

@johnkerl
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant