-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error on basilisk env for spatialDE wrapper #20
Comments
I have seen this. The solution is to find a suitable libstdc++.so.6 in your cache and put it in /usr/lib/x86_64-linux-gnu. I will provide some instructions in the next comment. |
I should have said "a solution". I don't like this at all but it works. On linux I used
That's the file to place in /usr/lib/x86_64-linux-gnu. There may be better solutions involving a system update (perhaps to the C++ infrastructure) but I did not find any. |
This is, unfortunately, a fairly difficult problem. Here is my diagnosis: Normally, the conda installation comes packaged with its own libstdc++. This guarantees that any conda packages (and the conda-packaged Python) will work as they have been compiled against this specific version of libstdc++. Now, consider an R installation on the same computer. This has been compiled against the system libstdc++, which may not be the same as conda's. Normally, this would not be a problem, as R would use the system libstdc++ while conda would use its own... except... When we use reticulate, it seems that we force the conda-derived C libraries (e.g., for scipy) to use R's libstdc++. I'm not really sure how the dynamic linker works, but I would speculate that it detects that something from libstdc++ is already present (via R) and thus it won't/can't load in another version (from conda). Environment activation doesn't seem to help; I don't think there's any environment variables that would influence this outcome. Maybe one could fiddle with If my diagnosis is correct, then the solutions suck:
I suppose we could mimic 2 by re-installing (!) R inside conda and then serializing a function across to the conda-specific R, which would allow you to use reticulate inside a version of R that runs on the same libstdc++ as the Python packages. Nasty stuff, not least because it would increase the size of the conda environments by ~500 MB at least. But at least it'll run. (And maybe the burden on the user can be reduced by having a central R installation managed by basilisk itself, then at least it'll just be created once for the entire installation and not for each environment.) |
Why? Python 3 is on almost every system those days, and, if not (e.g. on Windows) is super easy to install. |
Interestingly we see this problem on nebbiolo1 (BioC 3.15) but not on nebbiolo2 (BioC 3.16) where
Also spatialDE has not changed between release and devel so something must have changed somewhere (in basilisk, reticulate, miniconda, etc...?) that solves the problem. This gives me hope that we won't need to replace a core component like |
And FWIW I don't get this on my laptop using BioC release (like nebbiolo1):
Major difference with nebbiolo1 is that I have Ubuntu 22.04 ( sessionInfo():
|
The minor version matters, e.g., pandas 1.4.3 requires >= 3.8. I don't think one can assume that the latest Python version is available. (My laptop has 3.6.9, for example.) And certainly not if client environments require different Python versions.
Hm. The last update to basilisk in release was to change an internal license to please the Debian folks. No code changes. reticulate updated yesterday. Their It's always possible that conda itself changed the binaries some time after April, and it only got picked up by the build system now. This is because basilisk caches environments from all client packages so they aren't re-built during tests; however, all client caches are cleared upon a basilisk version bump. So, when I bumped basilisk, it would have prompted the construction of fresh Conda environments in clients like spatialDE, possibly with modified + incompatible binaries. On that note...
I notice that - as of time of writing - the last commits listed on BioC-devel are:
This suggests one possibility: the only reason that we haven't encountered errors in nebbiolo2 is because we're still using the cached environments from June. My version bumps haven't made it through to BioC-devel yet. Once they do, the caches will be cleared and the environments will be reconstructed, possibly with the incompatible libstdc++. I guess we'll find out soon enough when the next build rolls around. Amusingly enough, the June commit message is:
|
Thanks everyone for your answers. As @LTLA suggested, I tried the example code with
sessionInfo()R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] spatialDE_1.2.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 locfit_1.5-9.5
[3] here_1.0.1 dir.expiry_1.4.0
[5] lattice_0.20-45 png_0.1-7
[7] rprojroot_2.0.3 assertthat_0.2.1
[9] SingleCellExperiment_1.18.0 utf8_1.2.2
[11] R6_2.5.1 GenomeInfoDb_1.32.1
[13] backports_1.4.1 stats4_4.2.0
[15] ggplot2_3.3.6 pillar_1.7.0
[17] sparseMatrixStats_1.8.0 basilisk_1.8.0
[19] zlibbioc_1.42.0 rlang_1.0.2
[21] magick_2.7.3 S4Vectors_0.34.0
[23] R.utils_2.11.0 R.oo_1.24.0
[25] Matrix_1.4-1 checkmate_2.1.0
[27] reticulate_1.25 BiocParallel_1.30.0
[29] RCurl_1.98-1.6 munsell_0.5.0
[31] beachmat_2.12.0 DelayedArray_0.22.0
[33] HDF5Array_1.24.0 compiler_4.2.0
[35] DropletUtils_1.16.0 pkgconfig_2.0.3
[37] BiocGenerics_0.42.0 tidyselect_1.1.2
[39] SummarizedExperiment_1.26.1 gridExtra_2.3
[41] tibble_3.1.7 GenomeInfoDbData_1.2.8
[43] edgeR_3.38.0 IRanges_2.30.0
[45] matrixStats_0.62.0 fansi_1.0.3
[47] crayon_1.5.1 dplyr_1.0.9
[49] basilisk.utils_1.8.0 rhdf5filters_1.8.0
[51] bitops_1.0-7 R.methodsS3_1.8.1
[53] grid_4.2.0 jsonlite_1.8.0
[55] gtable_0.3.0 lifecycle_1.0.1
[57] DBI_1.1.2 magrittr_2.0.3
[59] scales_1.2.0 dqrng_0.3.0
[61] cli_3.3.0 scuttle_1.6.0
[63] XVector_0.36.0 remotes_2.4.2
[65] SpatialExperiment_1.6.0 limma_3.52.0
[67] filelock_1.0.2 DelayedMatrixStats_1.18.0
[69] ellipsis_0.3.2 generics_0.1.2
[71] vctrs_0.4.1 Rhdf5lib_1.18.0
[73] rjson_0.2.21 tools_4.2.0
[75] Biobase_2.56.0 glue_1.6.2
[77] purrr_0.3.4 MatrixGenerics_1.8.0
[79] parallel_4.2.0 colorspace_2.0-3
[81] rhdf5_2.40.0 BiocManager_1.30.17
[83] GenomicRanges_1.48.0 |
I am hoping this will not be a chronic problem. As long as the g++ infrastructure used by Bioc BBS (and users) is as "mature" as that employed in the miniconda3 system/ecosystem used by basilisk (over which we have control) we should be able to state conditions under which this failure won't occur. We jumped ahead in the choice of miniconda3 version because only a very recent one supported both intel and ARM macOS. How to learn about the libstdc++.so versions used by a) our apt-gets for g++ and b) conda components, does not seem particularly simple. I guess on linux we can see .so.6.0.nn as a symlink target for .so.6. Looks like the nn is available for the conda components too:
So I'd think there's a way of catching the mismatch and providing guidance on how to solve. And, down the road, we can avoid mismatches in just the way that we have in the 2.5 year history of basilisk, by verifying compatibilities between BBS and miniconda components in use by basilisk. |
From some testing on the BioC-devel Docker image, it seems that conda is grabbing the very latest acceptable version of scipy. This version requires a later version of libstdc++ than is present in the system. I realized that SpatialDE doesn't pin the versions of all its dependencies, which means that conda is free to resolve to the latest version of scipy. After some hellish trial and error, 1.8.1 seems like the last working version. So, I get:
Moral of the story? Pin your package versions. On a more general note, this episode implies that, if reticulate must be used, conda is not able to fully insulate developers from the vagaries of the operating system. My attempt to install a conda-managed R didn't seem to work as it comes with its own libstdc++ that is still different from scipy's requirements, so we're back at the same problem. |
It seems that the build system is not without a sense of irony: http://bioconductor.org/checkResults/devel/bioc-LATEST/basilisk/nebbiolo2-checksrc.html In particular:
I have prepared a working solution based on a Conda-supplied internal R, which does seem to show some promise after a bit more elbow grease. You can try it out by installing Bioconductor/basilisk.utils#6 followed by #21. It requires you to pass in a character vector of "problematic" packages in your |
Fixes to basilisk (and basilisk.utils) have been pushed to BioC-devel. It doesn't really "solve" the @davidecrs Some refactoring of your package will be necessary for basilisk to trigger the fallback. out <- basilisk::basiliskRun(
env = spatialDE_env,
fun = .naiveDE_stabilize,
counts = counts,
testload = "scipy.optimize"
) However, one major limitation of the fallback mechanism is that the function (in That said, .naiveDE_stabilize2 <- function(proc, counts) {
basiliskRun(proc, function(counts, store) {
naiveDE <- import("NaiveDE")
## NaiveDE.stabilize requires data.frame input to work
df_py <- r_to_py(as.data.frame(counts))
store$stabilized <- naiveDE$stabilize(df_py)
invisible(NULL)
}, counts=counts, persist=TRUE)
}
.naiveDE_regress_out2 <- function(proc, sample_info) {
basiliskRun(proc, function(sample_info, store) {
naiveDE <- import("NaiveDE")
sample_info_py <- r_to_py(sample_info)
store$res <- naiveDE$regress_out(sample_info_py, store$stabilized, "np.log(total_counts)")
invisible(NULL)
}, sample_info=sample_info, persist=TRUE)
} which allows you to do: proc <- basiliskStart(spatialDE_env, testload="scipy.optimize")
# on.exit(basiliskStop(proc)) # run inside a function
.naiveDE_stabilize2(proc, counts)
.naiveDE_regress_out2(proc, sample_info)
# If you want to get any of the stored results, you could do:
basiliskRun(proc, function(store) {
as.matrix(store$stabilized)
}, persist=TRUE) You can see that we re-use the same process for multiple calls to Python within the SpatialDE workflow, which is more efficient than spinning up a new process and transferring data to/from it for every step. This approach also ensures that we only use base R or reticulate functions within each |
Thanks @LTLA, I will try to update the package with these changes. |
Hi @LTLA, I'm trying to update the package with the suggested fixes, but I get an error. Where am I doing wrong? library(spatialDE)
library(basilisk)
library(SpatialExperiment)
spatialDE_dependencies <- c(
"python==3.8.5",
"pandas==0.25.3",
"patsy==0.5.1",
"pip==21.0.1"
)
spatialDE_env <- BasiliskEnvironment(
envname = "env", pkgname = "spatialDE",
packages = spatialDE_dependencies,
pip = c("SpatialDE==1.1.3", "NaiveDE==1.2.0")
)
naiveDE_stabilize <- function(proc, counts) {
basiliskRun(proc, function(counts, store) {
naiveDE <- import("NaiveDE")
## NaiveDE.stabilize requires data.frame input to work
df_py <- r_to_py(as.data.frame(counts))
store$stabilized <- naiveDE$stabilize(df_py)
invisible(NULL)
}, counts=counts, persist=TRUE)
}
spe <- mockSVG(return_SPE=T)
spatialCoordsNames(spe) <- c("x", "y")
coordinates <- as.data.frame(spatialCoords(spe))
x <- SummarizedExperiment::assay(spe, "counts")
proc <- basiliskStart(spatialDE_env, testload="scipy.optimize")
naiveDE_stabilize(proc, x) This is the error
This is the session info> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] reticulate_1.26 SpatialExperiment_1.7.2 SingleCellExperiment_1.19.0
[4] SummarizedExperiment_1.27.2 Biobase_2.57.1 GenomicRanges_1.49.1
[7] GenomeInfoDb_1.33.7 IRanges_2.31.2 S4Vectors_0.35.3
[10] BiocGenerics_0.43.3 MatrixGenerics_1.9.1 matrixStats_0.62.0
[13] basilisk_1.9.6 spatialDE_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 locfit_1.5-9.6 here_1.0.1
[4] dir.expiry_1.5.1 lattice_0.20-45 png_0.1-7
[7] rprojroot_2.0.3 assertthat_0.2.1 utf8_1.2.2
[10] R6_2.5.1 backports_1.4.1 ggplot2_3.3.6
[13] pillar_1.8.1 sparseMatrixStats_1.9.0 zlibbioc_1.43.0
[16] rlang_1.0.5 rstudioapi_0.14 magick_2.7.3
[19] R.utils_2.12.0 R.oo_1.25.0 Matrix_1.5-0
[22] checkmate_2.1.0 BiocParallel_1.31.12 RCurl_1.98-1.8
[25] munsell_0.5.0 beachmat_2.13.4 DelayedArray_0.23.1
[28] HDF5Array_1.25.2 compiler_4.2.1 DropletUtils_1.17.1
[31] pkgconfig_2.0.3 tidyselect_1.1.2 gridExtra_2.3
[34] tibble_3.1.8 GenomeInfoDbData_1.2.8 edgeR_3.39.6
[37] codetools_0.2-18 fansi_1.0.3 dplyr_1.0.10
[40] basilisk.utils_1.9.3 rhdf5filters_1.9.0 bitops_1.0-7
[43] R.methodsS3_1.8.2 grid_4.2.1 jsonlite_1.8.0
[46] gtable_0.3.1 lifecycle_1.0.2 DBI_1.1.3
[49] magrittr_2.0.3 scales_1.2.1 dqrng_0.3.0
[52] cli_3.4.0 scuttle_1.7.4 XVector_0.37.1
[55] limma_3.53.6 filelock_1.0.2 DelayedMatrixStats_1.19.0
[58] generics_0.1.3 vctrs_0.4.1 rjson_0.2.21
[61] Rhdf5lib_1.19.2 tools_4.2.1 glue_1.6.2
[64] purrr_0.3.4 parallel_4.2.1 yaml_2.3.5
[67] colorspace_2.0-3 rhdf5_2.41.1 |
Oops. Hopefully 1.9.10 will fix this. Try installing basilisk from GitHub if you want it sooner. |
Hi @LTLA ,
I'm not sure if this is an error of basilisk, but it happened when the basilisk environment is created to run functions.
Specifically, recently the Linux build of SpatialDE http://bioconductor.org/checkResults/release/bioc-LATEST/spatialDE/nebbiolo1-buildsrc.html showed this error (without making any changes to the package).
Using a docker container, I was able to successfully install the spatialDE package from Bioc, but when I tried to run our example
it creates the basilisk and the conda env with the python libraries, but it fails showing the same error as the Linux build
This is the sessionInfo()
R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 11 (bullseye)Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] spatialDE_1.2.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 locfit_1.5-9.5
[3] here_1.0.1 dir.expiry_1.4.0
[5] lattice_0.20-45 png_0.1-7
[7] rprojroot_2.0.3 assertthat_0.2.1
[9] SingleCellExperiment_1.18.0 utf8_1.2.2
[11] R6_2.5.1 GenomeInfoDb_1.32.1
[13] backports_1.4.1 stats4_4.2.0
[15] ggplot2_3.3.6 pillar_1.7.0
[17] sparseMatrixStats_1.8.0 basilisk_1.8.0
[19] zlibbioc_1.42.0 rlang_1.0.2
[21] magick_2.7.3 S4Vectors_0.34.0
[23] R.utils_2.11.0 R.oo_1.24.0
[25] Matrix_1.4-1 checkmate_2.1.0
[27] reticulate_1.24 BiocParallel_1.30.0
[29] RCurl_1.98-1.6 munsell_0.5.0
[31] beachmat_2.12.0 DelayedArray_0.22.0
[33] HDF5Array_1.24.0 compiler_4.2.0
[35] DropletUtils_1.16.0 pkgconfig_2.0.3
[37] BiocGenerics_0.42.0 tidyselect_1.1.2
[39] SummarizedExperiment_1.26.1 gridExtra_2.3
[41] tibble_3.1.7 GenomeInfoDbData_1.2.8
[43] edgeR_3.38.0 IRanges_2.30.0
[45] matrixStats_0.62.0 fansi_1.0.3
[47] crayon_1.5.1 dplyr_1.0.9
[49] basilisk.utils_1.8.0 rhdf5filters_1.8.0
[51] bitops_1.0-7 R.methodsS3_1.8.1
[53] grid_4.2.0 jsonlite_1.8.0
[55] gtable_0.3.0 lifecycle_1.0.1
[57] DBI_1.1.2 magrittr_2.0.3
[59] scales_1.2.0 dqrng_0.3.0
[61] cli_3.3.0 scuttle_1.6.0
[63] XVector_0.36.0 SpatialExperiment_1.6.0
[65] limma_3.52.0 filelock_1.0.2
[67] DelayedMatrixStats_1.18.0 ellipsis_0.3.2
[69] generics_0.1.2 vctrs_0.4.1
[71] Rhdf5lib_1.18.0 rjson_0.2.21
[73] tools_4.2.0 Biobase_2.56.0
[75] glue_1.6.2 purrr_0.3.4
[77] MatrixGenerics_1.8.0 parallel_4.2.0
[79] colorspace_2.0-3 rhdf5_2.40.0
[81] BiocManager_1.30.17 GenomicRanges_1.48.0
Do I have to add something to the code of the basiliskEnv?
Best regards
The text was updated successfully, but these errors were encountered: