After a long journey of configuring different software versions on HPC (cf. #333), I ended up finding countless and inconsistent errors across nodes and sessions. I am now moving to a fully containerized approach: we use an Apptainer image with recent stable versions of GDAL and its dependencies, and mount the project root to an internal container path so the container can see the data files. The container-engine branch includes ongoing work for that transition. Under this approach, we submit an R script running tar_make() or tar_make_future() to SLURM with a sufficient number of threads and memory (e.g., 80 threads and 640 GB of memory), then parallelize the workload with crew or future.callr inside the container.
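For reference, a minimal sketch of the job entry script under this setup; the bind path, image name, worker count, and resource numbers are placeholders rather than the actual project configuration:

```r
# run_pipeline.R -- entry script submitted to SLURM, e.g. via
#   sbatch --cpus-per-task=80 --mem=640G --wrap \
#     "apptainer exec --bind /path/to/project:/opt/project geospatial.sif \
#        Rscript /opt/project/run_pipeline.R"
# (paths and image name are placeholders)
library(targets)

# Run the pipeline; tar_make_future() dispatches targets to the workers
# defined by the future plan declared in _targets.R.
tar_make_future(workers = 40L)
```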
The Apptainer image is based on the geospatial:latest Dockerfile available in the rocker-versioned2 repository (Ubuntu 22.04, GDAL 3.4.1).
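A quick way to confirm which GDAL build the container actually exposes (a sketch; assumes terra and sf are installed in the image, which they are in the rocker geospatial stack):

```r
# Run inside the container to verify the linked GDAL/GEOS/PROJ versions
# match the expected GDAL 3.4.1 from the Ubuntu 22.04 base image.
terra::gdal()
sf::sf_extSoftVersion()
```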
crew-based: nested parallelism failed, especially with a future::multicore plan inside a mirai worker (a sketch of the failing configuration follows the quoted response below). Copilot argued that nested parallelism is not supported in mirai:
Typically, each worker in a parallel computing setup like the one provided by the mirai package in R is expected to use a single core. This is because each worker is usually a separate process, and each process is typically run on a single core.
However, it's important to note that this doesn't mean that the entire computation is limited to a single core. The idea behind parallel computing is to distribute the computation across multiple workers, each running on its own core, to speed up the computation.
If you're using the future package for parallel computing, you can specify the plan to use multiple cores with plan(multicore), plan(multiprocess), or plan(cluster, workers = N), where N is the number of cores.
Nested parallelism, where each worker itself tries to use multiple cores, can be more complex to manage and is not supported by all parallel computing frameworks. If you're trying to use nested parallelism with mirai and future, you might encounter issues if mirai is not designed to handle nested parallelism or if it's not compatible with the parallel backend you're using with future.
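For context, the failing configuration looked roughly like the following (a minimal sketch; the controller size, worker counts, and target body are placeholders, not the actual pipeline):

```r
# _targets.R (sketch): a crew controller dispatches targets to mirai workers.
library(targets)
library(crew)

tar_option_set(
  controller = crew_controller_local(workers = 8L)
)

list(
  tar_target(
    covariates,
    {
      # Nested parallelism attempted inside a mirai worker: forking again
      # with future::multicore is the part that failed in this setup.
      future::plan(future::multicore, workers = 4L)
      future.apply::future_lapply(seq_len(10L), function(i) sqrt(i))
    }
  )
)
```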
future.callr with future::plan(future.callr::callr) works fine, and I confirmed that it submitted multiple workers simultaneously.
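A sketch of the working setup, assuming the plan is declared in _targets.R and the worker count is a placeholder:

```r
# _targets.R (sketch): future.callr launches each worker as a separate
# callr R process, which sidesteps the nested-parallelism issue above.
library(targets)
library(future.callr)

future::plan(future.callr::callr, workers = 40L)

list(
  tar_target(example_target, Sys.getpid())  # placeholder target
)
```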
A very strange behavior was found in vector operations under this approach: the intersection between the unique sites and the Ecoregion polygons returned different numbers of results (1096 in the triton run, 1051 in the Apptainer run). I attempted to repair the Ecoregion polygons with terra::makeValid() or terra::buffer(x, width = 0), to no avail.
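The repair attempts looked roughly like this (a sketch; file paths are placeholders, not the project's actual data locations):

```r
# Sketch of the geometry-repair attempts before intersecting the unique
# sites with the Ecoregion polygons; neither fix changed the mismatch.
library(terra)

sites      <- vect("sites.gpkg")       # placeholder path
ecoregions <- vect("ecoregions.shp")   # placeholder path

ecoregions_fixed <- makeValid(ecoregions)          # attempt 1
ecoregions_buf   <- buffer(ecoregions, width = 0)  # attempt 2

joined <- intersect(sites, ecoregions_fixed)
nrow(joined)  # 1096 on triton vs. 1051 in the Apptainer run
```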
I am still investigating these issues and trying to figure out what the exact cause is; this work has taken much more effort than I expected, and it is getting more complex as time goes on.
The renv experiment requires figuring out why GitHub packages such as beethoven and amadeus go undetected. The hash, repository URL, and other properties are not populated in the renv.lock file when renv is initialized. I will investigate this issue thoroughly.
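One possible workaround is to record the GitHub remotes explicitly before snapshotting (a sketch; the NIEHS/... repository paths are assumptions and should be checked against the actual remotes):

```r
# Install the GitHub packages through renv so their remote metadata
# (RemoteType, RemoteRepo, hash) is captured, then snapshot the lockfile.
renv::install(c("NIEHS/amadeus", "NIEHS/beethoven"))  # assumed repo paths
renv::snapshot(type = "all")
```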