Containerized pipeline run #334

Open
2 tasks done
sigmafelix opened this issue May 29, 2024 · 4 comments

@sigmafelix
Collaborator

sigmafelix commented May 29, 2024

After a long journey of configuring different software versions on the HPC (cf. #333), I kept running into numerous, inconsistent errors across nodes and sessions. I am now moving to a fully containerized approach. We use an Apptainer image with recent stable versions of GDAL and its dependencies, and mount the project root to a path inside the container so that the container can find the data files. The container-engine branch contains the ongoing work for this transition. Under this approach, we submit a SLURM job that runs an R script calling tar_make() or tar_make_future() with enough threads and memory (e.g., 80 threads and 640 GB of memory), then parallelize the workload inside the container with crew or future.callr.
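A minimal sketch of the job layout described above. The driver script name `run_pipeline.R`, the image file `geospatial.sif`, the host path, and the mount point `/pipeline` are all hypothetical placeholders; the SLURM and Apptainer invocation is shown as comments so the example stays in R:

```r
# run_pipeline.R: hypothetical driver script executed inside the container.
# A SLURM batch job could launch it roughly like this (placeholder image name,
# host path, and mount point; resource values from the example above):
#
#   #SBATCH --cpus-per-task=80
#   #SBATCH --mem=640G
#   apptainer exec --bind /path/to/project:/pipeline geospatial.sif \
#     Rscript /pipeline/run_pipeline.R

# The project root is bind-mounted at /pipeline (assumed path); run from there
# so that _targets.R and the data directories resolve as on the host.
setwd("/pipeline")

# tar_make_future() uses the future::plan() declared in _targets.R and keeps
# at most `workers` targets running at the same time.
targets::tar_make_future(workers = 80)
```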

The Apptainer image is based on the geospatial:latest Dockerfile from the rocker-versioned2 repository (Ubuntu 22.04, GDAL 3.4.1).

  • crew-based: nested parallelism failed, particularly with a future::multicore plan inside a mirai worker. Copilot claimed that mirai does not support nested parallelism:

    Typically, each worker in a parallel computing setup like the one provided by the mirai package in R is expected to use a single core. This is because each worker is usually a separate process, and each process is typically run on a single core.
    However, it's important to note that this doesn't mean that the entire computation is limited to a single core. The idea behind parallel computing is to distribute the computation across multiple workers, each running on its own core, to speed up the computation.
    If you're using the future package for parallel computing, you can specify the plan to use multiple cores with plan(multicore), plan(multiprocess), or plan(cluster, workers = N), where N is the number of cores.
    Nested parallelism, where each worker itself tries to use multiple cores, can be more complex to manage and is not supported by all parallel computing frameworks. If you're trying to use nested parallelism with mirai and future, you might encounter issues if mirai is not designed to handle nested parallelism or if it's not compatible with the parallel backend you're using with future.

  • future.callr-based: future::plan(future.callr::callr) works, and I confirmed that it launched multiple workers simultaneously.
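For reference, a callr-based plan like the one that worked might be declared in `_targets.R` as sketched below. The target names, the `terra` package option, and the worker count are illustrative assumptions, not the project's actual configuration:

```r
# _targets.R: minimal sketch of a pipeline whose targets each run in a fresh
# background R session via future.callr. Target names and the worker count
# are hypothetical placeholders.
library(targets)

# Each future is resolved in its own callr-spawned R session;
# `workers` caps how many of those sessions run concurrently.
future::plan(future.callr::callr, workers = 80)

# Packages loaded in every worker session before a target runs.
tar_option_set(packages = "terra")

list(
  tar_target(chunk_id, 1:10),
  # Dynamic branching: one branch (one future) per value of chunk_id.
  tar_target(result, chunk_id^2, pattern = map(chunk_id))
)
```

With this file in place, the driver script would call `targets::tar_make_future()`, which picks up whatever plan `_targets.R` sets.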

A very strange behavior appeared in vector operations under this approach: the intersection between the unique sites and the Ecoregion polygons returned a different number of results (1096 in the triton run vs. 1051 in the Apptainer run). I tried repairing the Ecoregion polygons with terra::makeValid() and terra::buffer(x, width = 0), to no avail.
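To narrow down the discrepancy, a small diagnostic like the following could be run unchanged on triton and inside the container, and the outputs compared. The input file names are placeholders, and the idea that differing GDAL/GEOS builds explain the count difference is a hypothesis to test, not a finding:

```r
library(terra)

# Record the linked library versions on this host; differing GEOS/GDAL builds
# between triton and the container are one candidate cause (unconfirmed).
print(terra::gdal())            # GDAL version used by terra
print(sf::sf_extSoftVersion())  # GEOS, GDAL, and PROJ versions used by sf

# Hypothetical input paths: substitute the unique-site points and Ecoregion polygons.
sites <- vect("unique_sites.gpkg")
eco   <- vect("ecoregions.gpkg")

# Number of invalid Ecoregion geometries before any repair.
n_invalid <- sum(!is.valid(eco))

# Intersection counts with the raw and the repaired polygons.
n_raw   <- nrow(intersect(sites, eco))
n_fixed <- nrow(intersect(sites, makeValid(eco)))

cat("invalid polygons:", n_invalid, "\n")
cat("intersections (raw / repaired):", n_raw, "/", n_fixed, "\n")
```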

I am still investigating these issues to pin down the exact cause. This work is taking far more effort than I expected, and it keeps getting more complex over time.

@sigmafelix
Collaborator Author

The pipeline runs fine with the custom-built GDAL and R packages on GEO. Further investigation into the unusual behavior is on hold.

@kyle-messier
Collaborator

Thanks @sigmafelix

@sigmafelix
Collaborator Author

The renv experiment requires resolving GitHub packages that renv does not detect properly, such as beethoven and amadeus: when renv is initialized, the hash, repository URL, and other properties are not populated in the renv.lock file. I will investigate this issue thoroughly.
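One plausible cause, not verified here, is that the packages were installed from a local checkout (e.g. with `R CMD INSTALL`), so their installed DESCRIPTION files lack the `Remote*` fields that renv uses to record a GitHub source. A sketch of reinstalling them through renv, assuming the repositories live under `NIEHS/` on GitHub (an assumption; adjust to the actual remotes):

```r
# Install from GitHub through renv so each package's DESCRIPTION carries
# RemoteType, RemoteUsername, RemoteRepo, RemoteSha, etc.
# The "NIEHS/..." repository paths are assumed, not taken from the issue.
renv::install("NIEHS/amadeus")
renv::install("NIEHS/beethoven")

# Write the lockfile; GitHub-installed packages should now have remote metadata.
renv::snapshot()

# renv.lock is JSON: inspect what was recorded for one of the packages.
lock <- jsonlite::read_json("renv.lock")
str(lock$Packages$amadeus)
```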

@sigmafelix sigmafelix self-assigned this Jul 23, 2024
@sigmafelix
Collaborator Author

After 0.4.0 is merged into main, I will update container-engine to incorporate all updates that are not container-related.


2 participants