🚧
cboettig committed Feb 6, 2025
1 parent e7a1795 commit e9a9029
Showing 5 changed files with 123 additions and 0 deletions.
40 changes: 40 additions & 0 deletions README.md
@@ -1,3 +1,43 @@
# Rocker stack for Machine Learning in R

This repository contains images for machine learning and GPU-based computation in R.


## Deploying images

These images can be deployed in the usual manner directly with Docker:

```
docker run -ti -p 8888:8888 rocker/ml
```
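
For GPU work, a variant along the following lines may be useful. This is only a sketch: it assumes the host has NVIDIA drivers and the NVIDIA Container Toolkit installed, it uses the `rocker/cuda` image that the `extend/` example builds on, and `run_gpu` is a hypothetical helper name rather than anything shipped with these images.

```shell
# Hypothetical helper: start the GPU-enabled image with all host GPUs exposed.
# Assumes NVIDIA drivers and the NVIDIA Container Toolkit are set up on the host.
run_gpu() {
  # Any extra arguments are passed through as the container command.
  docker run --gpus all -ti -p 8888:8888 rocker/cuda "$@"
}
```

Calling `run_gpu` with no arguments starts the image's default command, with port 8888 published as in the command above.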

### JupyterHub

These images are designed to support easy integration with JupyterHub. Jupy

### Binder

### Codespaces

These images are also compatible with JupyterLab in GitHub Codespaces.

## Customizing images

These images can be easily extended with additional R and Python packages.
An example is provided in the `extend` directory, showing


## Technical details

These Docker images build on the widely used Ubuntu Linux distribution (LTS release, 24.04 at the time of writing).

While repositories such as Posit's package manager or R-Universe now provide pre-compiled binaries for Ubuntu LTS releases, many of these packages still require that certain runtime libraries are available on the system. Typically, R users have been expected to `apt-get` these "system-level" dependencies (e.g. `libgdal`), creating an additional technical hurdle that is often unfamiliar to users.

This stack leverages the design of the [BSPM](https://github.com/rocker-org/bspm) system to automatically manage installation of system dependencies during the Docker build process.
The example shown in `extend/` illustrates how we can simply list any required packages in `install.r` and enjoy system dependencies being resolved automatically.

However, JupyterHub deployments typically deny users the root (`sudo`) privileges required to install system libraries, so this mechanism is not available to end users at runtime. Non-sudo users can still install pre-built binary packages from R-Universe, provided any required system libraries are already present on the image.

On the Python side, package dependencies are managed by conda, which bundles its own copies of any required system libraries. Conda installations do not require root, so users can easily install additional packages at build time or in an interactive session.
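
As a minimal sketch of such an interactive install, the helper below wraps `conda install` for the conda-forge channel. The function name `add_conda_pkg` and the package `seaborn` in the usage line are hypothetical placeholders, not part of this repository.

```shell
# Hypothetical helper: install extra conda-forge packages into the active
# conda environment. No sudo is involved; conda writes into its own prefix.
add_conda_pkg() {
  conda install --yes --channel conda-forge "$@"
}

# Usage (placeholder package name):
#   add_conda_pkg seaborn
```

For anything that should survive a session restart, listing the package in `extend/environment.yml` and rebuilding the image is the more durable route.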

Note that, as is standard for JupyterHub deployments, packages installed interactively do not persist between sessions.
14 changes: 14 additions & 0 deletions extend/Dockerfile
@@ -0,0 +1,14 @@
FROM rocker/cuda

USER root
# When run as root, R automagically handles any necessary apt-gets
COPY install.r install.r
RUN Rscript install.r

USER ${NB_USER}

## Python extensions
# Copy the conda environment file that sits next to this Dockerfile
COPY environment.yml environment.yml
RUN conda update --all --solver=classic -n base -c conda-forge conda && \
    conda env update --file environment.yml

13 changes: 13 additions & 0 deletions extend/README.md
@@ -0,0 +1,13 @@

This folder illustrates how a user would typically extend this image with additional dependencies.

The user edits the `install.r` script in this folder to add the desired R
packages, and edits `environment.yml` to add the desired Python/conda packages.
Building the `Dockerfile` in this folder then adds both sets to the image,
automatically resolving any system dependencies as needed.

For instance, in this example we add an extensive collection of commonly
used geospatial packages in R and python.

By using `rocker/cuda` as the base image, we ensure our base image has support for NVIDIA GPUs. For non-GPU use, use `rocker/ml` as the base image instead to generate a smaller image.
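
A build-and-run cycle for this folder might look like the sketch below. The tag `rocker-ml-geo` and the helper name `build_and_run` are hypothetical, the commands assume they are run from the repository root, and port 8888 is assumed to match the `docker run` example in the top-level README.

```shell
# Hypothetical local tag for the extended image; override by exporting IMAGE_TAG.
IMAGE_TAG="${IMAGE_TAG:-rocker-ml-geo}"

# Hypothetical helper: build the image from extend/ and, if that succeeds, start it.
build_and_run() {
  docker build -t "$IMAGE_TAG" extend/ &&
    docker run -ti -p 8888:8888 "$IMAGE_TAG"
}
```

On a host with NVIDIA drivers and the NVIDIA Container Toolkit, adding `--gpus all` to the `docker run` line exposes the GPUs to the container.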

46 changes: 46 additions & 0 deletions extend/environment.yml
@@ -0,0 +1,46 @@
name: base
channels:
- conda-forge
dependencies:
- boto3
- cartopy
- distributed
- earthaccess
- exactextract
- fiona
- fsspec
- geoarrow-types
- geopandas
- geocube
- leafmap
- libgdal-arrow-parquet
- localtileserver
- mapclassify
- maplibre
- minio
- netCDF4
- odc-geo
- odc-stac
- planetary-computer
- pmtiles
- polars
- pyarrow
- pydeck
- pyogrio
- pystac
- pystac-client
- rasterio
- rasterstats
- requests
- rio-cogeo
- rioxarray
- stackstac
- streamlit
- tippecanoe
- tqdm
- xarray
- zarr
- pip
- pip:
  - git+https://github.com/boettiger-lab/cng-python.git

10 changes: 10 additions & 0 deletions extend/install.r
@@ -0,0 +1,10 @@
install.packages(c(
  'sf',
  'stars',
  'gdalcubes',
  'rstac',
  'terra',
  'mapgl',
  'gifski'))

