Skip to content

Analysis-ready Rocker RStudio Server-based Container for the USDA-NRCS-NCSS Kellogg Soil Survey Lab Data Mart 'SQLite' Snapshot

Notifications You must be signed in to change notification settings

brownag/labtaxa

Repository files navigation

labtaxa

R-CMD-check docker-publish Docker Pulls ghcn.io

The goal of {labtaxa} is to provide ‘Lab Data Mart’ (https://ncsslabdatamart.sc.egov.usda.gov/) database snapshots for use in R. This repository is an R package used to download and cache the contents of the Lab Data Mart GeoPackage database.

Docker

To get up and running quickly you can use the Docker container. The labtaxa container is based on "rocker/rstudio:latest". In addition to the standard RStudio tools, the container has the cached Lab Data Mart GeoPackage (containing lab and spatial data) and the companion morphologic database (derived from NASIS pedon descriptions) pre-downloaded. Also, the results of soilDB::fetchLDM() called on the LDM GeoPackage data source are cached as an R object file (.rds). Finally, a variety of useful R packages are pre-installed. All can be accessed via an RStudio Server web browser interface.

From Docker Hub:

docker pull brownag/labtaxa:latest

Or from GitHub:

docker pull ghcr.io/brownag/labtaxa:latest

Once you have a local copy of the labtaxa container you can run:

docker run -d -p 8787:8787 -e PASSWORD=mypassword -v ~/Documents:/home/rstudio/Documents -e ROOT=TRUE brownag/labtaxa

Then open your web browser and navigate to http://localhost:8787. The default username is rstudio and the default password is mypassword.

R Package Installation

You can install the development version of {labtaxa} from GitHub:

if (!require("labtaxa")) 
 remotes::install_github("brownag/labtaxa")

Example

Download (and cache) the latest Lab Data Mart SQLite snapshot from https://ncsslabdatamart.sc.egov.usda.gov/ like so:

library(labtaxa)
ldm <- get_LDM_snapshot()

Downloaded and derived files will be cached in platform-specific directory specified by ldm_data_dir() using cache_labtaxa()

In the Docker container the snapshot has already been created and cached from the latest data (as of the last time the container was built). Updates to the method used to create the cache, as well as scheduled (monthly) updates occur.

The cached data help to get off and running quickly analyzing the entire KSSL database using the {aqp} R package toolchain.

The lab data are pre-loaded in a large SoilProfileCollection object (over 65,000 profiles). In only a few seconds from when you have the Docker container loaded, you can be filtering and processing the lab data object. Downloading archives of the complete databases can take 10s of minutes to a couple hours (depending on internet connection). Only in cases where the absolute most recent data are needed would require doing a cache update.

The downloaded databases (GeoPackage, SQLite) are queried locally using {soilDB} functions fetchLDM() and fetchNASIS(). The {soilDB} functions can take a couple minutes to process on larger databases like this, so the container building process front loads these more costly processing steps. Querying the data using a method like this essentially precedes all analyses. soilDB provides standard aggregation methods that produce {aqp} SoilProfileCollections, which provide a convenient data structure for working with horizon and site level data associated with specific soil profiles.

When you start up {labtaxa} in the Docker container you will have the latest database and the first-step data object (as if you ran the {soilDB} functions) readily available for post-processing for answering specific questions. See demo.R in the container home folder to get started.

ldm <- load_labtaxa()

length(ldm)
#> [1] 65403

If you are running on your own machine you will have to run get_LDM_snapshot() at least once (as above) before the load_labtaxa() command works. In future runs you will not need to re-download or prepare the data unless you need to update the cache.

About

Analysis-ready Rocker RStudio Server-based Container for the USDA-NRCS-NCSS Kellogg Soil Survey Lab Data Mart 'SQLite' Snapshot

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages