Commit

Merge pull request #321 from ncss-tech/genhz-hzdepm1
`generalize.hz(hzdepm = ...)`: more robust sorting in presence of missing depths
brownag authored Jan 25, 2025
2 parents f7282ed + 2cfe779 commit 976674d
Showing 2 changed files with 13 additions and 8 deletions.
14 changes: 8 additions & 6 deletions R/generalize.hz.R
@@ -11,9 +11,11 @@
#' @param pattern character vector of REGEX patterns, same length as `new`
#' @param non.matching.code character, label used for any horizon not matched by `pattern`
#' @param hzdepm numeric vector of horizon mid-points; `NA` values in `hzdepm` will result in `non.matching.code` (or `NA` if not defined) in result
- #' @param ordered logical, `TRUE` when `hzdepm` argument is specified
+ #' @param ordered logical, default `TRUE` when `hzdepm` argument is specified
+ #' @param na.rm logical, default `TRUE` will ignore missing depths in calculating sort order when `hzdepm` is specified and `ordered=TRUE`
#' @param ... additional arguments passed to `grep()` such as `perl = TRUE` for advanced REGEX
- #' @return factor (possibly an ordered factor) of the same length as `x` (if character) or as number of horizons in `x` (if `SoilProfileCollection`)
+ #'
+ #' @return factor (an ordered factor when `ordered=TRUE`) of the same length as `x` (if character) or as number of horizons in `x` (if `SoilProfileCollection`)
#'
#' @details When `x` is a `SoilProfileCollection` the `ghl` column will be updated with the factor results. This requires that the "horizon designation name" metadata be defined for the collection to set the column for input designations.
#'
@@ -90,7 +92,7 @@
#' # GHL metadata is set
#' GHL(x)
#'
- generalize.hz <- function(x, new, pattern, non.matching.code = 'not-used', hzdepm = NULL, ordered = !missing(hzdepm), ...) {
+ generalize.hz <- function(x, new, pattern, non.matching.code = 'not-used', hzdepm = NULL, ordered = !missing(hzdepm), na.rm = TRUE, ...) {

# init vector of 'other', same length as original horizon name vector
g <- rep(non.matching.code, times = length(x))
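
The `ordered = !missing(hzdepm)` default in the signature above means the result is ordered only when the caller supplies `hzdepm`. A minimal base R sketch of that idiom follows; `ordered_default()` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of aqp): mirrors only the default-argument
# idiom used in the generalize.hz() signature.
ordered_default <- function(hzdepm = NULL, ordered = !missing(hzdepm)) {
  ordered
}

# hzdepm not supplied: missing(hzdepm) is TRUE, so ordered defaults to FALSE
no_depths <- ordered_default()

# hzdepm supplied: ordered defaults to TRUE
with_depths <- ordered_default(hzdepm = c(5, 20))

# an explicit ordered = FALSE overrides the default
overridden <- ordered_default(hzdepm = c(5, 20), ordered = FALSE)
```

Because `missing()` tests whether the argument was supplied rather than its value, passing `hzdepm = NULL` explicitly to this helper also yields `TRUE`.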
@@ -107,16 +109,16 @@ generalize.hz <- function(x, new, pattern, non.matching.code = 'not-used', hzdep
# { sum(!is.na(hzdepm)) == length(new) })

# less stringent:
- # any NA hzdepm will return NA factor level, even when pattern is matched
- new_sort <- names(sort(tapply(hzdepm, g, median)))
+ # default na.rm (TRUE) ignores depths that are NA
+ new_sort <- names(sort(tapply(hzdepm, g, median, na.rm = na.rm), na.last = ifelse(na.rm, na.rm, NA)))
new_sort <- new_sort[new_sort != non.matching.code]

# use an ordered factor (may be overridden w/ ordered = FALSE)
g <- factor(g, levels = c(new_sort, non.matching.code), ordered = ordered)

# if any are not matched (i.e. hzdepm is NA), replace with non.matching.code (if defined)
if (!is.null(non.matching.code)) {
- g[is.na(g)] <- non.matching.code
+ g[is.na(g) | is.na(hzdepm)] <- non.matching.code
}
g
} else {
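
The effect of this hunk can be reproduced with base R alone. The sketch below uses made-up labels and depths (`g`, `hzdepm`, and the `"not-used"` code are illustrative inputs, not package data) and copies the old and new sorting expressions from the diff.

```r
# Illustrative inputs (assumed for this sketch): generalized labels and
# horizon mid-point depths, with one missing depth in the "A" group
g      <- c("O", "A", "A", "B", "B")
hzdepm <- c(2, 10, NA, 40, 50)
non.matching.code <- "not-used"

# Old expression: median() of a group containing NA is NA, and sort()
# drops NA by default, so the whole "A" group falls out of the sort order
old_sort <- names(sort(tapply(hzdepm, g, median)))

# New expression with the default na.rm = TRUE: missing depths are ignored
# when computing each group's median
na.rm <- TRUE
new_sort <- names(sort(tapply(hzdepm, g, median, na.rm = na.rm),
                       na.last = ifelse(na.rm, na.rm, NA)))
new_sort <- new_sort[new_sort != non.matching.code]

# Build the ordered factor, then send any horizon that is unmatched or has
# a missing depth to the non-matching code, as in the new replacement line
gf <- factor(g, levels = c(new_sort, non.matching.code), ordered = TRUE)
gf[is.na(gf) | is.na(hzdepm)] <- non.matching.code
```

With these inputs the old expression yields the order `O, B`, while the new one yields `O, A, B`. Only the single horizon with a missing depth is assigned `"not-used"`, not the entire "A" group.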
7 changes: 5 additions & 2 deletions man/generalize.hz.Rd

Generated file; diff not rendered by default.