Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blog post on input checking #150

Merged
merged 48 commits into from
Mar 10, 2022
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
a95958a
Create post skeleton
Bisaloo Oct 7, 2021
9c221f6
First draft
Bisaloo Oct 19, 2021
42ee9ff
Add paragraph about the future
Bisaloo Oct 27, 2021
0288846
Simplify structure
Bisaloo Dec 2, 2021
9943d0d
Write intro
Bisaloo Dec 2, 2021
963d3d5
Add part about checkmate
Bisaloo Dec 7, 2021
728087d
text edits for clarity + interp
seabbs Dec 20, 2021
32288c3
Apply suggestions from code review
Bisaloo Jan 19, 2022
d01664c
Merge pull request #1 from epiforecasts/input-checking-sa
Bisaloo Jan 19, 2022
4bb433e
Render md
Bisaloo Jan 19, 2022
094ea8d
Add a link to all packages
Bisaloo Jan 20, 2022
e5e9394
Apply suggestion from Mark
Bisaloo Jan 20, 2022
bbfe688
Update content/post/2021-10-07-input-checking/index.Rmd
Bisaloo Jan 20, 2022
5170ad0
Remove testthat from list of packages
Bisaloo Feb 17, 2022
bdc5e3e
Add examples and number of revdeps
Bisaloo Feb 17, 2022
c643c71
Remove number of revdeps
Bisaloo Feb 17, 2022
1e81460
Add check back
Bisaloo Feb 18, 2022
1323bf3
Render md
Bisaloo Feb 18, 2022
7ccbe8c
Add author file for Hugo
Bisaloo Feb 18, 2022
41644eb
Add author file for Sam Abbott (#2)
seabbs Feb 25, 2022
4e1fab7
Add section about documentation
Bisaloo Feb 25, 2022
3b872b9
Add section about match.arg()
Bisaloo Feb 25, 2022
f2eb00d
Add author file for Carl
Bisaloo Feb 25, 2022
3e645df
Add missing error = TRUE
Bisaloo Feb 25, 2022
724e74c
Switch order between match.arg() and stopifnot()
Bisaloo Feb 25, 2022
3c71248
Render md
Bisaloo Feb 25, 2022
a391ea4
Follow accessibility guidelines for Sam's author file
Bisaloo Feb 28, 2022
ebfa94c
Update spelling
Bisaloo Feb 28, 2022
dfee043
Disable crayon
Bisaloo Feb 28, 2022
3c733f3
Fix spelling again
Bisaloo Feb 28, 2022
6abf589
Add example for check
Bisaloo Feb 28, 2022
246221a
Render md
Bisaloo Feb 28, 2022
c3c21ae
Mention rlang equivalents
Bisaloo Mar 7, 2022
2c33d08
Mention blog post on internal functions
Bisaloo Mar 7, 2022
0524164
Add section taking into account more of Carl comments
Bisaloo Mar 7, 2022
8483bf0
Drop Tim from authors
Bisaloo Mar 7, 2022
beae8c3
Reorder authors
Bisaloo Mar 7, 2022
bf75fce
Add pre-intro
Bisaloo Mar 7, 2022
e8c0b03
Make links to r-hub blog relative
Bisaloo Mar 7, 2022
7ef6637
Make links to r-hub blog relative 2
Bisaloo Mar 7, 2022
87231aa
Remove extra 'below'
Bisaloo Mar 7, 2022
6a1881b
Clarify default with match.arg()
Bisaloo Mar 9, 2022
baa4862
Promote footnote about rlang's match.arg() to main text
Bisaloo Mar 9, 2022
5ac2dd5
Mention vtr
Bisaloo Mar 9, 2022
8a9c09b
Fix typo
Bisaloo Mar 9, 2022
9381f6a
Move vetr up
Bisaloo Mar 9, 2022
474bf07
Mention comparison in vetr
Bisaloo Mar 9, 2022
2fbc774
Update post date
Bisaloo Mar 9, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions content/authors/carl-pearson/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
---
name: Carl Pearson
social:
- service: Twitter
link: https://twitter.com/cap1024
icon: fa-twitter
- service: Github
link: https://github.com/pearsonca
icon: fa-github
---

Modelling Infectious Disease Dynamics to Inform Decision Making
14 changes: 14 additions & 0 deletions content/authors/hugo-gruson/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
---
name: Hugo Gruson
social:
- service: Home
link: https://hugogruson.fr
- service: Twitter
link: https://twitter.com/grusonh
icon: fa-twitter
- service: Github
link: https://github.com/Bisaloo
icon: fa-github
---

Evolutionary Biologist turned Research Software Engineer in Epidemiology.
14 changes: 14 additions & 0 deletions content/authors/sam-abbott/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
---
name: Sam Abbott
social:
- service: Home
link: https://samabbott.co.uk
- service: Twitter
link: https://twitter.com/seabbs
icon: fa-twitter
- service: Github
link: https://github.com/seabbs
icon: fa-github
---

Infectious disease researcher interested in open source tool development. More on my [research interests](https://samabbott.co.uk/research).
301 changes: 301 additions & 0 deletions content/post/2021-10-07-input-checking/index.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,301 @@
---
slug: input-checking
title: "Checking the inputs of your R functions"
authors:
- Hugo Gruson
- Sam Abbott
- Carl Pearson
date: "2021-10-07"
tags:
- package development
- r-package
output: hugodown::hugo_document
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(fig.path = "", comment = "")
# knitr hook to make images output use Hugo options
knitr::knit_hooks$set(
plot = function(x, options) {
hugoopts <- options$hugoopts
paste0(
"{{<figure src=",
'"', x, '" ',
if (!is.null(hugoopts)) {
glue::glue_collapse(
glue::glue('{names(hugoopts)}="{hugoopts}"'),
sep = " "
)
},
">}}\n"
)
}
)
options(crayon.enabled = FALSE)

```

Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself?
Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases?
Read about our exploration into the various ways to check your function inputs in R in this blog post.
And please share your own tips and discoveries in the comment section below!
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

## Introduction: the dangers of not checking function inputs

R functions and R packages are a convenient way to share code with the rest of the world but it is generally not possible to know how, or with what precise aim in mind, others will use your code.
For example, they might try to use it on objects that your function was not designed for.
Let's imagine we have written a short function to compute the geometric mean:

```{r}
geometric_mean <- function(...) {
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

return(prod(...)^(1/...length()))

}
```

When you tested the function yourself, everything seemed fine:

```{r}
geometric_mean(2, 8)

geometric_mean(4, 1, 1/32)
```

But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour:

```{r, error = TRUE}
# Input with factors instead of numerics
geometric_mean(factor(2), 8)

# Input with negative values
geometric_mean(-1, 5)

# Input with NAs
geometric_mean(2, 8, NA)
```

Or worse, it could give an incorrect output:

```{r}
geometric_mean(c(2, 8))
```

Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for.
In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments.

## Pre-requisite: thoroughly document your argument types

You can notice from the simple example above that it's easy to pass invalid inputs to the `geometric_mean()` function because we didn't provide any documentation on what is or isn't a valid input.
We won't go into details here but the [roxygen2](https://roxygen2.r-lib.org/) package provides a convenient way to generate documentation for R functions.
Try to be as precise as possible when describing the required format for your inputs [^1].

[^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4).
But there is currently no standard shared across the R community.

```{r}
#' @param name A character of length one with the name of the person to greet
say_hello <- function(name) {
stopifnot(is.character(name))
paste("Hello", name)
}
```

Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error.

## Checking function inputs using base R [^2]

[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output.
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved
`match.arg()`'s equivalent is `rlang::arg_match()` and `stopifnot()`'s

### `match.arg()`

If the input can only take specific values, the base function `match.arg()` can also prove useful:

```{r, error = TRUE}
match.arg(arg = "R", choices = c("R", "python"))

match.arg(arg = "javascript", choices = c("R", "python"))
```

But the real power of the `match.arg()` function comes from the fact that `choices` can be automatically obtained in the context of a function:

```{r, error = TRUE}
choose_language <- function(language = c("R", "python")) {
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

# Equivalent to `match.arg(language, c("R", "python"))
match.arg(language)

paste("I love", language)

}

choose_language("R")

choose_language("julia")
```

### `stopifnot()`

There is a another, more general, built-in mechanism to check input values in base R: `stopifnot()`.
You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65).
As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests.

```{r, error = TRUE}
say_hello <- function(name) {
stopifnot(is.character(name))
paste("Hello", name)
}

say_hello("Bob")
say_hello(404)
```

However, as you can see in this example, the error message is not in plain English but contains some code instructions.
This can hinder understanding of the issue.
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

Because of this, `stopifnot()` was improved in R 4.0.0:

> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688.

This means we can now provide a clearer error message directly in `stopifnot()` [^3]:

[^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages.

```{r, error = TRUE}
say_hello <- function(name) {
stopifnot("`name` must be a character." = is.character(name))
paste("Hello", name)
}

say_hello(404)
```

This is clearly a really great improvement to the functionality of base R.
However, we can see from this example that we could create the error message programmatically based on the contents of the test.
Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X".
This way, you don't have to repeat yourself which is generally a good aim [^4].
This becomes necessary when you start having many input checks in your function or in your package.

[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/)
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

## Checking function inputs using R packages

### The example of the checkmate package

Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^5], you can also rely on existing packages to make your life easier.
One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/).
checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case:

[^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions.
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

```{r}
say_hello <- function(name) {
# Among other things, check_string() checks that we provide a
# character object of length one
checkmate::assert_string(name)
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved
paste("Hello", name)
}
```

```{r, error = TRUE}
say_hello(404)
```

```{r, error = TRUE}
say_hello(c("Bob", "Alice"))
```

### Other packages to check function inputs
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue.
We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies.
If interested in understanding the various approaches to input taking the documentation for these package is a great place to start.

- [assertthat](https://github.com/hadley/assertthat)

```{r, error = TRUE}
assertthat::assert_that(is.character(1))
```

- [assertr](https://docs.ropensci.org/assertr/)
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

```{r, error = TRUE}
library(magrittr)

mtcars %>%
assertr::verify(nrow(.) < 10)
```

- [assertive](https://bitbucket.org/richierocks/assertive)

```{r, error = TRUE}
assertive::assert_is_a_string(1)
```

- [ensurer](https://github.com/smbache/ensurer)

```{r, error = TRUE}
ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.))

ensure_square(matrix(1:20, 4, 5))
```

- `vctrs::vec_assert()`

Bisaloo marked this conversation as resolved.
Show resolved Hide resolved
```{r, error = TRUE}
vctrs::vec_assert(c(1, 2), "character")
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

vctrs::vec_assert(c(1, 2), size = 3)
```

- [check](https://github.com/moodymudskipper/check) is slightly different because it doesn't provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved
Bisaloo marked this conversation as resolved.
Show resolved Hide resolved

```{r, error = TRUE}
library(check)

check::setup()

set_check_fun(
"`{var}` must be a {type} vector of length {length}." = {
val <- get(var, env)
is.atomic(val) && is(val, type) && length(val) == length
}
)

say_hello <- function(name) {
check(
"`name` must be a character vector of length 1."
)
paste("hello", name)
}

say_hello("Maria")

say_hello(c("Maria", "Noelia"))
```

## There is no 'one-size-fits-all'

We have presented here different approaches but it is up to you, the developer, to decide which approach suits your needs best.
We do not believe that one choice is intrinsically better than the others.
All the workflows presented here can achieve the same result.
Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience?
Will they be okay with somewhat technical terminology in the error messages?
Do you have reasons to try and limit the number of dependencies [^6]?
Which framework are you the more comfortable with and will facilitate maintenance in the future?
And ultimately, what is your personal preference?

[^6]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/).

If you would like to hear various point of views and a more in-depth discussion about this, please refer to the [pull request related to this post](https://github.com/r-hub/blog/pull/150).

## What about the future?

In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so.
However, this always comes with a performance cost, even though it's often relatively limited.
Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support.
Interestingly several other languages have evolved to have typing systems as they have developed.
Typescript developed as an extension of JavaScript, and type annotations are now possible in Python.
[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/)
Loading