Blog post on input checking #150

Merged
merged 48 commits into from
Mar 10, 2022
Changes from 12 commits
Commits
48 commits
a95958a Create post skeleton (Bisaloo, Oct 7, 2021)
9c221f6 First draft (Bisaloo, Oct 19, 2021)
42ee9ff Add paragraph about the future (Bisaloo, Oct 27, 2021)
0288846 Simplify structure (Bisaloo, Dec 2, 2021)
9943d0d Write intro (Bisaloo, Dec 2, 2021)
963d3d5 Add part about checkmate (Bisaloo, Dec 7, 2021)
728087d text edits for clarity + interp (seabbs, Dec 20, 2021)
32288c3 Apply suggestions from code review (Bisaloo, Jan 19, 2022)
d01664c Merge pull request #1 from epiforecasts/input-checking-sa (Bisaloo, Jan 19, 2022)
4bb433e Render md (Bisaloo, Jan 19, 2022)
094ea8d Add a link to all packages (Bisaloo, Jan 20, 2022)
e5e9394 Apply suggestion from Mark (Bisaloo, Jan 20, 2022)
bbfe688 Update content/post/2021-10-07-input-checking/index.Rmd (Bisaloo, Jan 20, 2022)
5170ad0 Remove testthat from list of packages (Bisaloo, Feb 17, 2022)
bdc5e3e Add examples and number of revdeps (Bisaloo, Feb 17, 2022)
c643c71 Remove number of revdeps (Bisaloo, Feb 17, 2022)
1e81460 Add check back (Bisaloo, Feb 18, 2022)
1323bf3 Render md (Bisaloo, Feb 18, 2022)
7ccbe8c Add author file for Hugo (Bisaloo, Feb 18, 2022)
41644eb Add author file for Sam Abbott (#2) (seabbs, Feb 25, 2022)
4e1fab7 Add section about documentation (Bisaloo, Feb 25, 2022)
3b872b9 Add section about match.arg() (Bisaloo, Feb 25, 2022)
f2eb00d Add author file for Carl (Bisaloo, Feb 25, 2022)
3e645df Add missing error = TRUE (Bisaloo, Feb 25, 2022)
724e74c Switch order between match.arg() and stopifnot() (Bisaloo, Feb 25, 2022)
3c71248 Render md (Bisaloo, Feb 25, 2022)
a391ea4 Follow accessibility guidelines for Sam's author file (Bisaloo, Feb 28, 2022)
ebfa94c Update spelling (Bisaloo, Feb 28, 2022)
dfee043 Disable crayon (Bisaloo, Feb 28, 2022)
3c733f3 Fix spelling again (Bisaloo, Feb 28, 2022)
6abf589 Add example for check (Bisaloo, Feb 28, 2022)
246221a Render md (Bisaloo, Feb 28, 2022)
c3c21ae Mention rlang equivalents (Bisaloo, Mar 7, 2022)
2c33d08 Mention blog post on internal functions (Bisaloo, Mar 7, 2022)
0524164 Add section taking into account more of Carl comments (Bisaloo, Mar 7, 2022)
8483bf0 Drop Tim from authors (Bisaloo, Mar 7, 2022)
beae8c3 Reorder authors (Bisaloo, Mar 7, 2022)
bf75fce Add pre-intro (Bisaloo, Mar 7, 2022)
e8c0b03 Make links to r-hub blog relative (Bisaloo, Mar 7, 2022)
7ef6637 Make links to r-hub blog relative 2 (Bisaloo, Mar 7, 2022)
87231aa Remove extra 'below' (Bisaloo, Mar 7, 2022)
6a1881b Clarify default with match.arg() (Bisaloo, Mar 9, 2022)
baa4862 Promote footnote about rlang's match.arg() to main text (Bisaloo, Mar 9, 2022)
5ac2dd5 Mention vtr (Bisaloo, Mar 9, 2022)
8a9c09b Fix typo (Bisaloo, Mar 9, 2022)
9381f6a Move vetr up (Bisaloo, Mar 9, 2022)
474bf07 Mention comparison in vetr (Bisaloo, Mar 9, 2022)
2fbc774 Update post date (Bisaloo, Mar 9, 2022)
173 changes: 173 additions & 0 deletions content/post/2021-10-07-input-checking/index.Rmd
@@ -0,0 +1,173 @@
---
slug: input-checking
title: "Checking the inputs of your R functions"
authors:
- Sam Abbott
- Hugo Gruson
- Carl Pearson
- Tim Taylor
date: "2021-10-07"
tags:
- package development
- r-package
output: hugodown::hugo_document
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(fig.path = "", comment = "")
# knitr hook to make images output use Hugo options
knitr::knit_hooks$set(
plot = function(x, options) {
hugoopts <- options$hugoopts
paste0(
"{{<figure src=",
'"', x, '" ',
if (!is.null(hugoopts)) {
glue::glue_collapse(
glue::glue('{names(hugoopts)}="{hugoopts}"'),
sep = " "
)
},
">}}\n"
)
}
)

```

## Introduction: the dangers of not checking function inputs

R functions and R packages are a convenient way to share code with the rest of the world, but it is generally not possible to know how, or with what precise aim in mind, others will use your code.
For example, they might try to use it on objects that your function was not designed for.
Let's imagine we have written a short function to compute the geometric mean:

```{r}
geometric_mean <- function(...) {

return(prod(...)^(1/...length()))

}
```

When you tested the function yourself, everything seemed fine:

```{r}
geometric_mean(2, 8)

geometric_mean(4, 1, 1/32)
```

But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour:

```{r, error = TRUE}
# Input with factors instead of numerics
geometric_mean(factor(2), 8)

# Input with negative values
geometric_mean(-1, 5)

# Input with NAs
geometric_mean(2, 8, NA)
```

Or worse, it could give an incorrect output:

```{r}
geometric_mean(c(2, 8))
```
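
This last result is wrong because `...length()` counts the *arguments* passed through `...`, not the values they contain: `c(2, 8)` is a single argument, so the function computes `prod(c(2, 8))^(1/1)`, i.e. 16 instead of 4. The short sketch below illustrates the difference; `count_args()` and `count_values()` are hypothetical helpers made up for this illustration only.

```r
# Hypothetical helpers, defined only to illustrate how `...` is counted
count_args <- function(...) ...length()      # number of arguments in `...`
count_values <- function(...) length(c(...)) # number of values once combined

count_args(2, 8)       # 2: two separate arguments
count_args(c(2, 8))    # 1: one argument, a vector of length 2
count_values(c(2, 8))  # 2: two values

# geometric_mean(c(2, 8)) therefore computes 16^(1/1)
prod(c(2, 8))^(1 / count_args(c(2, 8)))
```

Combining the arguments (for example with `c(...)`) before counting them is one way to make both calling styles give the same answer.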

Because of this, you need to make sure you return clear errors whenever your function receives input it was not designed for.
In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments.

## Checking function inputs using base R

There is a built-in mechanism to check input values in base R: `stopifnot()`.
You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65).
As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests.

```{r, error = TRUE}
say_hello <- function(name) {
stopifnot(is.character(name))
paste("Hello", name)
}

say_hello("Bob")
say_hello(404)
```
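
`stopifnot()` also accepts several conditions in a single call. In current versions of R they are evaluated in order, and execution stops at the first one that is not `TRUE` (for a logical vector, every element must be `TRUE`). Here is a minimal sketch extending `say_hello()` with a length check; `try()` is only used so that the failing call prints its error without interrupting the rest of the script.

```r
say_hello <- function(name) {
  # Conditions are checked from left to right; the first one that is not
  # TRUE stops the function with an error
  stopifnot(is.character(name), length(name) == 1)
  paste("Hello", name)
}

say_hello("Bob")
# Fails on the second condition: "length(name) == 1 is not TRUE"
try(say_hello(c("Bob", "Alice")))
```

The default messages still echo the code of the failing condition, which is the limitation discussed next.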

However, as you can see in this example, the error message is not written in plain English: it simply restates the code of the test that failed.
This can make the issue harder to understand.

Because of this, `stopifnot()` was improved in R 4.0.0:

> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688.

This means we can now provide a clearer error message directly in `stopifnot()` [^1]:

[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages.

```{r, error = TRUE}
say_hello <- function(name) {
stopifnot("`name` must be a character." = is.character(name))
paste("Hello", name)
}

say_hello(404)
```
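
The same named-condition syntax can be used to guard the `geometric_mean()` function from the introduction against the problematic inputs shown there. This is a minimal sketch, assuming we want to reject empty, non-numeric, missing and negative inputs; the exact rules (for example, whether zeros should be allowed) are a design choice. It also combines the arguments with `unlist()`, so that `geometric_mean(c(2, 8))` is handled correctly.

```r
geometric_mean <- function(...) {
  args <- list(...)
  # Conditions are checked in order, so later checks can rely on earlier ones
  stopifnot(
    "At least one value must be provided." = length(args) > 0,
    # A bare NA is of type logical, so it is rejected by this check;
    # NA_real_ passes it and is caught by the next one
    "All inputs must be numeric." = all(vapply(args, is.numeric, logical(1))),
    "Inputs must not contain missing values." = !anyNA(unlist(args)),
    "All inputs must be non-negative." = all(unlist(args) >= 0)
  )
  x <- unlist(args)
  prod(x)^(1 / length(x))
}

geometric_mean(2, 8)
geometric_mean(c(2, 8))  # now 4, as for geometric_mean(2, 8)
try(geometric_mean(factor(2), 8))
try(geometric_mean(-1, 5))
try(geometric_mean(2, 8, NA_real_))
```

Checking each argument separately with `vapply()` matters here: combining a factor with numbers using `c()` before checking could silently turn the factor into its integer codes.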

This is a clear improvement to base R.
However, this example also shows that the error message could be generated programmatically from the test itself.
Each time we test whether an object is of class `class_X` and the test fails, we could throw an error saying something like "`x` must be of class `class_X`".
This way, you don't have to repeat yourself, which is generally a good aim [^2].
This becomes necessary once you have many input checks in your function or in your package.
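
As an illustration of this idea, here is a minimal sketch of a home-made helper that builds its error message from the argument name and the expected class, so the message never has to be written by hand. `assert_class()` and `repeat_hello()` are hypothetical functions written for this example; they are not part of base R or of any package mentioned in this post.

```r
# Hypothetical helper: stop with a generated message if `x` does not
# inherit from `expected_class`
assert_class <- function(x, expected_class) {
  # Capture the expression passed as `x` (e.g. `name`) to use in the message
  arg_name <- deparse(substitute(x))
  if (!inherits(x, expected_class)) {
    stop(
      sprintf(
        "`%s` must be of class '%s', not '%s'.",
        arg_name, expected_class, paste(class(x), collapse = "/")
      ),
      call. = FALSE
    )
  }
  invisible(x)
}

# The same helper is reused for every check, without rewriting any message
repeat_hello <- function(name, times) {
  assert_class(name, "character")
  assert_class(times, "numeric")
  rep(paste("Hello", name), times)
}

repeat_hello("Bob", 2)
try(repeat_hello(404, 2))
try(repeat_hello("Bob", "twice"))
```

Packages such as the ones presented below provide many ready-made checks of this kind, with more careful handling of edge cases than this sketch.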

[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/).

## Checking function inputs using R packages

### The example of the checkmate package

Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier.
One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/).
checkmate provides a large number of functions to check that inputs respect a given set of properties, and returns clear error messages when that is not the case:

```{r}
say_hello <- function(name) {
# Among other things, assert_string() checks that we provide a
# character object of length one
checkmate::assert_string(name)
paste("Hello", name)
}
```

```{r, error = TRUE}
say_hello(404)
```

```{r, error = TRUE}
say_hello(c("Bob", "Alice"))
```

### Other packages to check function inputs

Because input checking is such an important task and is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue.
We will not get into the details of all of the available options here, but below is a list of some of them.
If you are interested in the various approaches to input checking, the documentation for these packages is a great place to start.

- [testthat](https://testthat.r-lib.org/)
- [assertthat](https://github.com/hadley/assertthat)
- [check](https://github.com/moodymudskipper/check)
- [assertr](https://docs.ropensci.org/assertr/)
- [assertive](https://bitbucket.org/richierocks/assertive)
- [ensurer](https://github.com/smbache/ensurer)
- `vctrs::vec_assert()`

## What about the future?

In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so.
However, such checks always come with a performance cost, even though it is often small.
Zero-cost assertions, as found in some other languages, would require some kind of type system, which R does not currently support.
Interestingly, several other languages have gained type systems as they have developed.
TypeScript was developed as an extension of JavaScript, and type annotations are now possible in Python.
[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/)
163 changes: 163 additions & 0 deletions content/post/2021-10-07-input-checking/index.md
@@ -0,0 +1,163 @@
---
slug: input-checking
title: "Checking the inputs of your R functions"
authors:
- Sam Abbott
- Hugo Gruson
- Carl Pearson
- Tim Taylor
date: "2021-10-07"
tags:
- package development
- r-package
output: hugodown::hugo_document
rmd_hash: 778f602a1f74bd46

---

## Introduction: the dangers of not checking function inputs

R functions and R packages are a convenient way to share code with the rest of the world, but it is generally not possible to know how, or with what precise aim in mind, others will use your code. For example, they might try to use it on objects that your function was not designed for. Let's imagine we have written a short function to compute the geometric mean:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>geometric_mean</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#123;</span>

<span class='kr'><a href='https://rdrr.io/r/base/function.html'>return</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/prod.html'>prod</a></span><span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span><span class='o'>^</span><span class='o'>(</span><span class='m'>1</span><span class='o'>/</span><span class='nf'><a href='https://rdrr.io/r/base/dots.html'>...length</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span>

<span class='o'>&#125;</span></code></pre>

</div>

When you tested the function yourself, everything seemed fine:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>geometric_mean</span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>8</span><span class='o'>)</span>
[1] 4

<span class='nf'>geometric_mean</span><span class='o'>(</span><span class='m'>4</span>, <span class='m'>1</span>, <span class='m'>1</span><span class='o'>/</span><span class='m'>32</span><span class='o'>)</span>
[1] 0.5</code></pre>

</div>

But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Input with factors instead of numerics</span>
<span class='nf'>geometric_mean</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span>, <span class='m'>8</span><span class='o'>)</span>
Error in Summary.factor(structure(1L, .Label = "2", class = "factor"), : 'prod' not meaningful for factors

<span class='c'># Input with negative values</span>
<span class='nf'>geometric_mean</span><span class='o'>(</span><span class='o'>-</span><span class='m'>1</span>, <span class='m'>5</span><span class='o'>)</span>
[1] NaN

<span class='c'># Input with NAs</span>
<span class='nf'>geometric_mean</span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>8</span>, <span class='kc'>NA</span><span class='o'>)</span>
[1] NA</code></pre>

</div>

Or worse, it could give an incorrect output:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>geometric_mean</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>8</span><span class='o'>)</span><span class='o'>)</span>
[1] 16</code></pre>

</div>

Because of this, you need to make sure you return clear errors whenever your function receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments.

## Checking function inputs using base R

There is a built-in mechanism to check input values in base R: [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html). You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests.

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>say_hello</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span> <span class='o'>&#123;</span>
<span class='nf'><a href='https://rdrr.io/r/base/stopifnot.html'>stopifnot</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/character.html'>is.character</a></span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span>
<span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"Hello"</span>, <span class='nv'>name</span><span class='o'>)</span>
<span class='o'>&#125;</span>

<span class='nf'>say_hello</span><span class='o'>(</span><span class='s'>"Bob"</span><span class='o'>)</span>
[1] "Hello Bob"
<span class='nf'>say_hello</span><span class='o'>(</span><span class='m'>404</span><span class='o'>)</span>
Error in say_hello(404): is.character(name) is not TRUE</code></pre>

</div>

However, as you can see in this example, the error message is not written in plain English: it simply restates the code of the test that failed. This can make the issue harder to understand.

Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was improved in R 4.0.0:

> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688.

This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^1]:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>say_hello</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span> <span class='o'>&#123;</span>
<span class='nf'><a href='https://rdrr.io/r/base/stopifnot.html'>stopifnot</a></span><span class='o'>(</span><span class='s'>"`name` must be a character."</span> <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>is.character</a></span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span>
<span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"Hello"</span>, <span class='nv'>name</span><span class='o'>)</span>
<span class='o'>&#125;</span>

<span class='nf'>say_hello</span><span class='o'>(</span><span class='m'>404</span><span class='o'>)</span>
Error in say_hello(404): `name` must be a character.</code></pre>

</div>

This is a clear improvement to base R. However, this example also shows that the error message could be generated programmatically from the test itself. Each time we test whether an object is of class `class_X` and the test fails, we could throw an error saying something like "`x` must be of class `class_X`". This way, you don't have to repeat yourself, which is generally a good aim [^2]. This becomes necessary once you have many input checks in your function or in your package.

## Checking function inputs using R packages

### The example of the checkmate package

Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions to check that inputs respect a given set of properties, and returns clear error messages when that is not the case:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>say_hello</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span> <span class='o'>&#123;</span>
<span class='c'># Among other things, assert_string() checks that we provide a </span>
<span class='c'># character object of length one</span>
<span class='nf'>checkmate</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/checkmate/man/checkString.html'>assert_string</a></span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span>
<span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"Hello"</span>, <span class='nv'>name</span><span class='o'>)</span>
<span class='o'>&#125;</span></code></pre>

</div>

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>say_hello</span><span class='o'>(</span><span class='m'>404</span><span class='o'>)</span>
Error in say_hello(404): Assertion on 'name' failed: Must be of type 'string', not 'double'.</code></pre>

</div>

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>say_hello</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Bob"</span>, <span class='s'>"Alice"</span><span class='o'>)</span><span class='o'>)</span>
Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have length 1.</code></pre>

</div>

### Other packages to check function inputs

Because input checking is such an important task and is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here, but below is a list of some of them. If you are interested in the various approaches to input checking, the documentation for these packages is a great place to start.

- [testthat](https://testthat.r-lib.org/)
- [assertthat](https://github.com/hadley/assertthat)
- [check](https://github.com/moodymudskipper/check)
- [assertr](https://docs.ropensci.org/assertr/)
- [assertive](https://bitbucket.org/richierocks/assertive)
- [ensurer](https://github.com/smbache/ensurer)
- [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html)

## What about the future?

In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, such checks always come with a performance cost, even though it is often small. Zero-cost assertions, as found in some other languages, would require some kind of type system, which R does not currently support. Interestingly, several other languages have gained type systems as they have developed (TypeScript as an extension of JavaScript, type annotations in Python). [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/)

[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages.

[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/).