From a95958aa1c1ba2872756010550f6b6fb13c8d6e9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 7 Oct 2021 22:12:48 +0200 Subject: [PATCH 01/47] Create post skeleton --- .../post/2021-10-07-input-checking/index.Rmd | 53 +++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 content/post/2021-10-07-input-checking/index.Rmd diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd new file mode 100644 index 00000000..9e506a89 --- /dev/null +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -0,0 +1,53 @@ +--- +slug: input-checking +title: "Checking the inputs of your R functions" +authors: +- Sam Abbott +- Hugo Gruson +- Carl Pearson +- Tim Taylor +date: "2021-10-07" +tags: +- package development +- r-package +output: hugodown::hugo_document +--- + +```{r setup, include=FALSE} + +knitr::opts_chunk$set(fig.path = "", comment = "") +# knitr hook to make images output use Hugo options +knitr::knit_hooks$set( + plot = function(x, options) { + hugoopts <- options$hugoopts + paste0( + "{{
}}\n" + ) + } +) + +``` + +## Intro + +## Motivating example + +## What + +## Why + +## How (base) + +## How (not base) + +## How (other options) + +## What about the future From 9c221f612cbd263fbb1d136344b09d33644e4cd2 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Tue, 19 Oct 2021 20:36:51 +0200 Subject: [PATCH 02/47] First draft --- .../post/2021-10-07-input-checking/index.Rmd | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 9e506a89..67af03c2 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -46,8 +46,63 @@ knitr::knit_hooks$set( ## How (base) +There is a built-in mechanism to check input values in base R: `stopifnot()`. +You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). +As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. + +```{r, error = TRUE} +say_hello <- function(name) { + stopifnot(is.character(name)) + paste("Hello", name) +} + +say_hello("Bob") +say_hello(404) +``` + +However, as you can see in this example, the error message is not in plain English but contains some code instructions. +This can hinder understanding of the issue. 
+ +Because of this, there was an improvement to `stopifnot()` in R 4.0.0: + +> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. + +This means we can now provide a clearer error message directly in `stopifnot()` [^1]: + +[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. + +```{r, error = TRUE} +say_hello <- function(name) { + stopifnot("`name` must be a character." = is.character(name)) + paste("Hello", name) +} + +say_hello(404) +``` + +But we can this from this example that we could create the error message programmatically based on the contents of the test. +Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". +This way, you don't have to repeat yourself [^2]. +This becomes necessary when you start having many input checks in your function or in your package. + +[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) + ## How (not base) +But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. +One of these packages designed to help you in input checking is checkmate. + ## How (other options) +Because input checking is such an important point and so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. 
+We will not get into the details for all of them here but it is worth mentioning: + +- testthat +- assertthat +- check +- assertr +- assertive +- ensurer +- vctrs::vec_assert() + ## What about the future From 42ee9ff695e6235de7a451fe3f0aadec78590b13 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 27 Oct 2021 09:20:40 +0200 Subject: [PATCH 03/47] Add paragraph about the future --- content/post/2021-10-07-input-checking/index.Rmd | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 67af03c2..cc3ad3f3 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -105,4 +105,10 @@ We will not get into the details for all of them here but it is worth mentioning - ensurer - vctrs::vec_assert() -## What about the future +## What about the future? + +In this post, we have seen many alternatives to check function inputs more easily, and generate more informative error messages. +However, this always comes with a performance cost, even though it's often relatively limited. +Zero-cost assertions would require some kind of typing system. +It is interesting to note that many other languages followed this evolution (TypeScript as an extension of JavaScript, type annotations in Python). 
+[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) From 0288846c7db78e5911e7f5a9fa9ff10b85c3dbd9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 2 Dec 2021 18:38:07 +0100 Subject: [PATCH 04/47] Simplify structure --- .../post/2021-10-07-input-checking/index.Rmd | 20 ++++++++----------- 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index cc3ad3f3..1352beff 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -36,15 +36,9 @@ knitr::knit_hooks$set( ``` -## Intro +## Introduction: the dangers of not checking function inputs -## Motivating example - -## What - -## Why - -## How (base) +## Checking function inputs using base R There is a built-in mechanism to check input values in base R: `stopifnot()`. You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). 
@@ -87,12 +81,14 @@ This becomes necessary when you start having many input checks in your function [^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) -## How (not base) +## Checking function inputs using R packages + +### The example of the checkmate package But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. -One of these packages designed to help you in input checking is checkmate. +One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). -## How (other options) +### Other packages to check function inputs Because input checking is such an important point and so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details for all of them here but it is worth mentioning: @@ -103,7 +99,7 @@ We will not get into the details for all of them here but it is worth mentioning - assertr - assertive - ensurer -- vctrs::vec_assert() +- `vctrs::vec_assert()` ## What about the future? 
From 9943d0d87100a75a152754b71c454e64975f80b9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 2 Dec 2021 19:57:35 +0100 Subject: [PATCH 05/47] Write intro --- .../post/2021-10-07-input-checking/index.Rmd | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 1352beff..944cf1bd 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -38,6 +38,49 @@ knitr::knit_hooks$set( ## Introduction: the dangers of not checking function inputs +R functions and R packages are convenient way to share code with the rest of the world. +But you never know how others will try to use your code. +They might try to use it on objects that your function was not designed for. +Let's imagine we have written a short function to compute the geometric mean: + +```{r} +geometric_mean <- function(...) { + + return(prod(...)^(1/...length())) + +} +``` + +When you tested the function yourself, anything seemed fine: + +```{r} +geometric_mean(2, 8) + +geometric_mean(4, 1, 1/32) +``` + +But a different person using your function might expose it the situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: + +```{r, error = TRUE} +# Input with factors instead of numerics +geometric_mean(factor(2), 8) + +# Input with negative values +geometric_mean(-1, 5) + +# Input with NAs +geometric_mean(2, 8, NA) +``` + +Or worse, it could give an incorrect output: + +```{r} +geometric_mean(c(2, 8)) +``` + +Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. +In this blog post, we will review the diversity of approaches to help you check your function inputs. + ## Checking function inputs using base R There is a built-in mechanism to check input values in base R: `stopifnot()`. 
From 963d3d5713368b4b58256f7310e1915a6e1d3276 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Tue, 7 Dec 2021 11:31:38 +0100 Subject: [PATCH 06/47] Add part about checkmate --- .../post/2021-10-07-input-checking/index.Rmd | 18 ++ .../post/2021-10-07-input-checking/index.md | 163 ++++++++++++++++++ 2 files changed, 181 insertions(+) create mode 100644 content/post/2021-10-07-input-checking/index.md diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 944cf1bd..5ffb3607 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -130,6 +130,24 @@ This becomes necessary when you start having many input checks in your function But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). 
+checkmate provides a large number of function to check that function inputs respect a given set of properties, and returns clear error messages when it's not the case: + +```{r} +say_hello <- function(name) { + # Among other things, assert_string() checks that we provide a + # character object of length one + checkmate::assert_string(name) + paste("Hello", name) +} +``` + +```{r, error = TRUE} +say_hello(404) +``` + +```{r, error = TRUE} +say_hello(c("Bob", "Alice")) +``` ### Other packages to check function inputs diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md new file mode 100644 index 00000000..bf17f99f --- /dev/null +++ b/content/post/2021-10-07-input-checking/index.md @@ -0,0 +1,163 @@ +--- +slug: input-checking +title: "Checking the inputs of your R functions" +authors: +- Sam Abbott +- Hugo Gruson +- Carl Pearson +- Tim Taylor +date: "2021-10-07" +tags: +- package development +- r-package +output: hugodown::hugo_document +rmd_hash: 292cf2cbf2d52bd0 + +--- + +## Introduction: the dangers of not checking function inputs + +R functions and R packages are convenient way to share code with the rest of the world. But you never know how others will try to use your code. They might try to use it on objects that your function was not designed for. Let's imagine we have written a short function to compute the geometric mean: + +
+ +
geometric_mean <- function(...) {
+  
+  return(prod(...)^(1/...length()))
+  
+}
+ +
+ +When you tested the function yourself, anything seemed fine: + +
+ +
geometric_mean(2, 8)
+[1] 4
+
+geometric_mean(4, 1, 1/32)
+[1] 0.5
+ +
+ +But a different person using your function might expose it the situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: + +
+ +
# Input with factors instead of numerics
+geometric_mean(factor(2), 8)
+Error in Summary.factor(structure(1L, .Label = "2", class = "factor"), : 'prod' not meaningful for factors
+
+# Input with negative values
+geometric_mean(-1, 5)
+[1] NaN
+
+# Input with NAs
+geometric_mean(2, 8, NA)
+[1] NA
+ +
+ +Or worse, it could give an incorrect output: + +
+ +
geometric_mean(c(2, 8))
+[1] 16
+ +
+ +Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we will review the diversity of approaches to help you check your function inputs. + +## Checking function inputs using base R + +There is a built-in mechanism to check input values in base R: [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html). You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. + +
+ +
say_hello <- function(name) {
+  stopifnot(is.character(name))
+  paste("Hello", name)
+}
+
+say_hello("Bob")
+[1] "Hello Bob"
+say_hello(404)
+Error in say_hello(404): is.character(name) is not TRUE
+ +
+ +However, as you can see in this example, the error message is not in plain English but contains some code instructions. This can hinder understanding of the issue. + +Because of this, there was an improvement to [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) in R 4.0.0: + +> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. + +This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^1]: + +
+ +
say_hello <- function(name) {
+  stopifnot("`name` must be a character." = is.character(name))
+  paste("Hello", name)
+}
+
+say_hello(404)
+Error in say_hello(404): `name` must be a character.
+ +
+ +But we can this from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself [^2]. This becomes necessary when you start having many input checks in your function or in your package. + +## Checking function inputs using R packages + +### The example of the checkmate package + +But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of function to check that function inputs respect a given set of properties, and returns clear error messages when it's not the case: + +
+ +
say_hello <- function(name) {
+  # Among other things, assert_string() checks that we provide a
+  # character object of length one
+  checkmate::assert_string(name)
+  paste("Hello", name)
+}
+ +
+ +
+ +
say_hello(404)
+Error in say_hello(404): Assertion on 'name' failed: Must be of type 'string', not 'double'.
+ +
+ +
+ +
say_hello(c("Bob", "Alice"))
+Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have length 1.
+ +
+ +### Other packages to check function inputs + +Because input checking is such an important point and so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details for all of them here but it is worth mentioning: + +- testthat +- assertthat +- check +- assertr +- assertive +- ensurer +- [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html) + +## What about the future? + +In this post, we have seen many alternatives to check function inputs more easily, and generate more informative error messages. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions would require some kind of typing system. It is interesting to note that many other languages followed this evolution (TypeScript as an extension of JavaScript, type annotations in Python). [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) + +[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. 
+ +[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) + From 728087d4c5262ae0f7da13e44f450e955b109fd0 Mon Sep 17 00:00:00 2001 From: Sam Abbott Date: Mon, 20 Dec 2021 23:01:30 +0000 Subject: [PATCH 07/47] text edits for clarity + interp --- .../post/2021-10-07-input-checking/index.Rmd | 33 ++++++++++--------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 5ffb3607..ed3fb3e6 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -38,10 +38,10 @@ knitr::knit_hooks$set( ## Introduction: the dangers of not checking function inputs -R functions and R packages are convenient way to share code with the rest of the world. -But you never know how others will try to use your code. -They might try to use it on objects that your function was not designed for. -Let's imagine we have written a short function to compute the geometric mean: +R functions and R packages are a convenient way to share code with the rest of the world +but it is generally not possible to know how, or with what precise aim in mind, others will +use your code. For example, they might try to use it on objects that your function was not +designed for. Let's imagine we have written a short function to compute the geometric mean: ```{r} geometric_mean <- function(...) 
{ @@ -59,7 +59,7 @@ geometric_mean(2, 8) geometric_mean(4, 1, 1/32) ``` -But a different person using your function might expose it the situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: +But a different person using your function might expose it situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: ```{r, error = TRUE} # Input with factors instead of numerics @@ -79,7 +79,7 @@ geometric_mean(c(2, 8)) ``` Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. -In this blog post, we will review the diversity of approaches to help you check your function inputs. +In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. ## Checking function inputs using base R @@ -100,7 +100,7 @@ say_hello(404) However, as you can see in this example, the error message is not in plain English but contains some code instructions. This can hinder understanding of the issue. -Because of this, there was an improvement to `stopifnot()` in R 4.0.0: +Because of this, `stopifnot()` was improved R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. @@ -117,9 +117,9 @@ say_hello <- function(name) { say_hello(404) ``` -But we can this from this example that we could create the error message programmatically based on the contents of the test. +This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". -This way, you don't have to repeat yourself [^2]. 
+This way, you don't have to repeat yourself [^2] which is generally a good aim. This becomes necessary when you start having many input checks in your function or in your package. [^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) @@ -128,9 +128,9 @@ This becomes necessary when you start having many input checks in your function ### The example of the checkmate package -But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). -checkmate provides a large number of function to check that function inputs respect a given set of properties, and returns clear error messages when it's not the case: +checkmate provides a large number of function to check that inputs respect a given set of properties, and returns clear error messages when that is not the case: ```{r} say_hello <- function(name) { @@ -151,8 +151,9 @@ say_hello(c("Bob", "Alice")) ### Other packages to check function inputs -Because input checking is such an important point and so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. 
-We will not get into the details for all of them here but it is worth mentioning: +Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. +We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation +for these package is a great place to start. - testthat - assertthat @@ -164,8 +165,8 @@ We will not get into the details for all of them here but it is worth mentioning ## What about the future? -In this post, we have seen many alternatives to check function inputs more easily, and generate more informative error messages. +In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. -Zero-cost assertions would require some kind of typing system. -It is interesting to note that many other languages followed this evolution (TypeScript as an extension of JavaScript, type annotations in Python). +Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. +Interestingly several other languages have evolved to have typing systems as they have developed (TypeScript as an extension of JavaScript, type annotations in Python). 
[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) From 32288c387f9cb40adeb7fd64a64ec1899fb70c0e Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 19 Jan 2022 13:44:18 +0000 Subject: [PATCH 08/47] Apply suggestions from code review --- content/post/2021-10-07-input-checking/index.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index ed3fb3e6..7b9c8959 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -59,7 +59,7 @@ geometric_mean(2, 8) geometric_mean(4, 1, 1/32) ``` -But a different person using your function might expose it situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: +But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: ```{r, error = TRUE} # Input with factors instead of numerics @@ -100,7 +100,7 @@ say_hello(404) However, as you can see in this example, the error message is not in plain English but contains some code instructions. This can hinder understanding of the issue. -Because of this, `stopifnot()` was improved R 4.0.0: +Because of this, `stopifnot()` was improved in R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. @@ -119,7 +119,7 @@ say_hello(404) This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". -This way, you don't have to repeat yourself [^2] which is generally a good aim. 
+This way, you don't have to repeat yourself which is generally a good aim [^2]. This becomes necessary when you start having many input checks in your function or in your package. [^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) From 4bb433e3f180fd0ed2255939573407d706377e40 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 19 Jan 2022 15:06:14 +0100 Subject: [PATCH 09/47] Render md --- .../post/2021-10-07-input-checking/index.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index bf17f99f..7a5c394f 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,13 +11,13 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 292cf2cbf2d52bd0 +rmd_hash: cec101d449fbd3fc --- ## Introduction: the dangers of not checking function inputs -R functions and R packages are convenient way to share code with the rest of the world. But you never know how others will try to use your code. They might try to use it on objects that your function was not designed for. Let's imagine we have written a short function to compute the geometric mean: +R functions and R packages are a convenient way to share code with the rest of the world but it is generally not possible to know how, or with what precise aim in mind, others will use your code. For example, they might try to use it on objects that your function was not designed for. Let's imagine we have written a short function to compute the geometric mean:
@@ -41,7 +41,7 @@ When you tested the function yourself, anything seemed fine:
-But a different person using your function might expose it the situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour: +But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour:
@@ -68,7 +68,7 @@ Or worse, it could give an incorrect output:
-Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we will review the diversity of approaches to help you check your function inputs. +Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. ## Checking function inputs using base R @@ -90,7 +90,7 @@ Error in say_hello(404): is.character(name) is not TRUE However, as you can see in this example, the error message is not in plain English but contains some code instructions. This can hinder understanding of the issue. -Because of this, there was an improvement to [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) in R 4.0.0: +Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was improved in R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. @@ -108,13 +108,13 @@ Error in say_hello(404): `name` must be a character. -But we can this from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself [^2]. This becomes necessary when you start having many input checks in your function or in your package. +This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". 
This way, you don't have to repeat yourself which is generally a good aim [^2]. This becomes necessary when you start having many input checks in your function or in your package. ## Checking function inputs using R packages ### The example of the checkmate package -But although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of function to check that function inputs respect a given set of properties, and returns clear error messages when it's not the case: +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of function to check that inputs respect a given set of properties, and returns clear error messages when that is not the case:
@@ -143,7 +143,7 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len ### Other packages to check function inputs -Because input checking is such an important point and so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details for all of them here but it is worth mentioning: +Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. - testthat - assertthat @@ -155,7 +155,7 @@ Because input checking is such an important point and so difficult to get right, ## What about the future? -In this post, we have seen many alternatives to check function inputs more easily, and generate more informative error messages. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions would require some kind of typing system. It is interesting to note that many other languages followed this evolution (TypeScript as an extension of JavaScript, type annotations in Python). [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) +In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. 
Interestingly several other languages have evolved to have typing systems as they have developed (TypeScript as an extension of JavaScript, type annotations in Python). [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) [^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. From 094ea8d5af3b83059be54c1ed4da58b1ae15fd04 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 20 Jan 2022 14:18:55 +0100 Subject: [PATCH 10/47] Add a link to all packages --- .../post/2021-10-07-input-checking/index.Rmd | 28 +++++++++---------- .../post/2021-10-07-input-checking/index.md | 14 +++++----- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 7b9c8959..3d76f5c1 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -38,10 +38,9 @@ knitr::knit_hooks$set( ## Introduction: the dangers of not checking function inputs -R functions and R packages are a convenient way to share code with the rest of the world -but it is generally not possible to know how, or with what precise aim in mind, others will -use your code. For example, they might try to use it on objects that your function was not -designed for. Let's imagine we have written a short function to compute the geometric mean: +R functions and R packages are a convenient way to share code with the rest of the world but it is generally not possible to know how, or with what precise aim in mind, others will use your code. +For example, they might try to use it on objects that your function was not designed for. +Let's imagine we have written a short function to compute the geometric mean: ```{r} geometric_mean <- function(...) 
{ @@ -117,7 +116,8 @@ say_hello <- function(name) { say_hello(404) ``` -This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. +This is clearly a really great improvement to the functionality of base R. +However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^2]. This becomes necessary when you start having many input checks in your function or in your package. @@ -152,15 +152,15 @@ say_hello(c("Bob", "Alice")) ### Other packages to check function inputs Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. -We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation -for these package is a great place to start. - -- testthat -- assertthat -- check -- assertr -- assertive -- ensurer +We will not get into the details of all of the available options here but below is a list of some of the them. +If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. 
+ +- [testthat](https://testthat.r-lib.org/) +- [assertthat](https://github.com/hadley/assertthat) +- [check](https://github.com/moodymudskipper/check) +- [assertr](https://docs.ropensci.org/assertr/) +- [assertive](https://bitbucket.org/richierocks/assertive) +- [ensurer](https://github.com/smbache/ensurer) - `vctrs::vec_assert()` ## What about the future? diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 7a5c394f..06823854 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: cec101d449fbd3fc +rmd_hash: 778f602a1f74bd46 --- @@ -145,12 +145,12 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. -- testthat -- assertthat -- check -- assertr -- assertive -- ensurer +- [testthat](https://testthat.r-lib.org/) +- [assertthat](https://github.com/hadley/assertthat) +- [check](https://github.com/moodymudskipper/check) +- [assertr](https://docs.ropensci.org/assertr/) +- [assertive](https://bitbucket.org/richierocks/assertive) +- [ensurer](https://github.com/smbache/ensurer) - [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html) ## What about the future? 
From e5e9394c039f1b0b79c129b2799236a0250ea4cb Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 20 Jan 2022 14:42:41 +0100 Subject: [PATCH 11/47] Apply suggestion from Mark --- content/post/2021-10-07-input-checking/index.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 3d76f5c1..d5facd86 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -168,5 +168,6 @@ If interested in understanding the various approaches to input taking the docume In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. -Interestingly several other languages have evolved to have typing systems as they have developed (TypeScript as an extension of JavaScript, type annotations in Python). +Interestingly several other languages have evolved to havetyping systems as they have developed. +Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. 
[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) From bbfe6889706ec3e51defd118adfa5a889f02bc55 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 20 Jan 2022 14:08:55 +0000 Subject: [PATCH 12/47] Update content/post/2021-10-07-input-checking/index.Rmd MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index d5facd86..b3e1a848 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -168,6 +168,6 @@ If interested in understanding the various approaches to input taking the docume In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. -Interestingly several other languages have evolved to havetyping systems as they have developed. +Interestingly several other languages have evolved to have typing systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. 
[Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) From 5170ad05a3caaabf969ed1dd3b0737c56b717bb2 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 17 Feb 2022 18:24:09 +0100 Subject: [PATCH 13/47] Remove testthat from list of packages --- content/post/2021-10-07-input-checking/index.Rmd | 1 - 1 file changed, 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index b3e1a848..e1a534e5 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -155,7 +155,6 @@ Because input checking is such an important point task and because it is so diff We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. -- [testthat](https://testthat.r-lib.org/) - [assertthat](https://github.com/hadley/assertthat) - [check](https://github.com/moodymudskipper/check) - [assertr](https://docs.ropensci.org/assertr/) From bdc5e3eedbe49788642ba58a0cc56ec4a6fbe21c Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 17 Feb 2022 18:58:06 +0100 Subject: [PATCH 14/47] Add examples and number of revdeps --- .../post/2021-10-07-input-checking/index.Rmd | 44 +++++++++-- .../post/2021-10-07-input-checking/index.md | 73 ++++++++++++++++--- 2 files changed, 100 insertions(+), 17 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index e1a534e5..ee285ac8 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -101,7 +101,7 @@ This can hinder understanding of the issue. 
Because of this, `stopifnot()` was improved in R 4.0.0: -> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. +> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. This means we can now provide a clearer error message directly in `stopifnot()` [^1]: @@ -152,16 +152,46 @@ say_hello(c("Bob", "Alice")) ### Other packages to check function inputs Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. -We will not get into the details of all of the available options here but below is a list of some of the them. +We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. -- [assertthat](https://github.com/hadley/assertthat) -- [check](https://github.com/moodymudskipper/check) -- [assertr](https://docs.ropensci.org/assertr/) -- [assertive](https://bitbucket.org/richierocks/assertive) -- [ensurer](https://github.com/smbache/ensurer) +- [assertthat](https://github.com/hadley/assertthat) (`r length(tools::dependsOnPkgs("assertthat"))` reverse dependencies) + +```{r, error = TRUE} +assertthat::assert_that(is.character(1)) +``` + +- [assertr](https://docs.ropensci.org/assertr/) (`r length(tools::dependsOnPkgs("assertr"))` reverse dependencies) + +```{r, error = TRUE} +library(magrittr) + +mtcars %>% + assertr::verify(nrow(.) 
< 10) +``` + +- [assertive](https://bitbucket.org/richierocks/assertive) (`r length(tools::dependsOnPkgs("assertive"))` reverse dependencies) + +```{r, error = TRUE} +assertive::assert_is_a_string(1) +``` + +- [ensurer](https://github.com/smbache/ensurer) (`r length(tools::dependsOnPkgs("ensurer"))` reverse dependencies) + +```{r, error = TRUE} +ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.)) + +ensure_square(matrix(1:20, 4, 5)) +``` + - `vctrs::vec_assert()` +```{r, error = TRUE} +vctrs::vec_assert(c(1, 2), "character") + +vctrs::vec_assert(c(1, 2), size = 3) +``` + ## What about the future? In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 06823854..fa5e75c3 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 778f602a1f74bd46 +rmd_hash: 0c5654a4144805c4 --- @@ -92,7 +92,7 @@ However, as you can see in this example, the error message is not in plain Engli Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was improved in R 4.0.0: -> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR\#17688. +> stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. 
This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^1]: @@ -143,19 +143,72 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len ### Other packages to check function inputs -Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of the them. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. +Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. + +- [assertthat](https://github.com/hadley/assertthat) (11 reverse dependencies) + +
+ +
assertthat::assert_that(is.character(1))
+Error: 1 is not a character vector
+ +
+ +- [assertr](https://docs.ropensci.org/assertr/) (0 reverse dependencies) + +
+ +
library(magrittr)
+
+mtcars %>%
+  assertr::verify(nrow(.) < 10)
+verification [nrow(.) < 10] failed! (1 failure)
+
+    verb redux_fn    predicate column index value
+1 verify       NA nrow(.) < 10     NA     1    NA
+Error: assertr stopped execution
+ +
+ +- [assertive](https://bitbucket.org/richierocks/assertive) (0 reverse dependencies) + +
+ +
assertive::assert_is_a_string(1)
+Error in eval(expr, envir, enclos): is_a_string : 1 is not of class 'character'; it has class 'numeric'.
+ +
+ +- [ensurer](https://github.com/smbache/ensurer) (0 reverse dependencies) + +
+ +
ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.))
+
+ensure_square(matrix(1:20, 4, 5))
+Error: conditions failed for call 'rmarkdown::render(" .. ecking/index.Rmd", ':
+     * NCOL(.) == NROW(.)
+ +
-- [testthat](https://testthat.r-lib.org/) -- [assertthat](https://github.com/hadley/assertthat) -- [check](https://github.com/moodymudskipper/check) -- [assertr](https://docs.ropensci.org/assertr/) -- [assertive](https://bitbucket.org/richierocks/assertive) -- [ensurer](https://github.com/smbache/ensurer) - [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html) +
+ +
vctrs::vec_assert(c(1, 2), "character")
+Error in `vctrs::vec_assert()`:
+! `c(1, 2)` must be a vector with type <character>.
+Instead, it has type <double>.
+
+vctrs::vec_assert(c(1, 2), size = 3)
+Error in `stop_vctrs()`:
+! `c(1, 2)` must have size 3, not size 2.
+ +
+ ## What about the future? -In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to have typing systems as they have developed (TypeScript as an extension of JavaScript, type annotations in Python). [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) +In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to havetyping systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) [^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. 
From c643c71cc0c1c42fe279c1d67ebe25ed49e2df2c Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Thu, 17 Feb 2022 18:59:42 +0100 Subject: [PATCH 15/47] Remove number of revdeps --- content/post/2021-10-07-input-checking/index.Rmd | 8 ++++---- content/post/2021-10-07-input-checking/index.md | 10 +++++----- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index ee285ac8..e71acf29 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -155,13 +155,13 @@ Because input checking is such an important point task and because it is so diff We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. -- [assertthat](https://github.com/hadley/assertthat) (`r length(tools::dependsOnPkgs("assertthat"))` reverse dependencies) +- [assertthat](https://github.com/hadley/assertthat) ```{r, error = TRUE} assertthat::assert_that(is.character(1)) ``` -- [assertr](https://docs.ropensci.org/assertr/) (`r length(tools::dependsOnPkgs("assertr"))` reverse dependencies) +- [assertr](https://docs.ropensci.org/assertr/) ```{r, error = TRUE} library(magrittr) @@ -170,13 +170,13 @@ mtcars %>% assertr::verify(nrow(.) < 10) ``` -- [assertive](https://bitbucket.org/richierocks/assertive) (`r length(tools::dependsOnPkgs("assertive"))` reverse dependencies) +- [assertive](https://bitbucket.org/richierocks/assertive) ```{r, error = TRUE} assertive::assert_is_a_string(1) ``` -- [ensurer](https://github.com/smbache/ensurer) (`r length(tools::dependsOnPkgs("ensurer"))` reverse dependencies) +- [ensurer](https://github.com/smbache/ensurer) ```{r, error = TRUE} ensure_square <- ensurer::ensures_that(NCOL(.) 
== NROW(.)) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index fa5e75c3..9c5f5119 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 0c5654a4144805c4 +rmd_hash: f35d02901a9698ee --- @@ -145,7 +145,7 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. -- [assertthat](https://github.com/hadley/assertthat) (11 reverse dependencies) +- [assertthat](https://github.com/hadley/assertthat)
@@ -154,7 +154,7 @@ Error: 1 is not a character vector
-- [assertr](https://docs.ropensci.org/assertr/) (0 reverse dependencies) +- [assertr](https://docs.ropensci.org/assertr/)
@@ -170,7 +170,7 @@ Error: assertr stopped execution
-- [assertive](https://bitbucket.org/richierocks/assertive) (0 reverse dependencies) +- [assertive](https://bitbucket.org/richierocks/assertive)
@@ -179,7 +179,7 @@ Error in eval(expr, envir, enclos): is_a_string : 1 is not of class 'character';
-- [ensurer](https://github.com/smbache/ensurer) (0 reverse dependencies) +- [ensurer](https://github.com/smbache/ensurer)
From 1e814603d51406001e5efdcd28a1fb517e41c5c9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 18 Feb 2022 17:52:51 +0100 Subject: [PATCH 16/47] Add check back --- content/post/2021-10-07-input-checking/index.Rmd | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index e71acf29..2d8fe4e4 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -192,6 +192,8 @@ vctrs::vec_assert(c(1, 2), "character") vctrs::vec_assert(c(1, 2), size = 3) ``` +- [check](https://github.com/moodymudskipper/check) is slightly different because it doesn't provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions + ## What about the future? In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. From 1323bf30bb1bf5c7ee8f756c06490f3cc7a7557e Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 18 Feb 2022 18:07:15 +0100 Subject: [PATCH 17/47] Render md --- content/post/2021-10-07-input-checking/index.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 9c5f5119..474547b6 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: f35d02901a9698ee +rmd_hash: ee9db5872d16c5aa --- @@ -206,9 +206,11 @@ Instead, it has type .
+- [check](https://github.com/moodymudskipper/check) is slightly different because it doesn't provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions + ## What about the future? -In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to havetyping systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) +In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to have typing systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) [^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. 
From 7ccbe8c84e14678cb66e1f0d0ecfb294436f58cd Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 18 Feb 2022 18:09:59 +0100 Subject: [PATCH 18/47] Add author file for Hugo --- content/authors/hugo-gruson/_index.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 content/authors/hugo-gruson/_index.md diff --git a/content/authors/hugo-gruson/_index.md b/content/authors/hugo-gruson/_index.md new file mode 100644 index 00000000..fbdbcd4b --- /dev/null +++ b/content/authors/hugo-gruson/_index.md @@ -0,0 +1,14 @@ +--- +name: Hugo Gruson +social: +- service: Home + link: https://hugogruson.fr +- service: Twitter + link: https://twitter.com/grusonh + icon: fa-twitter +- service: Github + link: https://github.com/Bisaloo + icon: fa-github +--- + +Evolutionary Biologist turned Research Software Engineer in Epidemiology. From 41644ebbc7bb9db9fe9d9a412282c82c194bb06f Mon Sep 17 00:00:00 2001 From: Sam Abbott Date: Fri, 25 Feb 2022 10:45:01 +0000 Subject: [PATCH 19/47] Add author file for Sam Abbott (#2) --- content/authors/sam-abbott/index.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 content/authors/sam-abbott/index.md diff --git a/content/authors/sam-abbott/index.md b/content/authors/sam-abbott/index.md new file mode 100644 index 00000000..ed195a22 --- /dev/null +++ b/content/authors/sam-abbott/index.md @@ -0,0 +1,14 @@ +--- +name: Sam Abbott +social: +- service: Home + link: https://samabbott.co.uk +- service: Twitter + link: https://twitter.com/seabbs + icon: fa-twitter +- service: Github + link: https://github.com/seabbs + icon: fa-github +--- + +Infectious disease researcher interested in open source tool development. More on my research interests [here](https://samabbott.co.uk/research). 
From 4e1fab79a27989d7b7fd5bedd5a4e776edfbb30a Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 12:37:33 +0100 Subject: [PATCH 20/47] Add section about documentation --- .../post/2021-10-07-input-checking/index.Rmd | 27 ++++++++++++++++--- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 2d8fe4e4..6c05ff2f 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -80,6 +80,25 @@ geometric_mean(c(2, 8)) Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. +## Pre-requisite: thouroughly document your argument types + +You can notice from the simple example above that it's easy to pass invalid inputs to the `geometric_mean()` function because we didn't provide any documentation on what is or isn't a valid input. +We won't go into details here but the [roxygen2](https://roxygen2.r-lib.org/) package provides a convenient way to generate documentation for R functions. +Try to be as precise as possible when describing the required format for your inputs [^1]. + +[^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4). + But there is currently no standard shared across the R community. 
+ +```{r} +#' @param name A character of length one with the name of the person to greet +say_hello <- function(name) { + stopifnot(is.character(name)) + paste("Hello", name) +} +``` + +Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. + ## Checking function inputs using base R There is a built-in mechanism to check input values in base R: `stopifnot()`. @@ -103,9 +122,9 @@ Because of this, `stopifnot()` was improved in R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. -This means we can now provide a clearer error message directly in `stopifnot()` [^1]: +This means we can now provide a clearer error message directly in `stopifnot()` [^2]: -[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. +[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. ```{r, error = TRUE} say_hello <- function(name) { @@ -119,10 +138,10 @@ say_hello(404) This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". -This way, you don't have to repeat yourself which is generally a good aim [^2]. +This way, you don't have to repeat yourself which is generally a good aim [^3]. This becomes necessary when you start having many input checks in your function or in your package. 
-[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) ## Checking function inputs using R packages From 3b872b951b6a5319f0a7d63e67f282f21e8763a6 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 13:39:04 +0100 Subject: [PATCH 21/47] Add section about match.arg() --- .../post/2021-10-07-input-checking/index.Rmd | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 6c05ff2f..1829fc03 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -101,6 +101,8 @@ Adding any kind of argument checking in the absence of good documentation would ## Checking function inputs using base R +### `stopifnot()` + There is a built-in mechanism to check input values in base R: `stopifnot()`. You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. 
@@ -143,6 +145,33 @@ This becomes necessary when you start having many input checks in your function [^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +### `match.arg()` + +If the input can only take specific values, the base function `match.arg()` can also prove useful: + +```{r} +match.arg(arg = "R", choices = c("R", "python")) + +match.arg(arg = "javascript", choices = c("R", "python")) +``` + +But the real power of the `match.arg()` function comes from the fact that `choices` can be automatically obtained in the context of a function: + +```{r, error = TRUE} +choose_language <- function(language = c("R", "python")) { + + # Equivalent to `match.arg(language, c("R", "python")) + match.arg(language) + + paste("I love", language) + +} + +choose_language("R") + +choose_language("julia") +``` + ## Checking function inputs using R packages ### The example of the checkmate package From f2eb00d497a680bb3196c59e0fc97a57c9fd4fdb Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 13:43:17 +0100 Subject: [PATCH 22/47] Add author file for Carl --- content/authors/carl-pearson/_index.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 content/authors/carl-pearson/_index.md diff --git a/content/authors/carl-pearson/_index.md b/content/authors/carl-pearson/_index.md new file mode 100644 index 00000000..dd492ecb --- /dev/null +++ b/content/authors/carl-pearson/_index.md @@ -0,0 +1,12 @@ +--- +name: Carl Pearson +social: +- service: Twitter + link: https://twitter.com/cap1024 + icon: fa-twitter +- service: Github + link: https://github.com/pearsonca + icon: fa-github +--- + +Modelling Infectious Disease Dynamics to Inform Decision Making From 3e645df16ed2955d495dc18449c23bf9413ba037 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 14:20:19 +0100 Subject: [PATCH 23/47] 
Add missing error = TRUE --- content/post/2021-10-07-input-checking/index.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 1829fc03..8c1f05a9 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -50,7 +50,7 @@ geometric_mean <- function(...) { } ``` -When you tested the function yourself, anything seemed fine: +When you tested the function yourself, everything seemed fine: ```{r} geometric_mean(2, 8) @@ -149,7 +149,7 @@ This becomes necessary when you start having many input checks in your function If the input can only take specific values, the base function `match.arg()` can also prove useful: -```{r} +```{r, error = TRUE} match.arg(arg = "R", choices = c("R", "python")) match.arg(arg = "javascript", choices = c("R", "python")) From 724e74c7f77153b6f80062d38838ab0dfdea5248 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 14:22:15 +0100 Subject: [PATCH 24/47] Switch order between match.arg() and stopifnot() --- .../post/2021-10-07-input-checking/index.Rmd | 56 +++++++++---------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 8c1f05a9..529bb0e6 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -101,9 +101,36 @@ Adding any kind of argument checking in the absence of good documentation would ## Checking function inputs using base R +### `match.arg()` + +If the input can only take specific values, the base function `match.arg()` can also prove useful: + +```{r, error = TRUE} +match.arg(arg = "R", choices = c("R", "python")) + +match.arg(arg = "javascript", choices = c("R", "python")) +``` + +But the real power of the `match.arg()` function comes from the fact that 
`choices` can be automatically obtained in the context of a function: + +```{r, error = TRUE} +choose_language <- function(language = c("R", "python")) { + + # Equivalent to `match.arg(language, c("R", "python")) + match.arg(language) + + paste("I love", language) + +} + +choose_language("R") + +choose_language("julia") +``` + ### `stopifnot()` -There is a built-in mechanism to check input values in base R: `stopifnot()`. +There is a another, more general, built-in mechanism to check input values in base R: `stopifnot()`. You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. 
@@ -145,33 +172,6 @@ This becomes necessary when you start having many input checks in your function [^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) -### `match.arg()` - -If the input can only take specific values, the base function `match.arg()` can also prove useful: - -```{r, error = TRUE} -match.arg(arg = "R", choices = c("R", "python")) - -match.arg(arg = "javascript", choices = c("R", "python")) -``` - -But the real power of the `match.arg()` function comes from the fact that `choices` can be automatically obtained in the context of a function: - -```{r, error = TRUE} -choose_language <- function(language = c("R", "python")) { - - # Equivalent to `match.arg(language, c("R", "python")) - match.arg(language) - - paste("I love", language) - -} - -choose_language("R") - -choose_language("julia") -``` - ## Checking function inputs using R packages ### The example of the checkmate package From 3c712488460c3a3cea5e72ece17a8c60de7f0bf6 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Fri, 25 Feb 2022 14:22:30 +0100 Subject: [PATCH 25/47] Render md --- .../post/2021-10-07-input-checking/index.md | 69 +++++++++++++++++-- 1 file changed, 62 insertions(+), 7 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 474547b6..0c868587 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: ee9db5872d16c5aa +rmd_hash: f2bfaa06b74f2ff1 --- @@ -29,7 +29,7 @@ R functions and R packages are a convenient way to share code with the rest of t
-When you tested the function yourself, anything seemed fine: +When you tested the function yourself, everything seemed fine:
@@ -70,9 +70,62 @@ Or worse, it could give an incorrect output: Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. +## Pre-requisite: thouroughly document your argument types + +You can notice from the simple example above that it's easy to pass invalid inputs to the `geometric_mean()` function because we didn't provide any documentation on what is or isn't a valid input. We won't go into details here but the [roxygen2](https://roxygen2.r-lib.org/) package provides a convenient way to generate documentation for R functions. Try to be as precise as possible when describing the required format for your inputs [^1]. + +
+ +
#' @param name A character of length one with the name of the person to greet
+say_hello <- function(name) {
+  stopifnot(is.character(name))
+  paste("Hello", name)
+}
+ +
+ +Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. + ## Checking function inputs using base R -There is a built-in mechanism to check input values in base R: [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html). You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. +### `match.arg()` + +If the input can only take specific values, the base function [`match.arg()`](https://rdrr.io/r/base/match.arg.html) can also prove useful: + +
+ +
match.arg(arg = "R", choices = c("R", "python"))
+[1] "R"
+
+match.arg(arg = "javascript", choices = c("R", "python"))
+Error in match.arg(arg = "javascript", choices = c("R", "python")): 'arg' should be one of "R", "python"
+ +
+ +But the real power of the [`match.arg()`](https://rdrr.io/r/base/match.arg.html) function comes from the fact that `choices` can be automatically obtained in the context of a function: + +
+ +
choose_language <- function(language = c("R", "python")) {
+  
+  # Equivalent to `match.arg(language, c("R", "python"))`
+  language <- match.arg(language)
+  
+  paste("I love", language)
+  
+}
+
+choose_language("R")
+[1] "I love R"
+
+choose_language("julia")
+Error in match.arg(language): 'arg' should be one of "R", "python"
+ +
+ +### `stopifnot()` + +There is a another, more general, built-in mechanism to check input values in base R: [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html). You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests.
@@ -94,7 +147,7 @@ Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was impr > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. -This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^1]: +This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^2]:
@@ -108,7 +161,7 @@ Error in say_hello(404): `name` must be a character.
-This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^2]. This becomes necessary when you start having many input checks in your function or in your package. +This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^3]. This becomes necessary when you start having many input checks in your function or in your package. ## Checking function inputs using R packages @@ -212,7 +265,9 @@ Instead, it has type . In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to have typing systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) -[^1]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. 
+[^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4). But there is currently no standard shared across the R community. + +[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. -[^2]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) From a391ea4609b98dc3eed35dd793094a3fe68f5385 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 11:51:43 +0000 Subject: [PATCH 26/47] Follow accessibility guidelines for Sam's author file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/authors/sam-abbott/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/authors/sam-abbott/index.md b/content/authors/sam-abbott/index.md index ed195a22..d2bcad1a 100644 --- a/content/authors/sam-abbott/index.md +++ b/content/authors/sam-abbott/index.md @@ -11,4 +11,4 @@ social: icon: fa-github --- -Infectious disease researcher interested in open source tool development. More on my research interests [here](https://samabbott.co.uk/research). +Infectious disease researcher interested in open source tool development. More on my [research interests](https://samabbott.co.uk/research). 
From ebfa94c3c12df0e81b9411267b5df14b4a471549 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 11:52:04 +0000 Subject: [PATCH 27/47] Update spelling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 529bb0e6..f51b6c59 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -80,7 +80,7 @@ geometric_mean(c(2, 8)) Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. -## Pre-requisite: thouroughly document your argument types +## Pre-requisite: thoroughly document your argument types You can notice from the simple example above that it's easy to pass invalid inputs to the `geometric_mean()` function because we didn't provide any documentation on what is or isn't a valid input. We won't go into details here but the [roxygen2](https://roxygen2.r-lib.org/) package provides a convenient way to generate documentation for R functions. 
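As an illustration of the documentation pre-requisite touched by this patch, here is a minimal sketch of what precise argument documentation could look like with roxygen2 tags. The `greet()` function and its `language` argument are hypothetical and are not part of the post. The named `stopifnot()` form assumes R >= 4.0.0.

```r
#' Greet someone in a given language
#'
#' @param name A character vector of length 1 with the name of the person to
#'   greet. Must not be `NA`.
#' @param language A character vector of length 1, one of `"en"` or `"fr"`.
#'   Defaults to `"en"`.
#'
#' @return A character vector of length 1 containing the greeting.
#'
#' @examples
#' greet("Maria")
#' greet("Maria", language = "fr")
greet <- function(name, language = c("en", "fr")) {
  # The check mirrors the documented contract: type, length and missingness
  stopifnot(
    "`name` must be a character vector of length 1." =
      is.character(name) && length(name) == 1L && !is.na(name)
  )
  # match.arg() returns the validated value, so assign it back
  language <- match.arg(language)
  salutation <- switch(language, en = "Hello", fr = "Bonjour")
  paste(salutation, name)
}
```

Stating the type, the length and the allowed values in `@param` gives each input check a documented rule to enforce, so users do not have to discover the contract by trial and error.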
From dfee04350aaf85efe5c64c1c60b3e5a20fff88a1 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 12:15:28 +0100 Subject: [PATCH 28/47] Disable crayon --- content/post/2021-10-07-input-checking/index.Rmd | 1 + content/post/2021-10-07-input-checking/index.md | 10 +++++----- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index f51b6c59..35d0956a 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -33,6 +33,7 @@ knitr::knit_hooks$set( ) } ) +options(crayon.enabled = FALSE) ``` diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 0c868587..c2808954 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: f2bfaa06b74f2ff1 +rmd_hash: 79f5f9b5fa2c8da4 --- @@ -249,13 +249,13 @@ Error: conditions failed for call 'rmarkdown::render(" .. ecking/index.Rmd", ':
vctrs::vec_assert(c(1, 2), "character")
-Error in `vctrs::vec_assert()`:
-! `c(1, 2)` must be a vector with type <character>.
+Error in `vctrs::vec_assert()`:
+! `c(1, 2)` must be a vector with type <character>.
 Instead, it has type <double>.
 
 vctrs::vec_assert(c(1, 2), size = 3)
-Error in `stop_vctrs()`:
-! `c(1, 2)` must have size 3, not size 2.
+Error in `stop_vctrs()`:
+! `c(1, 2)` must have size 3, not size 2.
From 3c733f38de972e63129fd92c94b7dba47960f88c Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 11:53:17 +0000 Subject: [PATCH 29/47] Fix spelling again MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 35d0956a..c477fbe9 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -179,7 +179,7 @@ This becomes necessary when you start having many input checks in your function Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). 
-checkmate provides a large number of function to check that inputs respect a given set of properties, and returns clear error messages when that is not the case: +checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: ```{r} say_hello <- function(name) { From 6abf58924f6ee981b2076f716327fff4ee22609a Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 17:05:26 +0100 Subject: [PATCH 30/47] Add example for check --- .../post/2021-10-07-input-checking/index.Rmd | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index c477fbe9..702eaf2a 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -243,6 +243,30 @@ vctrs::vec_assert(c(1, 2), size = 3) - [check](https://github.com/moodymudskipper/check) is slightly different because it doesn't provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions +```{r, error = TRUE} +library(check) + +check::setup() + +set_check_fun( + "`{var}` must be a {type} vector of length {length}." = { + val <- get(var, env) + is.atomic(val) && is(val, type) && length(val) == length + } +) + +say_hello <- function(name) { + check( + "`name` must be a character vector of length 1." + ) + paste("hello", name) +} + +say_hello("Maria") + +say_hello(c("Maria", "Noelia")) +``` + ## What about the future? In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. 
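For comparison with the check-based example added in this patch, the same idea (one reusable check that builds its error message from its arguments) can be sketched in base R alone. `assert_vector()` is a hypothetical helper written for this illustration; it is not provided by base R or by any of the packages mentioned in the post.

```r
# Hypothetical helper: builds the error message from the check's own arguments,
# so the message never has to be written out by hand at each call site.
assert_vector <- function(x, type, len, arg = deparse(substitute(x))) {
  ok <- is.atomic(x) && inherits(x, type) && length(x) == len
  if (!ok) {
    stop(
      sprintf("`%s` must be a %s vector of length %d.", arg, type, as.integer(len)),
      call. = FALSE
    )
  }
  # Return the input invisibly so the check can be used in a pipeline
  invisible(x)
}

say_hello <- function(name) {
  assert_vector(name, "character", 1)
  paste("hello", name)
}

say_hello("Maria")
```

Calling `say_hello(c("Maria", "Noelia"))` would stop with a message generated from the template. Whether a hand-rolled helper like this is preferable to a dedicated package probably depends on how many distinct checks a package needs.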
From 246221a83e134b737f208cca5c5a1fe5ffb71c6c Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 28 Feb 2022 17:05:36 +0100 Subject: [PATCH 31/47] Render md --- .../post/2021-10-07-input-checking/index.md | 34 +++++++++++++++++-- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index c2808954..4b37a9b4 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 79f5f9b5fa2c8da4 +rmd_hash: 7a27e6aa3faf87ad --- @@ -70,7 +70,7 @@ Or worse, it could give an incorrect output: Because of this, you need to make sure you return clear errors whenever your functions receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments. -## Pre-requisite: thouroughly document your argument types +## Pre-requisite: thoroughly document your argument types You can notice from the simple example above that it's easy to pass invalid inputs to the `geometric_mean()` function because we didn't provide any documentation on what is or isn't a valid input. We won't go into details here but the [roxygen2](https://roxygen2.r-lib.org/) package provides a convenient way to generate documentation for R functions. Try to be as precise as possible when describing the required format for your inputs [^1]. @@ -167,7 +167,7 @@ This is clearly a really great improvement to the functionality of base R. Howev ### The example of the checkmate package -Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. 
One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of function to check that inputs respect a given set of properties, and returns clear error messages when that is not the case: +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case:
@@ -261,6 +261,34 @@ Error in `stop_vctrs()`: - [check](https://github.com/moodymudskipper/check) is slightly different because it doesn't provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions +
+ +
library(check)
+
+check::setup() 
+
+set_check_fun(
+  "`{var}` must be a {type} vector of length {length}." = {
+      val <- get(var, env)
+      is.atomic(val) && is(val, type) && length(val) == length
+  }
+)
+
+say_hello <- function(name) {
+  check(
+    "`name` must be a character vector of length 1."
+    )
+  paste("hello", name)
+}
+
+say_hello("Maria")
+[1] "hello Maria"
+
+say_hello(c("Maria", "Noelia"))
+Error: `name` must be a character vector of length 1.
+ +
+ ## What about the future? In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to have typing systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) From c3c21aecbff4e684acbf9b9d0fd36bb85ae14a7b Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:14:30 +0100 Subject: [PATCH 32/47] Mention rlang equivalents --- content/post/2021-10-07-input-checking/index.Rmd | 13 ++++++++----- content/post/2021-10-07-input-checking/index.md | 14 ++++++++------ 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 702eaf2a..ae2e8b9e 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -100,7 +100,10 @@ say_hello <- function(name) { Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. -## Checking function inputs using base R +## Checking function inputs using base R [^2] + +[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output. 
+ `match.arg()`'s equivalent is `rlang::arg_match()` and `stopifnot()`'s ### `match.arg()` @@ -152,9 +155,9 @@ Because of this, `stopifnot()` was improved in R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. -This means we can now provide a clearer error message directly in `stopifnot()` [^2]: +This means we can now provide a clearer error message directly in `stopifnot()` [^3]: -[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. +[^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. ```{r, error = TRUE} say_hello <- function(name) { @@ -168,10 +171,10 @@ say_hello(404) This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". -This way, you don't have to repeat yourself which is generally a good aim [^3]. +This way, you don't have to repeat yourself which is generally a good aim [^4]. This becomes necessary when you start having many input checks in your function or in your package. 
-[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) ## Checking function inputs using R packages diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 4b37a9b4..ec7ee472 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 7a27e6aa3faf87ad +rmd_hash: 95b73fa760508f30 --- @@ -86,7 +86,7 @@ You can notice from the simple example above that it's easy to pass invalid inpu Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. -## Checking function inputs using base R +## Checking function inputs using base R [^2] ### `match.arg()` @@ -147,7 +147,7 @@ Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was impr > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. -This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^2]: +This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^3]:
@@ -161,7 +161,7 @@ Error in say_hello(404): `name` must be a character.
-This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^3]. This becomes necessary when you start having many input checks in your function or in your package. +This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^4]. This becomes necessary when you start having many input checks in your function or in your package. ## Checking function inputs using R packages @@ -295,7 +295,9 @@ In this post, we have discussed some methods to check function inputs, and to ge [^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4). But there is currently no standard shared across the R community. -[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. +[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output. 
`match.arg`'s equivalent is [`rlang::arg_match()`](https://rlang.r-lib.org/reference/arg_match.html) and [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html)'s -[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. + +[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) From 2c33d08be608c015dc9c52bc6684c031b150203a Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:18:23 +0100 Subject: [PATCH 33/47] Mention blog post on internal functions --- content/post/2021-10-07-input-checking/index.Rmd | 4 +++- content/post/2021-10-07-input-checking/index.md | 8 +++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index ae2e8b9e..e6bb8083 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -180,10 +180,12 @@ This becomes necessary when you start having many input checks in your function ### The example of the checkmate package -Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^5], you can also rely on existing packages to make your life easier. 
One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: +[^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. + ```{r} say_hello <- function(name) { # Among other things, check_string() checks that we provide a diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index ec7ee472..aface768 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 95b73fa760508f30 +rmd_hash: e9cce993a641d19e --- @@ -167,7 +167,7 @@ This is clearly a really great improvement to the functionality of base R. Howev ### The example of the checkmate package -Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem, you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^5], you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). 
checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case:
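
As a rough illustration of the sentence above (a sketch only, assuming the checkmate package is installed; it is not the code chunk from the post itself), checkmate exposes each check in several flavours: `test_*()` returns a logical, `check_*()` returns `TRUE` or a message, and `assert_*()` throws an error.

```r
library(checkmate)

# test_*() returns TRUE or FALSE and never throws
test_string("Maria")
test_string(404)

# check_*() returns TRUE on success, or a character string
# describing why the check failed
check_string(404)

# assert_*() throws an error on failure and returns its input
# invisibly on success, which makes it convenient inside functions
say_hello <- function(name) {
  assert_string(name)
  paste("Hello", name)
}

say_hello("Maria")

# Capture the failure instead of stopping the script
result <- try(say_hello(404), silent = TRUE)
inherits(result, "try-error")
```

Which flavour fits best probably depends on whether the caller should handle the failure or the function should stop immediately.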
@@ -295,9 +295,11 @@ In this post, we have discussed some methods to check function inputs, and to ge [^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4). But there is currently no standard shared across the R community. -[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output. `match.arg`'s equivalent is [`rlang::arg_match()`](https://rlang.r-lib.org/reference/arg_match.html) and [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html)'s +[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output. [`match.arg()`](https://rdrr.io/r/base/match.arg.html)'s equivalent is [`rlang::arg_match()`](https://rlang.r-lib.org/reference/arg_match.html) and [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html)'s [^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. [^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. 
+ From 0524164c1c28069c416005c99c2745d71e3b3dae Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:39:23 +0100 Subject: [PATCH 34/47] Add section taking into account more of Carl comments --- content/post/2021-10-07-input-checking/index.Rmd | 15 +++++++++++++++ content/post/2021-10-07-input-checking/index.md | 10 +++++++++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index e6bb8083..4f139cc6 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -272,6 +272,21 @@ say_hello("Maria") say_hello(c("Maria", "Noelia")) ``` +## There is no 'one-size-fits-all' + +We have presented here different approaches but it is up to you, the developer, to decide which approach suits your needs best. +We do not believe that one choice is intrinsically better than the others. +All the workflows presented here can achieve the same result. +Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? +Will they be okay with somewhat technical terminology in the error messages? +Do you have reasons to try and limit the number of dependencies [^6]? +Which framework are you the more comfortable with and will facilitate maintenance in the future? +And ultimately, what is your personal preference? + +[^6]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). + +If you would like to hear various point of views and a more in-depth discussion about this, please refer to the [pull request related to this post](https://github.com/r-hub/blog/pull/150). + ## What about the future? 
In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index aface768..bab35e7c 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -11,7 +11,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: e9cce993a641d19e +rmd_hash: 69ab95ec35a8fe1e --- @@ -289,6 +289,12 @@ Error: `name` must be a character vector of length 1.
+## There is no 'one-size-fits-all' + +We have presented here different approaches but it is up to you, the developer, to decide which approach suits your needs best. We do not believe that one choice is intrinsically better than the others. All the workflows presented here can achieve the same result. Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? Will they be okay with somewhat technical terminology in the error messages? Do you have reasons to try and limit the number of dependencies [^6]? Which framework are you the more comfortable with and will facilitate maintenance in the future? And ultimately, what is your personal preference? + +If you would like to hear various point of views and a more in-depth discussion about this, please refer to the [pull request related to this post](https://github.com/r-hub/blog/pull/150). + ## What about the future? In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it's often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system which R does not currently support. Interestingly several other languages have evolved to have typing systems as they have developed. Typescript developed as an extension of JavaScript, and type annotations are now possible in Python. [Will R one day follow suit?](https://blog.q-lang.org/posts/2021-10-16-project/) @@ -303,3 +309,5 @@ In this post, we have discussed some methods to check function inputs, and to ge [^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. 
+[^6]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). + From 8483bf0d89658e8a5d315fca92836f8529a0dfb9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:41:10 +0100 Subject: [PATCH 35/47] Drop Tim from authors as per his request --- content/post/2021-10-07-input-checking/index.Rmd | 1 - content/post/2021-10-07-input-checking/index.md | 3 +-- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 4f139cc6..ad5d65c4 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -5,7 +5,6 @@ authors: - Sam Abbott - Hugo Gruson - Carl Pearson -- Tim Taylor date: "2021-10-07" tags: - package development diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index bab35e7c..befb346d 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -5,13 +5,12 @@ authors: - Sam Abbott - Hugo Gruson - Carl Pearson -- Tim Taylor date: "2021-10-07" tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 69ab95ec35a8fe1e +rmd_hash: 4bab1cca23234a4d --- From beae8c3de560e20033e9ae6b3b0b0787832e0445 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:41:57 +0100 Subject: [PATCH 36/47] Reorder authors --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- content/post/2021-10-07-input-checking/index.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index ad5d65c4..f75082b8 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ 
b/content/post/2021-10-07-input-checking/index.Rmd @@ -2,8 +2,8 @@ slug: input-checking title: "Checking the inputs of your R functions" authors: -- Sam Abbott - Hugo Gruson +- Sam Abbott - Carl Pearson date: "2021-10-07" tags: diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index befb346d..8c5db2ed 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -2,15 +2,15 @@ slug: input-checking title: "Checking the inputs of your R functions" authors: -- Sam Abbott - Hugo Gruson +- Sam Abbott - Carl Pearson date: "2021-10-07" tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 4bab1cca23234a4d +rmd_hash: abf2aec8c3e2d01d --- From bf75fce9e18b100bfce6f8fd36e6b4b39255a21f Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 15:45:46 +0100 Subject: [PATCH 37/47] Add pre-intro --- content/post/2021-10-07-input-checking/index.Rmd | 5 +++++ content/post/2021-10-07-input-checking/index.md | 4 +++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index f75082b8..18cd1a2b 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -36,6 +36,11 @@ options(crayon.enabled = FALSE) ``` +Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself? +Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? +Read about our exploration into the various ways to check your function inputs in R in this blog post. +And please share your own tips and discoveries in the comment section below! 
+ ## Introduction: the dangers of not checking function inputs R functions and R packages are a convenient way to share code with the rest of the world but it is generally not possible to know how, or with what precise aim in mind, others will use your code. diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 8c5db2ed..2913ffc8 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,10 +10,12 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: abf2aec8c3e2d01d +rmd_hash: a922f2a17e9ef593 --- +Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself? Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? Read about our exploration into the various ways to check your function inputs in R in this blog post. And please share your own tips and discoveries in the comment section below! + ## Introduction: the dangers of not checking function inputs R functions and R packages are a convenient way to share code with the rest of the world but it is generally not possible to know how, or with what precise aim in mind, others will use your code. For example, they might try to use it on objects that your function was not designed for. 
Let's imagine we have written a short function to compute the geometric mean: From e8c0b03cc7857453f0ff2f899556597e2ddced73 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 17:21:58 +0000 Subject: [PATCH 38/47] Make links to r-hub blog relative MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 18cd1a2b..e33e475b 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -178,7 +178,7 @@ Each time we test if the object is of `class_X` and this is not true, we could t This way, you don't have to repeat yourself which is generally a good aim [^4]. This becomes necessary when you start having many input checks in your function or in your package. 
-[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](/2021/07/30/cache/) ## Checking function inputs using R packages From 7ef6637ad88a44d96761c3cd2713721349f8ecc7 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 17:23:15 +0000 Subject: [PATCH 39/47] Make links to r-hub blog relative 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index e33e475b..196d0430 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -188,7 +188,7 @@ Although some developers create [their own functions](https://github.com/djnavar One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: -[^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. +[^5]: See [this earlier blog post](/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. 
```{r} say_hello <- function(name) { From 87231aaf705e8eb1b1cec7df5d7fa037769b1615 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Mon, 7 Mar 2022 17:23:38 +0000 Subject: [PATCH 40/47] Remove extra 'below' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maëlle Salmon --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 196d0430..82396922 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -39,7 +39,7 @@ options(crayon.enabled = FALSE) Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself? Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? Read about our exploration into the various ways to check your function inputs in R in this blog post. -And please share your own tips and discoveries in the comment section below! +And please share your own tips and discoveries in the comment section! 
## Introduction: the dangers of not checking function inputs From 6a1881b8fcb59f56fea5cbfd0f23f2e698a8f27e Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 12:34:27 +0100 Subject: [PATCH 41/47] Clarify default with match.arg() --- content/post/2021-10-07-input-checking/index.Rmd | 7 +++++-- content/post/2021-10-07-input-checking/index.md | 9 ++++++--- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 82396922..9f37ced9 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -119,13 +119,14 @@ match.arg(arg = "R", choices = c("R", "python")) match.arg(arg = "javascript", choices = c("R", "python")) ``` -But the real power of the `match.arg()` function comes from the fact that `choices` can be automatically obtained in the context of a function: +But the real power of the `match.arg()` function comes from the fact that `choices` can be automatically obtained in the context of a function. 
+The default choice is then always the first element: ```{r, error = TRUE} choose_language <- function(language = c("R", "python")) { # Equivalent to `match.arg(language, c("R", "python")) - match.arg(language) + language <- match.arg(language) paste("I love", language) @@ -133,6 +134,8 @@ choose_language <- function(language = c("R", "python")) { choose_language("R") +choose_language() + choose_language("julia") ``` diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 2913ffc8..c2b7414e 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: a922f2a17e9ef593 +rmd_hash: 9db4c63dcef34c70 --- @@ -103,14 +103,14 @@ Error in match.arg(arg = "javascript", choices = c("R", "python")): 'arg' should
-But the real power of the [`match.arg()`](https://rdrr.io/r/base/match.arg.html) function comes from the fact that `choices` can be automatically obtained in the context of a function: +But the real power of the [`match.arg()`](https://rdrr.io/r/base/match.arg.html) function comes from the fact that `choices` can be automatically obtained in the context of a function. The default choice is then always the first element:
choose_language <- function(language = c("R", "python")) {
   
   # Equivalent to `match.arg(language, c("R", "python"))
-  match.arg(language)
+  language <- match.arg(language)
   
   paste("I love", language)
   
@@ -119,6 +119,9 @@ But the real power of the [`match.arg()`](https://rdrr.io/r/base/match.arg.html)
 choose_language("R")
 [1] "I love R"
 
+choose_language()
+[1] "I love R"
+
 choose_language("julia")
 Error in match.arg(language): 'arg' should be one of "R", "python"
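
The behaviour this commit clarifies can be tried in isolation with base R alone. The sketch below reuses the `choose_language()` function from the hunk above and also shows partial matching, which `match.arg()` performs but the post does not demonstrate; the `failed` variable is only an illustrative helper.

```r
choose_language <- function(language = c("R", "python")) {
  # match.arg() validates `language` against the default vector and
  # returns the first element when the argument is not supplied
  language <- match.arg(language)
  paste("I love", language)
}

# No argument: falls back to the first choice
choose_language()

# Partial matching: "py" is expanded to "python"
choose_language("py")

# An invalid value raises an error; capture it instead of stopping
failed <- inherits(try(choose_language("julia"), silent = TRUE), "try-error")
failed
```

Assigning the result of `match.arg()` back to `language`, as the commit does, is what makes the default and partially matched values usable in the rest of the function body.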
From baa4862079eef9dab66cf02e8422792df0d55a0c Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 13:01:28 +0100 Subject: [PATCH 42/47] Promote footnote about rlang's match.arg() to main text --- .../post/2021-10-07-input-checking/index.Rmd | 23 +++++++++--------- .../post/2021-10-07-input-checking/index.md | 24 +++++++++---------- 2 files changed, 23 insertions(+), 24 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 9f37ced9..5803af9a 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -104,10 +104,7 @@ say_hello <- function(name) { Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. -## Checking function inputs using base R [^2] - -[^2]: Note that these base functions have equivalent in the tidyverse with a more consistent design and coloured output. - `match.arg()`'s equivalent is `rlang::arg_match()` and `stopifnot()`'s +## Checking function inputs using base R ### `match.arg()` @@ -139,6 +136,8 @@ choose_language() choose_language("julia") ``` +We are getting out of the realm of base R but it is worth mentioning that `match.arg()` has an equivalent in the tidyverse with a more consistent design and coloured output: `rlang::arg_match()`. + ### `stopifnot()` There is a another, more general, built-in mechanism to check input values in base R: `stopifnot()`. @@ -162,9 +161,9 @@ Because of this, `stopifnot()` was improved in R 4.0.0: > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. 
-This means we can now provide a clearer error message directly in `stopifnot()` [^3]: +This means we can now provide a clearer error message directly in `stopifnot()` [^2]: -[^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. +[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. ```{r, error = TRUE} say_hello <- function(name) { @@ -178,20 +177,20 @@ say_hello(404) This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". -This way, you don't have to repeat yourself which is generally a good aim [^4]. +This way, you don't have to repeat yourself which is generally a good aim [^3]. This becomes necessary when you start having many input checks in your function or in your package. -[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](/2021/07/30/cache/) +[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](/2021/07/30/cache/) ## Checking function inputs using R packages ### The example of the checkmate package -Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^5], you can also rely on existing packages to make your life easier. 
+Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^4], you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: -[^5]: See [this earlier blog post](/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. +[^4]: See [this earlier blog post](/2019/12/12/internal-functions/) for more information about why and who you would go with writing internal functions. ```{r} say_hello <- function(name) { @@ -286,11 +285,11 @@ We do not believe that one choice is intrinsically better than the others. All the workflows presented here can achieve the same result. Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? Will they be okay with somewhat technical terminology in the error messages? -Do you have reasons to try and limit the number of dependencies [^6]? +Do you have reasons to try and limit the number of dependencies [^5]? Which framework are you the more comfortable with and will facilitate maintenance in the future? And ultimately, what is your personal preference? -[^6]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). +[^5]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). 
If you would like to hear various point of views and a more in-depth discussion about this, please refer to the [pull request related to this post](https://github.com/r-hub/blog/pull/150). diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index c2b7414e..36b6782c 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 9db4c63dcef34c70 +rmd_hash: 2e7a2c0e610572b2 --- @@ -87,7 +87,7 @@ You can notice from the simple example above that it's easy to pass invalid inpu Adding any kind of argument checking in the absence of good documentation would be vain and very frustrating for your users as they would have to figure out what is or isn't valid by trial and error. -## Checking function inputs using base R [^2] +## Checking function inputs using base R ### `match.arg()` @@ -127,6 +127,8 @@ Error in match.arg(language): 'arg' should be one of "R", "python"
+We are getting out of the realm of base R but it is worth mentioning that [`match.arg()`](https://rdrr.io/r/base/match.arg.html) has an equivalent in the tidyverse with a more consistent design and coloured output: [`rlang::arg_match()`](https://rlang.r-lib.org/reference/arg_match.html). + ### `stopifnot()` There is a another, more general, built-in mechanism to check input values in base R: [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html). You can see it [used](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/approx.R#L78) [throughout](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/cor.R#L36) [R](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/graphics/R/smoothScatter.R#L47) [source](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/srcfile.R#L23) [code](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/parse.R#L65). As its name suggests, it will *stop* the function execution *if* an object does *not* pass some tests. @@ -151,7 +153,7 @@ Because of this, [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) was impr > stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688. -This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^3]: +This means we can now provide a clearer error message directly in [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html) [^2]:
@@ -165,13 +167,13 @@ Error in say_hello(404): `name` must be a character.
-This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^4]. This becomes necessary when you start having many input checks in your function or in your package. +This is clearly a really great improvement to the functionality of base R. However, we can see from this example that we could create the error message programmatically based on the contents of the test. Each time we test if the object is of `class_X` and this is not true, we could throw an error saying something like "x must of a class_X". This way, you don't have to repeat yourself which is generally a good aim [^3]. This becomes necessary when you start having many input checks in your function or in your package. ## Checking function inputs using R packages ### The example of the checkmate package -Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^5], you can also rely on existing packages to make your life easier. One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case: +Although some developers create [their own functions](https://github.com/djnavarro/bs4cards/blob/a021d731a307ec7af692a42364308b60e2bf9827/R/validators.R) to solve this problem [^4], you can also rely on existing packages to make your life easier. 
One of these packages designed to help you in input checking is [checkmate](https://mllg.github.io/checkmate/). checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case:
@@ -295,7 +297,7 @@ Error: `name` must be a character vector of length 1. ## There is no 'one-size-fits-all' -We have presented here different approaches but it is up to you, the developer, to decide which approach suits your needs best. We do not believe that one choice is intrinsically better than the others. All the workflows presented here can achieve the same result. Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? Will they be okay with somewhat technical terminology in the error messages? Do you have reasons to try and limit the number of dependencies [^6]? Which framework are you most comfortable with, and which will facilitate maintenance in the future? And ultimately, what is your personal preference? +We have presented here different approaches but it is up to you, the developer, to decide which approach suits your needs best. We do not believe that one choice is intrinsically better than the others. All the workflows presented here can achieve the same result. Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? Will they be okay with somewhat technical terminology in the error messages? Do you have reasons to try and limit the number of dependencies [^5]? Which framework are you most comfortable with, and which will facilitate maintenance in the future? And ultimately, what is your personal preference? If you would like to hear various points of view and a more in-depth discussion about this, please refer to the [pull request related to this post](https://github.com/r-hub/blog/pull/150). @@ -305,13 +307,11 @@ In this post, we have discussed some methods to check function inputs, and to ge [^1]: [Some package developers even developed their own standardized way to document argument types and length](https://github.com/r-lib/withr/commit/42e503092046705f30032cb3a321d64b0e9383d4). 
But there is currently no standard shared across the R community. -[^2]: Note that these base functions have equivalents in the tidyverse with a more consistent design and coloured output. [`match.arg()`](https://rdrr.io/r/base/match.arg.html)'s equivalent is [`rlang::arg_match()`](https://rlang.r-lib.org/reference/arg_match.html) and [`stopifnot()`](https://rdrr.io/r/base/stopifnot.html)'s - -[^3]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. +[^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. -[^4]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) -[^5]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and how you would go about writing internal functions. +[^4]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and how you would go about writing internal functions. -[^6]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). +[^5]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/). 
From 5ac2dd511754f4bd474042356ced98ba9d59dcf6 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 13:09:53 +0100 Subject: [PATCH 43/47] Mention vtr --- .../post/2021-10-07-input-checking/index.Rmd | 12 ++++++++++++ .../post/2021-10-07-input-checking/index.md | 19 ++++++++++++++++++- 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 5803af9a..710de9f4 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -244,6 +244,18 @@ ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.)) ensure_square(matrix(1:20, 4, 5)) ``` +- [vetr](https://github.com/brodieG/vetr) + +```{r, error = TRUE} +template <- numeric(1L) + +vetr::vet(template, 42) + +vetr::vet(template, 1:3) + +vetr::vet(template, "hello") +``` + - `vctrs::vec_assert()` ```{r, error = TRUE} diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 36b6782c..4a3192b3 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 2e7a2c0e610572b2 +rmd_hash: 5ea789cb694eedce --- @@ -250,6 +250,23 @@ Error: conditions failed for call 'rmarkdown::render(" .. ecking/index.Rmd", ':
+- [vetr](https://github.com/brodieG/vetr) + +
+ +
template <- numeric(1L)
+
+vetr::vet(template, 42)
+[1] TRUE
+
+vetr::vet(template, 1:3)
+[1] "`length(1:3)` should be 1 (is 3)"
+
+vetr::vet(template, "hello")
+[1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
+ +
+ - [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html)
From 8a9c09b9794d83c35857974fe161042ffc2474a9 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 13:12:29 +0100 Subject: [PATCH 44/47] Fix typo --- content/post/2021-10-07-input-checking/index.Rmd | 2 +- content/post/2021-10-07-input-checking/index.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 710de9f4..ae7dd6fb 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -213,7 +213,7 @@ say_hello(c("Bob", "Alice")) Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. -If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. +If you're interested in understanding the various approaches to input checking, the documentation for these package is a great place to start. 
- [assertthat](https://github.com/hadley/assertthat) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 4a3192b3..3aecf086 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 5ea789cb694eedce +rmd_hash: 25e9608409d594fd --- @@ -202,7 +202,7 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len ### Other packages to check function inputs -Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If interested in understanding the various approaches to input taking the documentation for these package is a great place to start. +Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If you're interested in understanding the various approaches to input checking, the documentation for these package is a great place to start. 
- [assertthat](https://github.com/hadley/assertthat) From 9381f6aa187c5ca3eb93ba69e1e315b5583843c3 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 13:15:10 +0100 Subject: [PATCH 45/47] Move vetr up Since we said we will order by number of reverse deps --- .../post/2021-10-07-input-checking/index.Rmd | 24 ++++++------- .../post/2021-10-07-input-checking/index.md | 36 +++++++++---------- 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index ae7dd6fb..314cce0f 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -221,6 +221,18 @@ If you're interested in understanding the various approaches to input checking, assertthat::assert_that(is.character(1)) ``` +- [vetr](https://github.com/brodieG/vetr) + +```{r, error = TRUE} +template <- numeric(1L) + +vetr::vet(template, 42) + +vetr::vet(template, 1:3) + +vetr::vet(template, "hello") +``` + - [assertr](https://docs.ropensci.org/assertr/) ```{r, error = TRUE} @@ -244,18 +256,6 @@ ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.)) ensure_square(matrix(1:20, 4, 5)) ``` -- [vetr](https://github.com/brodieG/vetr) - -```{r, error = TRUE} -template <- numeric(1L) - -vetr::vet(template, 42) - -vetr::vet(template, 1:3) - -vetr::vet(template, "hello") -``` - - `vctrs::vec_assert()` ```{r, error = TRUE} diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 3aecf086..645baa3f 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: 25e9608409d594fd +rmd_hash: b73f799175d0e95a --- @@ -213,6 +213,23 @@ Error: 1 is not a character vector
+- [vetr](https://github.com/brodieG/vetr) + +
+ +
template <- numeric(1L)
+
+vetr::vet(template, 42)
+[1] TRUE
+
+vetr::vet(template, 1:3)
+[1] "`length(1:3)` should be 1 (is 3)"
+
+vetr::vet(template, "hello")
+[1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
+ +
+ - [assertr](https://docs.ropensci.org/assertr/)
@@ -250,23 +267,6 @@ Error: conditions failed for call 'rmarkdown::render(" .. ecking/index.Rmd", ':
-- [vetr](https://github.com/brodieG/vetr) - -
- -
template <- numeric(1L)
-
-vetr::vet(template, 42)
-[1] TRUE
-
-vetr::vet(template, 1:3)
-[1] "`length(1:3)` should be 1 (is 3)"
-
-vetr::vet(template, "hello")
-[1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
- -
- - [`vctrs::vec_assert()`](https://vctrs.r-lib.org/reference/vec_assert.html)
From 474bf0794218c5749d448760cc194c2b524bf3d7 Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 13:16:50 +0100 Subject: [PATCH 46/47] Mention comparison in vetr --- content/post/2021-10-07-input-checking/index.Rmd | 1 + content/post/2021-10-07-input-checking/index.md | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2021-10-07-input-checking/index.Rmd index 314cce0f..e5727579 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2021-10-07-input-checking/index.Rmd @@ -214,6 +214,7 @@ say_hello(c("Bob", "Alice")) Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If you're interested in understanding the various approaches to input checking, the documentation for these package is a great place to start. +For a more in-depth comparison of the different packages, vetr itself has [a nice overview on this topic](https://htmlpreview.github.io/?https://github.com/brodieG/vetr/blob/master/extra/compare.html). 
- [assertthat](https://github.com/hadley/assertthat) diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2021-10-07-input-checking/index.md index 645baa3f..ebd53ec2 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2021-10-07-input-checking/index.md @@ -10,7 +10,7 @@ tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: b73f799175d0e95a +rmd_hash: d0b4522ddb48a500 --- @@ -202,7 +202,7 @@ Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have len ### Other packages to check function inputs -Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If you're interested in understanding the various approaches to input checking, the documentation for these package is a great place to start. +Because input checking is such an important point task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here but below is a list of some of them, listed by decreasing number of reverse dependencies. If you're interested in understanding the various approaches to input checking, the documentation for these package is a great place to start. For a more in-depth comparison of the different packages, vetr itself has [a nice overview on this topic](https://htmlpreview.github.io/?https://github.com/brodieG/vetr/blob/master/extra/compare.html). 
- [assertthat](https://github.com/hadley/assertthat) From 2fbc7743f6898079573131f8dd3f52942491dfdf Mon Sep 17 00:00:00 2001 From: Hugo Gruson Date: Wed, 9 Mar 2022 15:11:25 +0100 Subject: [PATCH 47/47] Update post date --- .../index.Rmd | 2 +- .../index.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) rename content/post/{2021-10-07-input-checking => 2022-03-10-input-checking}/index.Rmd (99%) rename content/post/{2021-10-07-input-checking => 2022-03-10-input-checking}/index.md (98%) diff --git a/content/post/2021-10-07-input-checking/index.Rmd b/content/post/2022-03-10-input-checking/index.Rmd similarity index 99% rename from content/post/2021-10-07-input-checking/index.Rmd rename to content/post/2022-03-10-input-checking/index.Rmd index e5727579..06d3da59 100644 --- a/content/post/2021-10-07-input-checking/index.Rmd +++ b/content/post/2022-03-10-input-checking/index.Rmd @@ -5,7 +5,7 @@ authors: - Hugo Gruson - Sam Abbott - Carl Pearson -date: "2021-10-07" +date: "2022-03-10" tags: - package development - r-package diff --git a/content/post/2021-10-07-input-checking/index.md b/content/post/2022-03-10-input-checking/index.md similarity index 98% rename from content/post/2021-10-07-input-checking/index.md rename to content/post/2022-03-10-input-checking/index.md index ebd53ec2..c66a6ab2 100644 --- a/content/post/2021-10-07-input-checking/index.md +++ b/content/post/2022-03-10-input-checking/index.md @@ -5,16 +5,16 @@ authors: - Hugo Gruson - Sam Abbott - Carl Pearson -date: "2021-10-07" +date: "2022-03-10" tags: - package development - r-package output: hugodown::hugo_document -rmd_hash: d0b4522ddb48a500 +rmd_hash: 35d296a521792ef8 --- -Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself? Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? 
Read about our exploration into the various ways to check your function inputs in R in this blog post. And please share your own tips and discoveries in the comment section below! +Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer than the core of the function itself? Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? Read about our exploration into the various ways to check your function inputs in R in this blog post. And please share your own tips and discoveries in the comment section! ## Introduction: the dangers of not checking function inputs @@ -326,9 +326,9 @@ In this post, we have discussed some methods to check function inputs, and to ge [^2]: Read [the tidyverse style guide](https://style.tidyverse.org/error-messages.html) for more guidance on how to write good error messages. -[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](https://blog.r-hub.io/2021/07/30/cache/) +[^3]: The [Don't Repeat Yourself (DRY) principle of software development](https://en.wikipedia.org/wiki/Don't_repeat_yourself), also mentioned in this post on [caching](/2021/07/30/cache/) -[^4]: See [this earlier blog post](https://blog.r-hub.io/2019/12/12/internal-functions/) for more information about why and how you would go about writing internal functions. +[^4]: See [this earlier blog post](/2019/12/12/internal-functions/) for more information about why and how you would go about writing internal functions. [^5]: This is a complex discussion often caricatured, but that has already been treated on some occasions such as [this blog post from Jim Hester](https://www.tidyverse.org/blog/2019/05/itdepends/).