# Quasiquotation
```{r, include = FALSE}
source("common.R")
```
## Introduction
Now that you understand the tree structure of R code, it's time to return to one of the fundamental ideas that make `expr()` and `ast()` work: quotation. In tidy evaluation, all quoting functions are actually quasiquoting functions because they also support unquoting. Where quotation is the act of capturing an unevaluated expression, __unquotation__ is the ability to selectively evaluate parts of an otherwise quoted expression. Together, this is called quasiquotation. Quasiquotation makes it easy to create functions that combine code written by the function's author with code written by the function's user. This helps to solve a wide variety of challenging problems.
Quasiquotation is one of the three pillars of tidy evaluation. You'll learn about the other two (quosures and the data mask) in Chapter \@ref(evaluation). By itself, quasiquotation is most useful for programming, particularly for generating code. But when it's combined with the other techniques, tidy evaluation becomes a powerful tool for data analysis.
### Outline {-}
* Section \@ref(quasi-motivation) motivates the development of quasiquotation
with a function, `cement()`, that works like `paste()` but automatically
"quotes" its arguments so that you don't have to.
* Section \@ref(quoting) gives you the tools to quote expressions, whether
they come from you or the user, or whether you use rlang or base R tools.
* Section \@ref(unquoting) introduces the biggest difference between rlang
quoting functions and base quoting functions: unquoting with `!!` and `!!!`.
* Section \@ref(base-nonquote) discusses the three main "non-quoting"
techniques that base R functions use to disable quoting behaviour.
* Section \@ref(tidy-dots) explores another place that you can use `!!!`,
functions that take `...`. It also introduces the special `:=` operator,
which allows you to dynamically change argument names.
* Section \@ref(expr-case-studies) shows a few practical uses of quoting to solve
problems that naturally require some code generation.
* Section \@ref(history) finishes up with a little history of quasiquotation
for those who are interested.
### Prerequisites {-}
Make sure you've read the metaprogramming overview in Chapter \@ref(meta-big-picture) to get a broad overview of the motivation and the basic vocabulary, and that you're familiar with the tree structure of expressions as described in Section \@ref(expression-details).
Code-wise, we'll mostly be using the tools from [rlang](https://rlang.r-lib.org), but at the end of the chapter you'll also see some powerful applications in conjunction with [purrr](https://purrr.tidyverse.org).
```{r setup, message = FALSE}
library(rlang)
library(purrr)
```
### Related work {-}
\index{macros}
\index{fexprs}
Quoting functions have deep connections to Lisp __macros__. But macros are usually run at compile-time, which doesn't exist in R, and they always input and output ASTs. See @lumley-2001 for one approach to implementing them in R. Quoting functions are more closely related to the more esoteric Lisp [__fexprs__](http://en.wikipedia.org/wiki/Fexpr), functions where all arguments are quoted by default. These terms are useful to know when looking for related work in other programming languages.
## Motivation {#quasi-motivation}
We'll start with a concrete example that helps motivate the need for unquoting, and hence quasiquotation. Imagine you're creating a lot of strings by joining together words:
```{r}
paste("Good", "morning", "Hadley")
paste("Good", "afternoon", "Alice")
```
You are sick and tired of writing all those quotes, and instead you just want to use bare words. To that end, you've written the following function. (Don't worry about the implementation for now; you'll learn about the pieces later.)
```{r}
cement <- function(...) {
args <- ensyms(...)
paste(purrr::map(args, as_string), collapse = " ")
}
cement(Good, morning, Hadley)
cement(Good, afternoon, Alice)
```
Formally, this function quotes all of its inputs. You can think of it as automatically putting quotation marks around each argument. That's not precisely true as the intermediate objects it generates are expressions, not strings, but it's a useful approximation, and the root meaning of the term "quote".
This function is nice because we no longer need to type quotation marks. The problem comes when we want to use variables. It's easy to use variables with `paste()`: just don't surround them with quotation marks.
```{r}
name <- "Hadley"
time <- "morning"
paste("Good", time, name)
```
Obviously this doesn't work with `cement()` because every input is automatically quoted:
```{r}
cement(Good, time, name)
```
We need some way to explicitly _unquote_ the input to tell `cement()` to remove the automatic quote marks. Here we need `time` and `name` to be treated differently to `Good`. Quasiquotation gives us a standard tool to do so: `!!`, called "unquote", and pronounced bang-bang. `!!` tells a quoting function to drop the implicit quotes:
```{r}
cement(Good, !!time, !!name)
```
It's useful to compare `cement()` and `paste()` directly. `paste()` evaluates its arguments, so we must quote where needed; `cement()` quotes its arguments, so we must unquote where needed.
```{r, eval = FALSE}
paste("Good", time, name)
cement(Good, !!time, !!name)
```
### Vocabulary
\index{arguments!evaluated vs. quoted}
\index{non-standard evaluation}
The distinction between quoted and evaluated arguments is important:
* An __evaluated__ argument obeys R's usual evaluation rules.
* A __quoted__ argument is captured by the function, and is processed in
some custom way.
`paste()` evaluates all its arguments; `cement()` quotes all its arguments.
If you're ever unsure about whether an argument is quoted or evaluated, try executing the code outside of the function. If it doesn't work or does something different, then that argument is quoted. For example, you can use this technique to determine that the first argument to `library()` is quoted:
```{r, error = TRUE}
# works
library(MASS)
# fails
MASS
```
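The same check works for other functions. As a minimal sketch, using the built-in `mtcars` data and assuming that no variable called `cyl` exists in your workspace, the second argument of `subset()` fails when run on its own, which suggests that it is quoted:

```r
# Assumes `cyl` is not defined in the calling environment; it exists only
# as a column of the built-in mtcars data frame.

# Inside subset(), the condition is captured and evaluated within the data
inside <- nrow(subset(mtcars, cyl == 4))

# Outside subset(), the same code is evaluated normally and fails,
# because R cannot find an object called `cyl`
outside <- tryCatch(cyl == 4, error = function(e) "error")

inside
outside
```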
Talking about whether an argument is quoted or evaluated is a more precise way of stating whether or not a function uses non-standard evaluation (NSE). I will sometimes use "quoting function" as short-hand for a "function that quotes one or more arguments", but generally, I'll talk about quoted arguments since that is the level at which the difference applies.
### Exercises
1. For each function in the following base R code, identify which arguments
are quoted and which are evaluated.
```{r, results = FALSE}
library(MASS)
mtcars2 <- subset(mtcars, cyl == 4)
with(mtcars2, sum(vs))
sum(mtcars2$am)
rm(mtcars2)
```
1. For each function in the following tidyverse code, identify which arguments
are quoted and which are evaluated.
```{r, eval = FALSE}
library(dplyr)
library(ggplot2)
by_cyl <- mtcars %>%
group_by(cyl) %>%
summarise(mean = mean(mpg))
ggplot(by_cyl, aes(cyl, mean)) + geom_point()
```
## Quoting
\index{quoting}
The first part of quasiquotation is quotation: capturing an expression without evaluating it. We'll need a pair of functions because the expression can be supplied directly or indirectly, via a lazily-evaluated function argument. I'll start with the rlang quoting functions, then circle back to those provided by base R.
### Capturing expressions
\index{expressions!capturing}
\indexc{expr()}
\index{quoting!expr@\texttt{expr()}}
There are four important quoting functions. For interactive exploration, the most important is `expr()`, which captures its argument exactly as provided:
```{r}
expr(x + y)
expr(1 / 2 / 3)
```
(Remember that white space and comments are not part of the expression, so will not be captured by a quoting function.)
`expr()` is great for interactive exploration, because it captures what you, the developer, typed. It's not so useful inside a function:
```{r}
f1 <- function(x) expr(x)
f1(a + b + c)
```
\indexc{enexpr()}
We need another function to solve this problem: `enexpr()`. This captures what the caller supplied to the function by looking at the internal promise object that powers lazy evaluation (Section \@ref(promises)).
```{r}
f2 <- function(x) enexpr(x)
f2(a + b + c)
```
(It's called "en"-`expr()` by analogy to enrich. Enriching someone makes them richer; `enexpr()`ing an argument makes it an expression.)
To capture all arguments in `...`, use `enexprs()`.
```{r}
f <- function(...) enexprs(...)
f(x = 1, y = 10 * z)
```
Finally, `exprs()` is useful interactively to make a list of expressions:
```{r, results = FALSE}
exprs(x = x ^ 2, y = y ^ 3, z = z ^ 4)
# shorthand for
# list(x = expr(x ^ 2), y = expr(y ^ 3), z = expr(z ^ 4))
```
In short, use `enexpr()` and `enexprs()` to capture the expressions supplied as arguments _by the user_. Use `expr()` and `exprs()` to capture expressions that _you_ supply.
### Capturing symbols
\index{symbols!capturing}
\indexc{ensym()}
Sometimes you only want to allow the user to specify a variable name, not an arbitrary expression. In this case, you can use `ensym()` or `ensyms()`. These are variants of `enexpr()` and `enexprs()` that check the captured expression is either a symbol or a string (which is converted to a symbol[^string-symbol]). `ensym()` and `ensyms()` throw an error if given anything else.
[^string-symbol]: This is for compatibility with base R, which allows you to provide a string instead of a symbol in many places: `"x" <- 1`, `"foo"(x, y)`, `c("x" = 1)`.
```{r}
f <- function(...) ensyms(...)
f(x)
f("x")
```
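To see the "anything else" case, here is a small sketch. The helper `capture_var()` is a hypothetical name used only for illustration, and `tryCatch()` turns the error into a value so that the script keeps running:

```r
library(rlang)

# Hypothetical helper: accepts a single variable name and nothing else
capture_var <- function(var) ensym(var)

capture_var(mpg)     # a bare name is captured as a symbol
capture_var("mpg")   # a string is converted to the same symbol

# A call such as `mpg + 1` is rejected by ensym()
rejected <- tryCatch(capture_var(mpg + 1), error = function(e) "error")
rejected
```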
### With base R
\index{expressions!capturing with base R}
\index{quoting!quote@\texttt{quote()}}
Each rlang function described above has an equivalent in base R. The primary difference is that the base equivalents do not support unquoting (which we'll talk about very soon). This makes them quoting functions, rather than quasiquoting functions.
The base equivalent of `expr()` is `quote()`:
```{r}
quote(x + y)
```
The base function closest to `enexpr()` is `substitute()`:
```{r}
f3 <- function(x) substitute(x)
f3(x + y)
```
\indexc{alist()}
The base equivalent to `exprs()` is `alist()`:
```{r}
alist(x = 1, y = x + 2)
```
The equivalent to `enexprs()` is an undocumented feature of `substitute()`[^peter-meilstrup]:
```{r}
f <- function(...) as.list(substitute(...()))
f(x = 1, y = 10 * z)
```
[^peter-meilstrup]: Discovered by Peter Meilstrup and described in [R-devel on 2018-08-13](http://r.789695.n4.nabble.com/substitute-on-arguments-in-ellipsis-quot-dot-dot-dot-quot-td4751658.html).
There are two other important base quoting functions that we'll cover elsewhere:
* `bquote()` provides a limited form of quasiquotation, and is discussed in
Section \@ref(base-nonquote).
* `~`, the formula, is a quoting function that also captures the environment.
It's the inspiration for quosures, the topic of the next chapter, and is
discussed in Section \@ref(quosure-impl).
### Substitution
\indexc{substitute()}
You'll most often see `substitute()` used to capture unevaluated arguments. However, as well as quoting, `substitute()` also does "substitution": if you give it an expression, rather than a symbol, it will substitute in the values of symbols defined in the current environment.
```{r}
f4 <- function(x) substitute(x * 2)
f4(a + b + c)
```
I think this makes code hard to understand, because if it is taken out of context, you can't tell if the goal of `substitute(x + y)` is to replace `x`, `y`, or both. If you do want to use `substitute()` for substitution, I recommend that you use the second argument to make your goal clear:
```{r}
substitute(x * y * z, list(x = 10, y = quote(a + b)))
```
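Because the second argument names exactly what gets replaced, the result is easy to predict. The following sketch (the variable names are illustrative) shows that the substituted call keeps its tree structure, so the inlined `a + b` is evaluated as a unit:

```r
# `x` and `y` are replaced by the values in the list
template <- substitute(x * y, list(x = 10, y = quote(a + b)))
template
#> 10 * (a + b)

a <- 1
b <- 2
eval(template)
#> [1] 30
```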
### Summary
When quoting (i.e. capturing code), there are two important distinctions:
* Is it supplied by the developer of the code or the user of the code?
I.e. is it fixed (supplied in the body of the function) or varying (supplied
via an argument)?
* Do you want to capture a single expression or multiple expressions?
This leads to a 2 x 2 table of functions for rlang, Table \@ref(tab:quoting-rlang), and base R, Table \@ref(tab:quoting-base).
| | Developer | User |
|------|-----------|-------------|
| One | `expr()` | `enexpr()` |
| Many | `exprs()` | `enexprs()` |
Table: (\#tab:quoting-rlang) rlang quasiquoting functions
| | Developer | User |
|------|-----------|------------------------------|
| One | `quote()` | `substitute()` |
| Many | `alist()` | `as.list(substitute(...()))` |
Table: (\#tab:quoting-base) base R quoting functions
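As a quick check of the "One" row of both tables, the sketch below compares the rlang and base functions on the same input. The wrappers `with_rlang()` and `with_base()` are hypothetical names used only for illustration:

```r
library(rlang)

# Developer-supplied code: expr() and quote() capture the same call
identical(expr(x + y), quote(x + y))

# User-supplied code: enexpr() and substitute() both recover the argument
with_rlang <- function(arg) enexpr(arg)
with_base <- function(arg) substitute(arg)
identical(with_rlang(a * b), with_base(a * b))
```

Both comparisons should return `TRUE` as long as no unquoting is involved; the two families differ once `!!` appears in the input.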
### Exercises
1. How is `expr()` implemented? Look at its source code.
1. Compare and contrast the following two functions. Can you predict the
output before running them?
```{r, results = FALSE}
f1 <- function(x, y) {
exprs(x = x, y = y)
}
f2 <- function(x, y) {
enexprs(x = x, y = y)
}
f1(a + b, c + d)
f2(a + b, c + d)
```
1. What happens if you try to use `enexpr()` with an expression (i.e.
`enexpr(x + y)`)? What happens if `enexpr()` is passed a missing argument?
1. How are `exprs(a)` and `exprs(a = )` different? Think about both the
input and the output.
1. What are other differences between `exprs()` and `alist()`? Read the
documentation for the named arguments of `exprs()` to find out.
1. The documentation for `substitute()` says:
> Substitution takes place by examining each component of the parse tree
> as follows:
>
> * If it is not a bound symbol in `env`, it is unchanged.
> * If it is a promise object (i.e., a formal argument to a function)
> the expression slot of the promise replaces the symbol.
> * If it is an ordinary variable, its value is substituted;
> * Unless `env` is .GlobalEnv in which case the symbol is left
> unchanged.
Create examples that illustrate each of the four different cases.
## Unquoting
\index{unquoting}
\index{quasiquotation}
\index{expressions!unquoting}
So far, you've only seen relatively small advantages of the rlang quoting functions over the base R quoting functions: they have a more consistent naming scheme. The big difference is that rlang quoting functions are actually quasiquoting functions because they can also unquote.
Unquoting allows you to selectively evaluate parts of the expression that would otherwise be quoted, which effectively allows you to merge together ASTs using a template AST. Since base functions don't use unquoting, they instead use a variety of other techniques, which you'll learn about in Section \@ref(base-nonquote).
Unquoting is one inverse of quoting. It allows you to selectively evaluate code inside `expr()`, so that `expr(!!x)` is equivalent to `x`. In Chapter \@ref(evaluation), you'll learn about another inverse, evaluation. This happens outside `expr()`, so that `eval(expr(x))` is equivalent to `x`.
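Both inverses can be checked directly. This is a minimal sketch that assumes `x` holds a captured expression:

```r
library(rlang)

x <- expr(a + b)

# Unquoting inside expr() yields the stored expression itself
identical(expr(!!x), x)

# Evaluating the quoted symbol `x` looks up its value, which is again
# the stored expression
identical(eval(expr(x)), x)
```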
### Unquoting one argument
\indexc{"!"!}
Use `!!` to unquote a single argument in a function call. `!!` takes a single expression, evaluates it, and inlines the result in the AST.
```{r}
x <- expr(a + b + c)
expr(f(!!x, y))
```
I think this is easiest to understand with a diagram. `!!` introduces a placeholder in the AST, shown with dotted borders. Here the placeholder `x` is replaced by an AST, illustrated by a dotted connection.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/bang-bang.png")
```
As well as call objects, `!!` also works with symbols and constants:
```{r}
a <- sym("y")
b <- 1
expr(f(!!a, !!b))
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/simple.png")
```
If the right-hand side of `!!` is a function call, `!!` will evaluate it and insert the results:
```{r}
mean_rm <- function(var) {
var <- ensym(var)
expr(mean(!!var, na.rm = TRUE))
}
expr(!!mean_rm(x) + !!mean_rm(y))
```
`!!` preserves operator precedence because it works with expressions.
```{r}
x1 <- expr(x + 1)
x2 <- expr(x + 2)
expr(!!x1 / !!x2)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/infix.png")
```
If we simply pasted the text of the expressions together, we'd end up with `x + 1 / x + 2`, which has a very different AST:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/infix-bad.png")
```
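The difference is not just cosmetic: the two trees compute different values. Here is a sketch that builds the expression both ways, using `expr_text()` and `parse_expr()` from rlang to simulate pasting text, and then evaluates each with an illustrative value of `x`:

```r
library(rlang)

x1 <- expr(x + 1)
x2 <- expr(x + 2)

# Unquoting combines the two trees, so each sum stays intact
by_unquote <- expr(!!x1 / !!x2)

# Pasting text and re-parsing lets `/` bind tighter than `+`
by_paste <- parse_expr(paste(expr_text(x1), "/", expr_text(x2)))

x <- 2
eval(by_unquote)  # (2 + 1) / (2 + 2)
#> [1] 0.75
eval(by_paste)    # 2 + 1 / 2 + 2
#> [1] 4.5
```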
### Unquoting a function
\index{unquoting!functions}
`!!` is most commonly used to replace the arguments to a function, but you can also use it to replace the function itself. The only challenge here is operator precedence: `expr(!!f(x, y))` unquotes the result of `f(x, y)`, so you need an extra pair of parentheses.
```{r}
f <- expr(foo)
expr((!!f)(x, y))
```
This also works when `f` is itself a call:
```{r}
f <- expr(pkg::foo)
expr((!!f)(x, y))
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/fun.png")
```
Because of the large number of parentheses involved, it can be clearer to use `rlang::call2()`:
```{r}
f <- expr(pkg::foo)
call2(f, expr(x), expr(y))
```
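`call2()` is also convenient when the call has named arguments. A small sketch, where `values` is an illustrative variable name:

```r
library(rlang)

# call2() takes the function first, then the arguments of the call
mean_call <- call2("mean", expr(values), na.rm = TRUE)
mean_call
#> mean(values, na.rm = TRUE)

values <- c(1, NA, 3)
eval(mean_call)
#> [1] 2
```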
### Unquoting a missing argument {#unquote-missing}
\index{unquoting!missing arguments}
\index{missing arguments!unquoting}
Very occasionally it is useful to unquote a missing argument (Section \@ref(empty-symbol)), but the naive approach doesn't work:
```{r, error = TRUE}
arg <- missing_arg()
expr(foo(!!arg, !!arg))
```
You can work around this with the `rlang::maybe_missing()` helper:
```{r}
expr(foo(!!maybe_missing(arg), !!maybe_missing(arg)))
```
### Unquoting in special forms
\index{unquoting!special forms}
\index{special forms!unquoting}
There are a few special forms where unquoting is a syntax error. Take `$` for example: it must always be followed by the name of a variable, not another expression. This means attempting to unquote with `$` will fail with a syntax error:
```r
expr(df$!!x)
#> Error: unexpected '!' in "expr(df$!"
```
To make unquoting work, you'll need to use the prefix form (Section \@ref(prefix-transform)):
```{r}
x <- expr(x)
expr(`$`(df, !!x))
```
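The call built in prefix form is the same object that R creates for the infix form, so it evaluates as usual. A sketch using the built-in `mtcars` data, with the column name chosen for illustration:

```r
library(rlang)

col <- sym("cyl")

# Prefix form gives `!!` an ordinary argument position to occupy
extract <- expr(`$`(mtcars, !!col))
extract
#> mtcars$cyl

identical(eval(extract), mtcars$cyl)
#> [1] TRUE
```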
### Unquoting many arguments
\indexc{"!"!"!}
\index{splicing!expressions}
\index{splicing|seealso {"!"!"!}}
\index{unquoting!many arguments}
`!!` is a one-to-one replacement. `!!!` (called "unquote-splice", and pronounced bang-bang-bang) is a one-to-many replacement. It takes a list of expressions and inserts them at the location of the `!!!`:
<!-- GVW: brief note to explain why `!!` can't be made smart enough to do this automatically? -->
```{r}
xs <- exprs(1, a, -b)
expr(f(!!!xs, y))
# Or with names
ys <- set_names(xs, c("a", "b", "c"))
expr(f(!!!ys, d = 4))
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/quotation/bang-bang-bang.png")
```
`!!!` can be used in any rlang function that takes `...` regardless of whether or not `...` is quoted or evaluated. We'll come back to this in Section \@ref(tidy-dots); for now note that this can be useful in `call2()`.
```{r}
call2("f", !!!xs, expr(y))
```
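A common use of `!!!` is to build a call from variable names stored as strings. The sketch below assumes illustrative variables `a`, `b`, and `d`, and uses `rlang::syms()` to turn the names into symbols first:

```r
library(rlang)

# syms() turns a character vector into a list of symbols
vars <- syms(c("a", "b", "d"))

total <- expr(sum(!!!vars))
total
#> sum(a, b, d)

a <- 1
b <- 2
d <- 3
eval(total)
#> [1] 6
```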
### The polite fiction of `!!`
So far we have acted as if `!!` and `!!!` are regular prefix operators like `+` , `-`, and `!`. They're not. From R's perspective, `!!` and `!!!` are simply the repeated application of `!`:
```{r}
!!TRUE
!!!TRUE
```
`!!` and `!!!` behave specially inside all quoting functions powered by rlang, where they behave like real operators with precedence equivalent to unary `+` and `-`. This requires considerable work inside rlang, but means that you can write `!!x + !!y` instead of `(!!x) + (!!y)`.
The biggest downside[^bang-bang-print] to using a fake operator is that you might get silent errors when misusing `!!` outside of quasiquoting functions. Most of the time this is not an issue because `!!` is typically used to unquote expressions or quosures. Since expressions are not supported by the negation operator, you will get an argument type error in this case:
[^bang-bang-print]: Prior to R 3.5.1, there was another major downside: the R deparser treated `!!x` as `!(!x)`. This is why in old versions of R you might see extra parentheses when printing expressions. The good news is that these parentheses are not real and can be safely ignored most of the time. The bad news is that they will become real if you reparse that printed output to R code. These roundtripped functions will not work as expected since `!(!x)` does not unquote.
```{r, error = TRUE}
x <- quote(variable)
!!x
```
But you can get silently incorrect results when working with numeric values:
```{r}
df <- data.frame(x = 1:5)
y <- 100
with(df, x + !!y)
```
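The arithmetic behind that result can be traced step by step in plain base R. Outside a quasiquoting function, `!!y` is just double negation:

```r
y <- 100

!y    # any non-zero number counts as TRUE, so its negation is FALSE
#> [1] FALSE
!!y   # negating again gives TRUE
#> [1] TRUE

# In arithmetic TRUE is treated as 1, so `x + !!y` silently adds 1
1:5 + !!y
#> [1] 2 3 4 5 6
```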
Given these drawbacks, you might wonder why we introduced new syntax instead of using regular function calls. Indeed, early versions of tidy evaluation used function calls like `UQ()` and `UQS()`. However, they're not really function calls, and pretending they are leads to a misleading mental model. We chose `!!` and `!!!` as the least-bad solution:
* They are visually strong and don't look like existing syntax. When you
see `!!x` or `!!!x` it's clear that something unusual is happening.
* They override a rarely used piece of syntax, as double negation is not a
common pattern in R[^js-double-neg]. If you do need it, you can just
add parentheses `!(!x)`.
[^js-double-neg]: Unlike, say, JavaScript, where `!!x` is a commonly used shortcut to convert an integer into a logical.
### Non-standard ASTs {#non-standard-ast}
\index{ASTs!non-standard}
With unquoting, it's easy to create non-standard ASTs, i.e. ASTs that contain components that are not expressions. (It is also possible to create non-standard ASTs by directly manipulating the underlying objects, but it's harder to do so accidentally.) These are valid, and occasionally useful, but their correct use is beyond the scope of this book. However, it's important to learn about them, because they can be deparsed, and hence printed, in misleading ways.
For example, if you inline more complex objects, their attributes are not printed. This can lead to confusing output:
```{r}
x1 <- expr(class(!!data.frame(x = 10)))
x1
eval(x1)
```
You have two main tools to reduce this confusion: `rlang::expr_print()` and `lobstr::ast()`:
```{r}
expr_print(x1)
lobstr::ast(!!x1)
```
Another confusing case arises if you inline an integer sequence:
```{r}
x2 <- expr(f(!!c(1L, 2L, 3L, 4L, 5L)))
x2
expr_print(x2)
lobstr::ast(!!x2)
```
It's also possible to create regular ASTs that cannot be generated from code because of operator precedence. In this case, R will print parentheses that do not exist in the AST:
```{r}
x3 <- expr(1 + !!expr(2 + 3))
x3
lobstr::ast(!!x3)
```
### Exercises
1. Given the following components:
```{r}
xy <- expr(x + y)
xz <- expr(x + z)
yz <- expr(y + z)
abc <- exprs(a, b, c)
```
Use quasiquotation to construct the following calls:
```{r, eval = FALSE}
(x + y) / (y + z)
-(x + z) ^ (y + z)
(x + y) + (y + z) - (x + y)
atan2(x + y, y + z)
sum(x + y, x + y, y + z)
sum(a, b, c)
mean(c(a, b, c), na.rm = TRUE)
foo(a = x + y, b = y + z)
```
1. The following two calls print the same, but are actually different:
```{r}
(a <- expr(mean(1:10)))
(b <- expr(mean(!!(1:10))))
identical(a, b)
```
What's the difference? Which one is more natural?
## Non-quoting {#base-nonquote}
\indexc{bquote()}
\index{unquoting!base R}
Base R has one function that implements quasiquotation: `bquote()`. It uses `.()` for unquoting:
```{r}
xyz <- bquote((x + y + z))
bquote(-.(xyz) / 2)
```
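For a single unquoted expression, `.()` inside `bquote()` plays the same role as `!!` inside `expr()`. A minimal sketch comparing the two on the same input:

```r
library(rlang)

inner <- quote(x + y)

# Both insert the captured call as the first operand of `*`
with_bquote <- bquote(.(inner) * 2)
with_expr <- expr(!!inner * 2)

with_expr
#> (x + y) * 2
identical(with_bquote, with_expr)
```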
`bquote()` isn't used by any other function in base R, and has had relatively little impact on how R code is written. There are three challenges to effective use of `bquote()`:
* It is only easily used with your code; it is hard to apply it to arbitrary
code supplied by a user.
* It does not provide an unquote-splice operator that allows you to unquote
multiple expressions stored in a list.
* It lacks the ability to handle code accompanied by an environment, which
is crucial for functions that evaluate code in the context of a data frame,
like `subset()` and friends.
Instead, functions that quote an argument use some other technique to allow indirect specification. Rather than using unquoting, all base R approaches selectively turn quoting off, so I call them __non-quoting__ techniques.
```{r, eval = FALSE, include = FALSE}
call <- names(pryr::find_uses("package:base", "match.call"))
subs <- names(pryr::find_uses("package:base", "substitute"))
eval <- names(pryr::find_uses("package:base", "eval"))
intersect(subs, eval)
```
There are four basic forms seen in base R:
* A pair of quoting and non-quoting functions. For example, `$` has two
arguments, and the second argument is quoted. This is easier to see if you
write in prefix form: `mtcars$cyl` is equivalent to `` `$`(mtcars, cyl) ``.
If you want to refer to a variable indirectly, you use `[[`, as it
takes the name of a variable as a string.
```{r}
x <- list(var = 1, y = 2)
var <- "y"
x$var
x[[var]]
```
There are three other quoting functions closely related to `$`: `subset()`,
`transform()`, and `with()`. These are seen as wrappers around `$` only
suitable for interactive use so they all have the same non-quoting
alternative: `[`.
`<-`/`assign()` and `::`/`getExportedValue()` work similarly to `$`/`[`.
\indexc{\$}
\indexc{<-}
* A pair of quoting and non-quoting arguments. For example, `rm()` allows
you to provide bare variable names in `...`, or a character vector of
variable names in `list`:
```{r}
x <- 1
rm(x)
y <- 2
vars <- c("y", "vars")
rm(list = vars)
```
`data()` and `save()` work similarly.
\indexc{rm()}
* An argument that controls whether a different argument is quoting or
non-quoting. For example, in `library()`, the `character.only` argument
controls the quoting behaviour of the first argument, `package`:
```{r, message = FALSE}
library(MASS)
pkg <- "MASS"
library(pkg, character.only = TRUE)
```
`demo()`, `detach()`, `example()`, and `require()` work similarly.
\indexc{library()}
* Quoting if evaluation fails. For example, the first argument to `help()`
is non-quoting if it evaluates to a string; if evaluation fails, the
first argument is quoted.
```{r, eval = FALSE}
# Shows help for var
help(var)
var <- "mean"
# Shows help for mean
help(var)
var <- 10
# Shows help for var
help(var)
```
`ls()`, `page()`, and `match.fun()` work similarly.
\indexc{help()}
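To make the third form concrete, here is a sketch of how a function could offer a switch in the style of `character.only`. The function `pkg_name()` is hypothetical and is not part of base R; it only illustrates the pattern of quoting with `substitute()` unless the caller opts out:

```r
# Hypothetical helper: returns a package name either from a bare name
# (quoted) or from a string (evaluated)
pkg_name <- function(package, character.only = FALSE) {
  if (!character.only) {
    # Quote the argument and turn the captured symbol into a string
    package <- as.character(substitute(package))
  }
  package
}

pkg_name(MASS)
#> [1] "MASS"

pkg <- "stats"
pkg_name(pkg)                         # quoted: the name itself is used
#> [1] "pkg"
pkg_name(pkg, character.only = TRUE)  # evaluated: the value is used
#> [1] "stats"
```

The cost of this design is that the meaning of the first argument depends on a second one, so the call cannot be understood by looking at the first argument alone.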
\indexc{lm()}
Another important class of quoting functions are the base modelling and plotting functions, which follow the so-called standard non-standard evaluation rules: <http://developer.r-project.org/nonstandard-eval.pdf>. For example, `lm()` quotes the `weights` and `subset` arguments, and when used with a formula argument, the plotting function quotes the aesthetic arguments (`col`, `cex`, etc.):
```{r}
palette(RColorBrewer::brewer.pal(3, "Set1"))
plot(
Sepal.Length ~ Petal.Length,
data = iris,
col = Species,
pch = 20,
cex = 2
)
```
These functions have no built-in options for indirect specification, but you'll learn how to simulate unquoting in Section \@ref(base-nonquote).
## Dot-dot-dot (`...`) {#tidy-dots}
\indexc{...}
\index{tidy dots}
<!-- GVW: this seems a long way away from the introduction of `!!!` earlier - move this up above non-quoting in base R? -->
`!!!` is useful because it's not uncommon to have a list of expressions that you want to insert into a call. It turns out that this pattern is common elsewhere. Take the following two motivating problems:
* What do you do if the elements you want to put in `...` are already stored
in a list? For example, imagine you have a list of data frames that
you want to `rbind()` together:
```{r}
dfs <- list(
a = data.frame(x = 1, y = 2),
b = data.frame(x = 3, y = 4)
)
```
You could solve this specific case with `rbind(dfs$a, dfs$b)`, but how
do you generalise that solution to a list of arbitrary length?
* What do you do if you want to supply the argument name indirectly? For
example, imagine you want to create a single column data frame where
the name of the column is specified in a variable:
```{r}
var <- "x"
val <- c(4, 3, 9)
```
In this case, you could create a data frame and then change names
(i.e. `setNames(data.frame(val), var)`), but this feels inelegant.
How can we do better?
One way to think about these problems is to draw explicit parallels to quasiquotation:
* Row-binding multiple data frames is like unquote-splicing: we want to inline
individual elements of the list into the call:
```{r}
dplyr::bind_rows(!!!dfs)
```
When used in this context, the behaviour of `!!!` is known as "splatting" in
Ruby, Go, PHP, and Julia. It is closely related to `*args` (star-args) and
`**kwargs` (star-star-kwargs) in Python, which are sometimes called argument
unpacking.
\index{splicing}
* The second problem is like unquoting the left-hand side of `=`: rather
than interpreting `var` literally, we want to use the value stored in the
variable called `var`:
```{r}
tibble::tibble(!!var := val)
```
Note the use of `:=` (pronounced colon-equals) rather than `=`. Unfortunately
we need this new operator because R's grammar does not allow expressions as
argument names:
```{r, eval = FALSE}
tibble::tibble(!!var = val)
#> Error: unexpected '=' in "tibble::tibble(!!var ="
```
`:=` is like a vestigial organ: it's recognised by R's parser, but it
doesn't have any code associated with it. It looks like an `=` but allows
expressions on either side, making it a more flexible alternative to `=`.
It is used in data.table for similar reasons.
\indexc{:=}
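To make the "vestigial" description a little more concrete, here is a minimal sketch in plain R. It only checks two things: that the parser turns `a := b` into an ordinary call, and that the base environment itself has no binding for `:=`. The symbols `a` and `b` are placeholders and are never evaluated.

```{r}
# The parser accepts `:=` and produces a regular call object...
e <- quote(a := b)
is.call(e)
#> [1] TRUE
as.character(e[[1]])
#> [1] ":="

# ...but base R does not define a function with that name.
exists(":=", envir = baseenv(), inherits = FALSE)
#> [1] FALSE
```

Because nothing is attached to `:=` in base R, packages are free to give it a meaning of their own when they inspect captured arguments.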
<!-- GVW: I think `:=` needs/deserves more than a fly-by in a bullet point... -->
Base R takes a different approach, which we'll come back to in Section \@ref(do-call).
We say functions that support these tools, without quoting arguments, have __tidy dots__[^tidy-dots]. To gain tidy dots behaviour in your own function, all you need to do is use `list2()`.
[^tidy-dots]: This is admittedly not the most creative of names, but it clearly suggests it's something that has been added to R after the fact.
### Examples
\index{attributes!attributes@\texttt{attributes()}}
One place we could use `list2()` is to create a wrapper around `attributes()` that allows us to set attributes flexibly:
```{r}
set_attr <- function(.x, ...) {
  attr <- rlang::list2(...)
  attributes(.x) <- attr
  .x
}
attrs <- list(x = 1, y = 2)
attr_name <- "z"
1:10 %>%
  set_attr(w = 0, !!!attrs, !!attr_name := 3) %>%
  str()
```
### `exec()`
\indexc{exec()}
\indexc{list2()}
What if you want to use this technique with a function that doesn't have tidy dots? One option is to use `rlang::exec()` to call a function with some arguments supplied directly (in `...`) and others indirectly (in a list):
```{r}
# Directly
exec("mean", x = 1:10, na.rm = TRUE, trim = 0.1)
# Indirectly
args <- list(x = 1:10, na.rm = TRUE, trim = 0.1)
exec("mean", !!!args)
# Mixed
params <- list(na.rm = TRUE, trim = 0.1)
exec("mean", x = 1:10, !!!params)
```
`rlang::exec()` also makes it possible to supply argument names indirectly:
```{r}
arg_name <- "na.rm"
arg_val <- TRUE
exec("mean", 1:10, !!arg_name := arg_val)
```
And finally, it's useful if you have a vector of function names or a list of functions that you want to call with the same arguments:
```{r}
x <- c(runif(10), NA)
funs <- c("mean", "median", "sd")
purrr::map_dbl(funs, exec, x, na.rm = TRUE)
```
`exec()` is closely related to `call2()`; where `call2()` returns an expression, `exec()` evaluates it.
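The relationship can be seen side by side in the following sketch. The list `mean_args` is made up for the illustration; the point is only that evaluating the call built by `call2()` gives the same value that `exec()` returns directly.

```{r}
library(rlang)

mean_args <- list(x = c(1, 5, NA), na.rm = TRUE)

# call2() builds the call but does not run it
cl <- call2("mean", !!!mean_args)
is.call(cl)
#> [1] TRUE

# Evaluating that call gives the same result as exec()
eval(cl)
#> [1] 3
exec("mean", !!!mean_args)
#> [1] 3
```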
### `dots_list()`
\indexc{dots\_list()}
`list2()` provides one other handy feature: by default it will ignore any empty arguments at the end. This is useful in functions like `tibble::tibble()` because it means that you can easily change the order of variables without worrying about the final comma:
```{r, results = FALSE}
# Can easily move x to first entry:
tibble::tibble(
  y = 1:5,
  z = 3:-1,
  x = 5:1,
)
# Need to remove comma from z and add comma to x
data.frame(
  y = 1:5,
  z = 3:-1,
  x = 5:1
)
```
`list2()` is a wrapper around `rlang::dots_list()` with defaults set to the most commonly used settings. You can get more control by calling `dots_list()` directly:
* `.ignore_empty` allows you to control exactly which arguments are ignored.
The default ignores a single trailing argument to get the behaviour
described above, but you can choose to ignore all missing arguments, or
no missing arguments.
* `.homonyms` controls what happens if multiple arguments use the same name:
```{r, error = TRUE}
str(dots_list(x = 1, x = 2))
str(dots_list(x = 1, x = 2, .homonyms = "first"))
str(dots_list(x = 1, x = 2, .homonyms = "last"))
str(dots_list(x = 1, x = 2, .homonyms = "error"))
```
* If there are empty arguments that are not ignored, `.preserve_empty`
controls what to do with them. The default throws an error; setting
`.preserve_empty = TRUE` instead returns missing symbols. This is useful
if you're using `dots_list()` to generate function calls.
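The two arguments that deal with empty arguments can be explored with a short sketch like the one below. It assumes a reasonably recent version of rlang in which `.ignore_empty` accepts `"trailing"`, `"none"`, and `"all"`; the object names are invented for the example.

```{r}
library(rlang)

# Default (.ignore_empty = "trailing"): one trailing empty argument is dropped
trailing <- dots_list(x = 1, y = 2, )
length(trailing)

# .ignore_empty = "all": empty arguments are dropped wherever they appear
all_dropped <- dots_list(x = 1, , y = 2, .ignore_empty = "all")
names(all_dropped)

# Keep the empty argument instead of dropping it or throwing an error
kept <- dots_list(x = 1, , .ignore_empty = "none", .preserve_empty = TRUE)
length(kept)
identical(kept[[2]], quote(expr = ))
```

Here `quote(expr = )` is the usual base R way to write the missing argument, so the last line checks that the preserved element really is the missing symbol.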
### With base R {#do-call}
\index{splicing!base R}
\indexc{do.call()}
Base R provides a Swiss army knife to solve these problems: `do.call()`. `do.call()` has two main arguments. The first argument, `what`, gives a function to call. The second argument, `args`, is a list of arguments to pass to that function, and so `do.call("f", list(x, y, z))` is equivalent to `f(x, y, z)`.
* `do.call()` gives a straightforward solution to `rbind()`ing together many
data frames:
```{r}
do.call("rbind", dfs)
```
* With a little more work, we can use `do.call()` to solve the second problem.
We first create a list of arguments, then name that, then use `do.call()`:
```{r}
args <- list(val)
names(args) <- var
do.call("data.frame", args)
```
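Both solutions rely on the same rule: names in the `args` list become argument names in the call, and unnamed elements are matched by position. A minimal sketch with `paste()` (the list `pieces` is made up for the illustration):

```{r}
pieces <- list("a", "b", sep = "-")

# Names in the list become argument names in the call
do.call("paste", pieces)
#> [1] "a-b"

# `what` can also be the function itself rather than its name
identical(do.call(paste, pieces), paste("a", "b", sep = "-"))
#> [1] TRUE
```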
Some base functions (including `interaction()`, `expand.grid()`, `options()`, and `par()`) use a trick to avoid `do.call()`: if the first component of `...` is a list, they'll take its components instead of looking at the other elements of `...`. The implementation looks something like this:
```{r}
f <- function(...) {
  dots <- list(...)
  if (length(dots) == 1 && is.list(dots[[1]])) {
    dots <- dots[[1]]
  }
  # Do something
  ...
}
```
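To see the trick in action, here is a runnable variant of the same pattern. `arg_names()` is a hypothetical helper written for this illustration, not a base function; it simply reports which names the function would end up working with.

```{r}
# Hypothetical helper: same switch between dots and list as above
arg_names <- function(...) {
  dots <- list(...)
  if (length(dots) == 1 && is.list(dots[[1]])) {
    dots <- dots[[1]]
  }
  names(dots)
}

arg_names(a = 1, b = 2)
#> [1] "a" "b"
arg_names(list(a = 1, b = 2))
#> [1] "a" "b"

# Caveat: a data frame is also a list, so a lone data frame is unpacked too
arg_names(data.frame(u = 1, v = 2))
#> [1] "u" "v"
```

The last call shows one cost of this design: the function cannot distinguish "a list of arguments" from "a single argument that happens to be a list".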
Another approach to avoiding `do.call()` is found in the `RCurl::getURL()` function written by Duncan Temple Lang. `getURL()` takes both `...` and `.opts`, which are concatenated together. The implementation looks something like this:
```{r}
f <- function(..., .dots) {
  dots <- c(list(...), .dots)
  # Do something
}
```
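A runnable sketch of this pattern might look as follows. `collect()` is a hypothetical name, and unlike the snippet above it gives `.dots` an empty-list default (an assumption made here so that the argument can be omitted).

```{r}
# Hypothetical function: arguments can arrive directly, in a list, or both
collect <- function(..., .dots = list()) {
  dots <- c(list(...), .dots)
  names(dots)
}

collect(a = 1, .dots = list(b = 2, c = 3))
#> [1] "a" "b" "c"
collect(a = 1)
#> [1] "a"
```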
At the time I discovered it, I found this technique particularly compelling, so you can see it used throughout the tidyverse. Now, however, I prefer the tidy dots approach described previously.
### Exercises
1. One way to implement `exec()` is shown below. Describe how it works. What are the
key ideas?
```{r}
exec <- function(f, ..., .env = caller_env()) {
  args <- list2(...)
  do.call(f, args, envir = .env)
}
```
1. Carefully read the source code for `interaction()`, `expand.grid()`, and
`par()`. Compare and contrast the techniques they use for switching
between dots and list behaviour.
1. Explain the problem with this definition of `set_attr()`:
```{r, error = TRUE}
set_attr <- function(x, ...) {
  attr <- rlang::list2(...)
  attributes(x) <- attr
  x
}
set_attr(1:10, x = 10)
```