---
title: "Best practice for writing efficient code"
---
We've had a look at ways to benchmark your code and identify bottlenecks. When making optimisation decisions, it's always best to use these tools to get hard data on how your code is likely to perform. Having said that, there are some **golden rules** in R programming that can help you write performant code from the start and guide the search for more efficient solutions when bottlenecks are identified.
In the next few sections, we'll have a look at some key best practices for writing efficient R code. A characteristic of R is that there are **many many** ways to do the same thing. We will look at a lot of examples here that might seem like strange ways to do things, and many are! However, it's often quite instructive to benchmark and memory profile such code to make what's going on more concrete.
Let's start with one of the most pernicious drains on performance, one which should be avoided at all costs: growing data structures!
## Avoid growing data structures
Let's start with an example vector `x` of 10,000 random numbers between 1 and 100.
```{r}
set.seed(20230116)
x <- runif(10000, min = 1, max = 100)
```
Let's say we want to process our vector with a simple algorithm: if the value of an element of `x` is less than 10, recode it to 0; otherwise, take its square root.
```{r}
#| output: false
library(dplyr)
library(pryr)
library(ggplot2)
```
### The perils of growing data structures
The most basic implementation we might think of is a `for` loop that processes each element individually, using an `if` statement to decide what action to take depending on the element's value, and appending the processed output of each iteration, element by element, to another vector `y` using the `c()` function.
Let's create a function to implement it:
```{r}
for_loop_c <- function(x) {
y <- NULL
for (i in seq_along(x)) {
if (x[i] < 10) {
y <- c(y, 0)
} else {
y <- c(y, sqrt(x[i]))
}
}
y
}
```
Let's go ahead and make a starting benchmark:
```{r}
bench::mark(
for_loop_c(x)
)
```
It might not seem obvious until we start comparing it to other approaches, but this is actually very slow for processing a numeric vector of length `r length(x)`. Notice as well how much memory is allocated during execution, especially compared to the size of the original vector!
```{r}
object_size(x)
```
The biggest problem with this implementation, and the reason for loops are so often (wrongly) maligned as inefficient, is the fact that we are incrementally growing the result vector `y` using the function `c()`.
To dig into the issue, let's contrast it with another approach for growing vectors: sub-assignment.
In the following example, we're applying the same processing to each element, but instead of appending it to `y` using `c()`, we use sub-assignment to extend the length of `y` directly.
```{r}
for_loop_assign <- function(x) {
y <- NULL
for (i in seq_along(x)) {
if (x[i] < 10) {
y[i] <- 0
} else {
y[i] <- sqrt(x[i])
}
}
y
}
```
Let's compare the two approaches with a benchmark.
```{r}
bench::mark(
for_loop_c(x),
for_loop_assign(x)
)
```
That's a big difference! We're still growing the vector element by element in `for_loop_assign`, but it's much faster than using the `c()` function. It's also much more memory efficient! Why?
Well, to get to the bottom of this, I actually ended up posting a [question on stack overflow](https://stackoverflow.com/questions/75321441/performance-difference-between-growing-vectors-using-c-versus-assignment-in-r). Let's investigate with an experiment. I've created adapted versions of the functions that perform a simplified version of the element-wise processing but also record the cumulative number of memory address changes at each iteration.
If the address of `y` in any given iteration is not the same as in the previous one, the cumulative count of memory addresses that `y` has occupied during execution is incremented. The iteration indices and cumulative counts are then compiled into a data.frame and returned instead of `y`. So the functions now track the number of memory address changes during our for loop.
```{r}
# Create function that appends to result vector through c
# Collect cumulative number of address changes per iteration
for_loop_c_addr <- function(x) {
y <- NULL
y_addr <- address(y)
cum_address_n <- 0
cum_address_n_v <- numeric(length(x))
for (i in seq_along(x)) {
y <- c(y, sqrt(x[i]))
if (address(y) != y_addr) {
cum_address_n <- cum_address_n + 1
y_addr <- address(y)
}
cum_address_n_v[i] <- cum_address_n
}
data.frame(i = seq_along(cum_address_n_v),
cum_address_n = cum_address_n_v,
mode = "c")
}
# Create function that appends to result vector through assignment.
# Collect cumulative number of address changes per iteration
for_loop_assign_addr <- function(x) {
y <- NULL
y_addr <- address(y)
cum_address_n <- 0
cum_address_n_v <- numeric(length(x))
for (i in seq_along(x)) {
y[i] <- sqrt(x[i])
if (address(y) != y_addr) {
cum_address_n <- cum_address_n + 1
y_addr <- address(y)
}
cum_address_n_v[i] <- cum_address_n
}
data.frame(i = seq_along(cum_address_n_v),
cum_address_n = cum_address_n_v,
mode = "assign")
}
## Execute functions and compile results
c_df <- for_loop_c_addr(x)
assign_df <- for_loop_assign_addr(x)
```
We can now explore our results by plotting the cumulative count of address changes against `i`.
Let's have a look at what happens during sub-assignment:
```{r}
assign_df |>
ggplot(aes(x = i, y = cum_address_n, colour = mode)) +
geom_line() +
ylab("Cumulative count of address changes")
```
We see that, although the address of `y` changes throughout our for loop as a result of memory requests to accommodate the growing vector, this does not occur at every iteration; in fact, it happens a total of `r max(assign_df$cum_address_n)` times. In between these events, `y` is being grown **in place** without needing to change address.
This occurs because, since version 3.4.0, **R will allocate a bit of extra memory for atomic vectors when requesting additional memory, so that 'growing' such vectors via sub-assignment may not require a reallocation if some spare capacity is still available**, and indeed that's what we see in our experiment.
Let's now plot it against our results for the `c()` approach:
```{r}
rbind(c_df, assign_df) |>
ggplot(aes(x = i, y = cum_address_n, colour = mode)) +
geom_line() +
ylab("Cumulative count of address changes")
```
When using `c()`, R must first allocate space for the new `y` created by the output of `c()` and then copy the inputs to `c()` (i.e. the previous `y` and the additional value being appended) to the new memory location. This results in a change of address at every iteration. Because requests for additional memory are costly, using `c()` to grow a results vector is both slow and memory inefficient.
The same is true for functions like [`c()`](https://rdrr.io/r/base/c.html), [`append()`](https://rdrr.io/r/base/append.html), [`cbind()`](https://rdrr.io/r/base/cbind.html), [`rbind()`](https://rdrr.io/r/base/cbind.html) and `paste()`, so avoid using them to grow objects in a for loop at all costs.
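To see this copying directly, here's a minimal illustration using `pryr::address()` (which we loaded earlier): the address of the result changes with every call to `c()`.
```{r}
# Minimal illustration: every c() call allocates a new vector and copies,
# so the result occupies a new address each time.
y_demo <- c(1, 2)
address(y_demo)
y_demo <- c(y_demo, 3)
address(y_demo) # different address: y_demo was reallocated and copied
```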
### Pre-allocating data structures
The best way to improve the performance of our algorithm is to pre-allocate a results vector of the required size and then assign our results to elements that already exist. This allows R to allocate the required amount of memory for `y` once and then modify it in place, without creating copies at every iteration.
Below I create two functions that do just that. The first one, `for_loop_preallocate_NA()`, creates a vector of `NA`s the same length as `x`, whereas the second, `for_loop_preallocate_num()`, creates a numeric vector.
```{r}
for_loop_preallocate_NA <- function(x) {
y <- rep(NA, times = length(x))
for (i in seq_along(x)) {
if (x[i] < 10) {
y[i] <- 0
} else {
y[i] <- sqrt(x[i])
}
}
y
}
for_loop_preallocate_num <- function(x) {
y <- numeric(length = length(x))
for (i in seq_along(x)) {
if (x[i] < 10) {
y[i] <- 0
} else {
y[i] <- sqrt(x[i])
}
}
y
}
```
Let's see what happens to our benchmarks when we pre-allocate the results vector, and let's now look at the results in relative terms by using `mark()` with argument `relative = TRUE`:
```{r}
bench::mark(
for_loop_c(x),
for_loop_assign(x),
for_loop_preallocate_NA(x),
for_loop_preallocate_num(x),
relative = TRUE
)
```
We can see a huge improvement in performance if we pre-allocate `y` compared to using `c()` and this is also reflected in the huge drop in memory allocation for the operation.
We also see a very minor speed-up and a smaller memory footprint from pre-allocating a vector of the appropriate type. This is because in `for_loop_preallocate_num()` R doesn't have to coerce `y` to numeric in the first iteration as it does in `for_loop_preallocate_NA()`, an operation that requires making a copy of the vector.
Another interesting outcome of our benchmarking experiments, though, is that, since R 3.4.0, **growing vectors through sub-assignment can be a reasonable approach if there is no way to know what size object to pre-allocate.**
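To make that concrete, here's a small sketch (the `grow_until()` function is hypothetical) of a situation where the final length genuinely can't be known in advance, so growing through sub-assignment is a reasonable choice:
```{r}
# Hypothetical sketch: we can't know the result length up front, so we
# grow y by assigning one element past its current end at each step.
grow_until <- function(threshold) {
  y <- numeric(0)
  total <- 0
  i <- 0
  while (total < threshold) {
    i <- i + 1
    y[i] <- runif(1)
    total <- total + y[i]
  }
  y
}
length(grow_until(5))
```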
## Vectorise as much as possible
Another popular mantra with respect to efficient programming in R is to *"Vectorise as much as possible"*. While it's indeed another golden rule, it can be a misunderstood concept; I've often heard it translated to *"Avoid loops, use the apply family instead"* which, as you will see, is not actually accurate, at least not in many situations.
Let's for now play along with that interpretation and explore the performance of creating a custom function `cond_sqrt_i()` that processes single values and then using some `apply` and `purrr` functions (the tidyverse's answer to the apply family) to apply the function to each element of our vector in one go.
I've wrapped two implementations from each family of functions for ease and fairness of comparison to our for loop approaches:
- `lapply_unlist_fun()` uses `lapply` and `unlist` to return a vector of values.
- `sapply_fun()` uses `sapply` which returns a vector.
- `purrr_map_fun()` uses the `map` function from the `purrr` package and `unlist` to return a vector.
- `purrr_map_dbl_fun()` uses the `map_dbl` function from the `purrr` package, which returns a double vector by default.
```{r}
cond_sqrt_i <- function(x) {
if (x < 10) {
return(0)
} else {
return(sqrt(x))
}
}
lapply_unlist_fun <- function(x) {
unlist(lapply(x, cond_sqrt_i))
}
sapply_fun <- function(x) {
sapply(x, cond_sqrt_i)
}
purrr_map_fun <- function(x) {
unlist(purrr::map(x, cond_sqrt_i))
}
purrr_map_dbl_fun <- function(x) {
purrr::map_dbl(x, cond_sqrt_i)
}
```
Let's go ahead and add these approaches to our benchmarks. I use the `dplyr` function `arrange()` at the bottom to order the results of our benchmarks by ascending median execution time.
```{r}
bench::mark(
for_loop_c(x),
for_loop_assign(x),
for_loop_preallocate_NA(x),
for_loop_preallocate_num(x),
lapply_unlist_fun(x),
sapply_fun(x),
purrr_map_fun(x),
purrr_map_dbl_fun(x),
relative = TRUE
) %>%
arrange(median)
```
Perhaps the results of this contrived test were surprising, perhaps not. What is clear, though, is that **vectorising your code is not just about avoiding for loops**, although that's often a first step.
Yes, **vectorised code is more compact and concise**, which our latest implementations indeed are. But in the `*apply` and `purrr` approaches we tried above, we are not embracing the true concept of vectorisation because we are still having to call `if` on every element of `x` and `sqrt` on *many* of the elements of `x`. This is pretty much what our original for loop which pre-allocates the results vector is doing, but with the added overhead of calling an `*apply`/`purrr` function as well.
Vectorising is **about taking a whole-object approach to a problem**: thinking about vectors, not individual elements of a vector. To achieve this, we must harness the fact that **many many functions in R are themselves vectorised**: they can take whole vectors as inputs, not just single values, operate on them efficiently and return vectors with a single function call.
Here's a short list of examples of vectorised operators and functions in R.
- Logical operators: `>`, `<`, `==`, `!=` etc
- Arithmetic operators: `+`, `-`, `/`, `*` etc.
- Replacement and indexing operators: `[[<-`, `[<-`, `[[`, `[`
- Mathematical functions: `log()`, `log10()`, `exp()` and, fortunately for our example, **`sqrt()`**!
Under the hood, vectorised functions also implement for loops, but they are performed by C code instead of R code, making them much faster.
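Before trying them, here's a quick illustration of what being vectorised means in practice: a single call operates on our whole vector at once.
```{r}
# One vectorised call processes every element of x:
head(x < 10)  # logical vector, one TRUE/FALSE per element
head(sqrt(x)) # square root of every element in a single call
```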
Given this, let's try two more approaches embracing the vectorised nature of (many) R functions:
First, let's write a function that uses the vectorised `ifelse()` function. `ifelse()` takes a logical vector as its first argument `test`, here created by the comparison `x < 10`, and returns the result of the expression in the second argument `yes` where elements of `test` are `TRUE` and the result of the expression in the third argument `no` where elements of `test` are `FALSE`.
```{r}
ifelse_fun <- function(x) {
ifelse(x < 10, 0, sqrt(x))
}
```
Finally, let's try one last approach, making use of both vectorisation and a mathematical trick. We know that the square root of 0 is also 0. With that in mind, we can process `x` in two vectorised steps: one to convert all values of `x` below 10 to 0, then a single call to `sqrt()` on the whole vector (which will only change the value of the non-zero elements).
::: callout-tip
**Bonus Tip**
Sometimes the most effective way to optimise an algorithm or operation is to think carefully about the underlying maths and attempt to optimise through mathematical tricks.
:::
```{r}
cond_sqrt_vctr <- function(x) {
x[x < 10] <- 0
sqrt(x)
}
```
Let's do a final benchmark comparing all the approaches we've tried:
```{r}
bench::mark(
for_loop_c(x),
for_loop_assign(x),
for_loop_preallocate_NA(x),
for_loop_preallocate_num(x),
lapply_unlist_fun(x),
sapply_fun(x),
purrr_map_fun(x),
purrr_map_dbl_fun(x),
ifelse_fun(x),
cond_sqrt_vctr(x),
relative = TRUE
) %>%
arrange(median)
```
Here we can see that the clear winner in terms of speed is the function `cond_sqrt_vctr()`, which harnesses the power of vectorisation most effectively. It's faster than `ifelse()` because, although `ifelse()` is vectorised, if we look at the source code we see that a big chunk of the body is taken up by performing checks. It also wraps execution of the `yes` and `no` expressions in `rep()` to deal with situations where the expressions themselves cannot be vectorised, which adds additional function calls.
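You can confirm this by printing `ifelse` to inspect its source:
```{r}
# Printing a function without parentheses shows its source. Note how much
# of the body of ifelse() is checks and handling before any computation.
ifelse
```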
::: callout-important
#### Take aways:
- Avoid growing data structures using functions like `c()`, `rbind()`, `cbind()`, `append()`, `paste()` etc at all costs.
- For loops are not always inefficient!
- Pre-allocate memory for results data structures. If that's not possible, growing vectors through sub-assignment can be a reasonable option.
- Vectorisation is not about replacing for loops with apply functions. It's about working with entire vectors with a single function call.
:::
::: callout-note
Vectorised operators work on vectors and derivatives like matrices and arrays. **They do not, however, work on lists or more complex objects**.
The `apply` and `purrr` families are designed to operate on lists and are therefore an important part of R programming. Ultimately, though, all data structures in R boil down to some sort of vector. So it's good to keep that in mind and write code that can operate efficiently on vectors, even if that code is then going to be applied to a bunch of vectors held in a list or another, more complicated S3 or S4 class object.
:::
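As a hedged sketch of that idea (the `vector_list` object here is purely illustrative), we can write the efficient vectorised function once and use `lapply()` to map it over a list of vectors:
```{r}
# Sketch: vectorised within each vector, lapply() across the list.
vector_list <- list(a = runif(5, min = 1, max = 100),
                    b = runif(5, min = 1, max = 100))
lapply(vector_list, cond_sqrt_vctr)
```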
## Use functions & operators implemented in C as much as possible
In base R, C code compiled into R at build time can be **called directly in what are termed *primitives*, or via the `.Internal` interface**. Because primitives and internals are implemented in C, they are generally fast.
Primitives are available for users to call directly. When writing code, a good rule of thumb is to get down to primitive functions with as few function calls as possible.
You can check whether a function is primitive with the function `is.primitive()`:
```{r}
is.primitive(switch)
is.primitive(abs)
is.primitive(ceiling)
is.primitive(`>`)
```
For more on this topic, check out the chapter on [internal vs primitive functions](https://cran.r-project.org/doc/manuals/r-release/R-ints.html#g_t_002eInternal-vs-_002ePrimitive) in the R Internals documentation.
### Detecting Internals
However, it's nonetheless good to be aware of which functions use internals and therefore implement their algorithms in C, even though use of `.Internal()` itself is considered risky and for expert use only. Indeed, `?.Internal` includes the following in its function description! 😜
> Only true **R** wizards should even consider using this function, and only **R** developers can add to the list of internal functions.
For example, when we check whether the function `mean()` is primitive, we get `FALSE`.
```{r}
is.primitive(mean)
```
Digging a little deeper into the `mean.default` source code, however, we see that, after some input validation and handling, the function calls `.Internal(mean(x))`, i.e. the algorithm itself is executed in C.
```{r}
mean.default
```
To get a list of functions and operators implemented in C, you can use `pryr::names_c()`.
```{r}
head(pryr::names_c())
```
This basically prints the table `R_FunTab`, a table maintained by R that maps R function names to the corresponding C functions to call (found in the R source code in file [/src/main/names.c](https://github.com/wch/r-source/blob/trunk/src/main/names.c)).
Here's an interactive version you can use to search for functions by name to check whether they are implemented in C.
```{r}
#| echo: false
DT::datatable(pryr::names_c())
```
The above discussion relates to base R functions that call C. Many packages also include and call C or even Fortran functions of their own. These can be identified by calls to `.C()`, `.Call()` or `.External()` (or `.Fortran()` for the Fortran interface). In addition, the package `Rcpp` has made writing functions in C++ and incorporating them into R packages comparatively much simpler, so checking [`Rcpp`'s reverse dependencies](https://cran.r-project.org/web/packages/Rcpp/index.html) can also point to packages with underlying C++ code, which will most likely be much more performant.
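As a rough heuristic for spotting such functions (the helper `calls_compiled_code()` below is hypothetical, not part of any package), you can deparse a closure's body and search for calls to these interfaces:
```{r}
# Hypothetical helper: deparse a function's body and look for calls to the
# compiled-code interfaces (.C, .Call, .External, .Fortran, .Internal).
calls_compiled_code <- function(f) {
  any(grepl("\\.(C|Call|External|Fortran|Internal)\\(", deparse(body(f))))
}
calls_compiled_code(mean.default) # TRUE: calls .Internal()
calls_compiled_code(ifelse)       # FALSE: plain R throughout
```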
## Return early
Often, the ability of a function to return a meaningful result depends on its inputs and environment. If some preconditions are not met, the function returns some signal value (e.g. `NULL`, `NA`), returns an error code or throws an exception.
**Return early** is the practice of writing functions or methods so that the **expected positive result** is returned at the end of the function, while the rest of the code terminates execution early (by returning or throwing an exception) when preconditions are not met.
It is most often advocated as a way of avoiding long, nested and convoluted `if` & `else` statements, but it can also improve efficiency if unnecessary computations can be avoided altogether.
Consider the following example function. It takes three inputs, `x`, `y` & `z`, and each input needs to undergo some computationally costly processing (represented here by the `Sys.sleep()` calls) before the processed inputs can be aggregated and returned:
```{r}
fun_no_early_exit <- function(x, y, z) {
# Expensive processing done to input x
Sys.sleep(1)
# Expensive processing done to input y
Sys.sleep(1)
# Expensive processing done to input z
Sys.sleep(1)
# Aggregation performed on processed inputs
sum(x, y, z)
}
```
```{r}
fun_no_early_exit(1, 2, 3)
```
However, the function requires that none of the inputs are `NA` to return a meaningful value. If any of the values are `NA`, the `NA` is propagated through the processing and the final aggregation returns `NA`, yet the expensive processing on all inputs still gets carried out unnecessarily.
```{r}
fun_no_early_exit(1, 2, NA)
```
So let's write another function that checks inputs at the top of the function. If any are `NA`, it returns `NA` straight away. Otherwise, computation proceeds.
```{r}
fun_early_exit <- function(x, y, z) {
# Check that no inputs are `NA`
if (any(is.na(c(x, y, z)))) {
return(NA_real_)
}
# Expensive processing done to input x
Sys.sleep(1)
# Expensive processing done to input y
Sys.sleep(1)
# Expensive processing done to input z
Sys.sleep(1)
# Aggregation performed on processed inputs
sum(x, y, z)
}
```
Let's first benchmark the two functions using valid inputs:
```{r}
bench::mark(
fun_no_early_exit(1, 2, 3),
fun_early_exit(1, 2, 3)
)
```
Let's now check using an invalid input:
```{r}
bench::mark(
fun_no_early_exit(1, 2, NA),
fun_early_exit(1, 2, NA)
)
```
When inputs are invalid, the function that returns early is much more efficient, as it avoids all the costly but ultimately pointless computation, while on valid inputs the cost of the check is negligible. The difference in performance is proportionate to the execution time of the computations that can be skipped.
Having said that, we could even move the checks outside the function, thus saving on the function call altogether.
```{r}
x <- 1
y <- 2
z <- NA
bench::mark(
check_before = {
if (any(is.na(c(x, y, z)))) {
res <- NA_real_
} else {
res <- fun_no_early_exit(x, y, z)
}
},
check_in_fun = {
res <- fun_early_exit(x, y, z)
}
)
```
This is a very contrived example, and the performance difference puts it in the range of not just micro- but nano-optimisation. The point is that the earlier you can establish the validity (or, in our case, the invalidity) of the inputs you are working with, the more you can save on unnecessary computation.
::: callout-important
While not evident in this example, checks themselves do add to execution time, depending on their complexity. So, when performance is an issue, checks need to be well considered and deployed in a targeted manner. For example, if you are writing a package with an exported function that might call deep stacks of internal, non-user-facing functions, it's costly to check the inputs at every level of the call stack. Instead, it's better to do as many runtime checks as possible at the top of the stack, checking the inputs to exported functions, and **use your testing suite under various conditions to ensure that the top-level checks are enough to propagate valid inputs to the internal functions.**
:::
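A minimal sketch of that pattern (both function names here are hypothetical): validate once in the exported, user-facing function and let internal helpers assume valid input.
```{r}
# Hypothetical sketch: one runtime check at the top of the call stack;
# internal helpers assume inputs are already valid.
process_data <- function(x) {
  stopifnot(is.numeric(x), !anyNA(x)) # single check in the exported function
  process_impl(x)
}
process_impl <- function(x) {
  sqrt(x) + 1 # no re-checking at deeper levels of the stack
}
process_data(c(1, 4, 9))
```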
## Use memoisation
Memoisation is an optimisation technique that can make code execution more efficient and hence faster. It involves storing the computational results of an operation, e.g. a function call, in a cache, and retrieving that same information from the cache the next time the function is called with the same arguments, instead of re-computing it.
The package `memoise` allows us to apply memoisation to functions in R, which can lead to significant speed-ups in code that calls a given function multiple times with the same arguments.
Let's have a look at an example that demonstrates the usefulness of memoisation.
The following function, taken from the `memoise` package documentation, calculates Fibonacci numbers.
Fibonacci numbers form a sequence, the Fibonacci sequence, in which each number is the sum of the two preceding ones.
The sequence below represents the first 12 numbers in the Fibonacci sequence, `144` being the 12th Fibonacci number:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144
```{r}
fib <- function(n) {
if (n < 2) {
return(n)
} else {
return(fib(n - 1) + fib(n - 2))
}
}
```
So the function `fib()` takes an argument `n` and returns the $n^{th}$ Fibonacci number by recursively calculating the sum of the two preceding Fibonacci numbers.
```{r}
fib(1)
fib(12)
fib(30)
```
It thus effectively generates the full Fibonacci sequence up to the $n^{th}$ Fibonacci number every time it executes, recomputing the same values over and over: the number of recursive calls grows exponentially with `n`.
This recursiveness means the function gets progressively slower as `n` increases.
```{r}
bench::press(
n = c(10, 20, 30),
bench::mark(fib(n), check = FALSE)
)
```
Now let's create a memoised version of the function. To be able to compare them side by side, we will first create another version of `fib()`, `fib_mem()`, and then use the function `memoise::memoise()` to memoise it. *(Note that because the function is recursive, we need to change the function name being called within the function itself as well.)*
Once memoised, the function will store the result in an in-memory cache each time it's called with new argument values, and will look up the result of calls with argument values it has already been executed with.
```{r}
fib_mem <- function(n) {
if (n < 2) {
return(n)
} else {
return(fib_mem(n - 1) + fib_mem(n - 2))
}
}
fib_mem <- memoise::memoise(fib_mem)
```
In the case of our `fib_mem()` function, because the function definition is recursive, the intermediate results can be looked up rather than recalculated at each level of recursion. And when the top-level `fib_mem()` is called with the same `n`, it simply retrieves the result from the cache rather than re-computing it.
So let's see how the two functions perform. We'll call `fib()` once, then call `fib_mem()` twice with the same arguments to see what happens each time.
```{r}
system.time(fib(30))
system.time(fib_mem(30))
system.time(fib_mem(30))
```
It's clear that memoisation has dramatically improved performance compared to the original function. Moreover, execution time is near zero the next time the function is re-run, as nothing is re-computed.
*Note as well that we did not use `bench::mark()` to test our execution times because it executes each expression repeatedly, making it hard to demonstrate the difference in execution time between the first time the function is called and subsequent calls.*
#### When to use memoisation
- When a function is likely to be called multiple times with the same arguments.
- When the output of the function to be cached is relatively small in size.
- When making queries to external resources, such as network APIs and databases.
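For that last case, a hedged sketch (the function and URL are hypothetical, and the chunk is not evaluated): memoising a function that fetches remote data means repeated calls with the same URL are served from the cache rather than the network.
```{r}
#| eval: false
# Hypothetical sketch: the second call with the same url is served from
# the in-memory cache instead of re-downloading.
fetch_data <- memoise::memoise(function(url) {
  readLines(url)
})
fetch_data("https://example.com/data.csv") # first call: downloads
fetch_data("https://example.com/data.csv") # second call: cached
```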
#### When to be cautious about or avoid memoisation
- When the output of the function to be cached is so large that it might start making computations memory bound.
- When the function is rarely called using the same inputs.
For more details on memoising and various options for caching, consult the [`memoise` package documentation](https://memoise.r-lib.org/).
## Use the latest version of R
Each release of R improves something to gain more speed with less memory. If you need more speed, it's always worth switching to the latest version of R to see if you get any gains.
We've seen that in **R 3.4.0**, memory management when growing vectors through sub-assignment was improved. In **R 4.0.0**, the way R keeps track of references to objects, and therefore of when copies are required, was improved.
Another important example is that, **as of R 3.4.0, R attempts to byte-compile functions when they are first run**. On subsequent function calls, instead of reinterpreting the body of the function, R executes the saved, compiled byte code. Typically, this results in faster execution times on later function calls.
Before this, if you wanted your functions byte-compiled to improve performance, you needed to explicitly byte-compile them using the function `cmpfun()` from the package `compiler`.
In the following example, if you are running R \>= 3.4.0, you will see that the first run of the user-defined `csum()` function takes longer than the second because, behind the scenes, R is byte compiling it. You will also see that using `compiler::cmpfun()` to byte compile the `csum()` function afterwards has no effect, as it is already byte compiled.
If you are running an earlier version of R, however, you would see `compiler::cmpfun()` having an effect on performance, while the first two runs should take about the same time.
```{r}
# Recreate the example vector (x was overwritten in the return early section)
x <- runif(10000, min = 1, max = 100)
csum <- function(x) {
if (length(x) < 2) return(x)
sum <- x[1]
for (i in seq(2, length(x))) {
sum[i] <- sum[i-1] + x[i]
}
sum
}
tictoc::tic(msg = "First time function run (byte compiled by default in R >= 3.4.0)")
res <- csum(x)
tictoc::toc()
tictoc::tic(msg = "Second time function run (already byte compiled)")
res <- csum(x)
tictoc::toc()
csum_cmp <- compiler::cmpfun(csum)
tictoc::tic(msg = "Explicit byte compilation does not improve performance")
res <- csum_cmp(x)
tictoc::toc()
```
In general, if you are using an old version of R, or old functions that are deprecated and no longer recommended, switching to a new version or new methods will most likely deliver improvements in speed. The best place to track improvements is [CRAN's R NEWS](https://cran.r-project.org/doc/manuals/r-devel/NEWS.html).
The same goes for many packages, as we will see with the completely overhauled package `dtplyr`.