-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdebug-r-code.qmd
489 lines (352 loc) · 13 KB
/
debug-r-code.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
---
title: "Debugging R Code"
---
Debugging is an essential skill for any R programmer. It's the process of
finding and fixing errors or unexpected behaviors in your code. This chapter
will cover various techniques and tools to help you effectively debug R code.
## Types of Errors in R
Understanding the different types of errors you may encounter is the first step
in debugging code. In R, we typically deal with four main categories of errors:
**syntax**, **runtime**, **logical**, and **semantic**.
### Syntax Errors
Syntax errors occur when the R interpreter cannot parse your code due to
violations of the language's syntax rules. These errors prevent your code from
running at all. These errors are quicker to spot and fix because they prevent
the code from running. Let's look at a couple common syntax errors in R:
:::todo
Cut down and/or move some examples to an appendix
:::
#### Missing terms
Common syntax errors in R include missing terms in function calls or expressions. For example, if you forget to include a comma between arguments in a function call, R will throw an error:
```{.r}
plot(x, y type = "l")
# ^ Missing comma
# Error: unexpected symbol in "plot(x, y type"
```
Or, if you forget to include a plus sign in a linear model formula, R will throw an error:
```{.r}
lm(y ~ x1 + x2 x3, data = my_data)
# ^ Missing plus sign
# Error: unexpected symbol in "lm(y ~ x1 + x2 x3"
```
The more severe syntax errors appear when you forget to close parentheses or quotation marks. For example:
```{.r}
lm(y ~ x1 + x2, data = my_data
# ^ Missing closing parenthesis
```
```{.r}
# > lm(y ~ x1 + x2, data = my_data
# +
# ^ R is waiting for more input.
```
```{.r}
print("Hello, world!)
# ^ Missing quotation mark
```
```{.r}
# > print("Hello, world!)
# +
# ^ R is waiting for more input.
```
These syntax errors will require you to first stop R from waiting for more input
by pressing `Esc` or `Ctrl + C` and then adding the missing parenthesis or
quotation mark.
#### Case and Naming Sensitivity
R is case-sensitive, so you must use the correct case for function names,
variable names, and other identifiers. For example, if you try to call a function
with the wrong name, R will throw an error:
```{.r}
summry(my_data) # Should be summary()
# Error in summry(my_data) : could not find function "summry"
```
Similarly, if you use the wrong case for a variable name, R will not recognize
the variable:
```{.r}
N_STUDENTS <- 20
y <- n_students + 5 # Should be N_STUDENTS
# Error: object 'n_students' not found
```
#### Wrong Operators
Using an assignment operator instead of the equality operator in a
conditional statement.
```{.r}
subset(df, x = 1)
# ^ Should be x == 1
# Error in subset.default(df, x = 1) : 'subset' must be logical
```
#### Symbols instead of Strings
Using symbols instead of strings in function calls or assignments.
```{.r}
factor(c(male, female, male)) # Should be c("male", "female", "male")
# Error: object 'male' not found
```
### Runtime Errors
Runtime errors occur during the execution of your code. The code is
syntactically correct but ends up encountering issues during execution. These
errors can be caused by various reasons like trying to perform an operation on
an object of the wrong type, accessing a non-existent variable, or using
incorrect functions or data structures.
1. Attempting to calculate a correlation with non-numeric data
```r
df <- data.frame(x = c("a", "b", "c"), y = c(1, 2, 3))
cor(df)
# Error in cor(df) : 'x' must be numeric
```
In this example, the `cor()` function expects numeric data, but the `x` column
contains character data, resulting in a runtime error when trying to calculate
the correlation.
2. Trying to access a non-existent variable.
```r
print(non_existent_variable)
# Error: object 'non_existent_variable' not found
```
In this case, the variable `non_existent_variable` doesn't exist in the current
environment, leading to a runtime error when trying to print it.
### Logical Errors
Logical errors are the most subtle and often the hardest to detect. These
errors don't cause the program to crash or produce error messages, but they
result in incorrect output. They occur when the code doesn't do what you
intended it to do.
Flawed algorithm for calculating factorial
```{r}
factorial <- function(n) {
result <- 0
for (i in 1:n) {
result <- result + i # Should be result <- result * i
}
return(result)
}
```
Off-by-one errors:
```{r}
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
for (i in 0:12) { # Should be 1:12
print(days_in_month[i])
}
```
### Semantic Errors
Semantic errors occur when code is syntactically correct and runs without
throwing exceptions, but produces incorrect or unintended results. These
errors are logical flaws in the program that don't align with the
intended behavior. They can be challenging to identify because they don't
manifest as errors or warnings.
Take the following example of a function that checks if a person is an adult:
```{r}
is_adult <- function(age) {
age > 18 & age < 65 # Should be age >= 18 & age < 65
}
is_adult(17)
is_adult(18)
is_adult(19)
```
In this case, the function `is_adult()` incorrectly classifies 18-year-olds as
non-adults. This is a semantic error that doesn't produce an error message but
produces incorrect results.
:::todo
Add a section comparing and contrasting semantic vs. logical errors.
:::
## Understanding Error Messages
The first step in debugging is to understand the error messages R provides.
R error messages typically include:
- The type of error
- The line number where the error occurred
- A brief description of the problem
For example:
```r
Error in mean(x) : object 'x' not found
```
This error tells us that the function `mean()` was called with an object `x`,
but `x` doesn't exist in the current environment. From this message, we can
infer that `x` is not defined or is out of scope.
## Using `print()` Statements
One of the simplest debugging techniques is to use `print()` statements
throughout your code. This helps you track the values of variables and the
flow of your program.
```{r}
calculate_mean <- function(data) {
print("Entering function")
print(paste("Data:", data))
result <- mean(data)
print(paste("Result:", result))
return(result)
}
```
When you run this function, you'll see the printed messages in the console,
which can help you identify where things are going wrong.
```{r}
calculate_mean(c(1, 2, 3, 4, 5))
```
### Using Diagnostic Messages
In addition to `print()` statements, you can use `message()`, `warning()`, and
`stop()` to provide more informative diagnostic messages.
#### `message()`
The `message()` function is similar to `print()` but is intended for
informational messages rather than debugging output.
```{r}
calculate_mean <- function(data) {
if (length(data) == 0) {
message("Data is empty")
}
result <- mean(data)
return(result)
}
```
When you run this function with an empty data vector, you'll see the message
"Data is empty" in the console. The function will continue to execute, but the
message can help you identify potential issues.
#### `warning()`
The `warning()` function is used to display non-fatal issues that may affect
the results of your code.
```{r}
calculate_mean_with_warning <- function(data) {
if (length(data) == 0) {
warning("Data is empty")
}
result <- mean(data)
return(result)
}
```
When you run this function with an empty data vector, you'll see a warning
message in the console. The function will continue to execute, but the warning
can help you identify potential problems. We can also use `warnings()` to
retrieve a list of all warnings that have been generated.
```{r}
warnings()
```
We can modify the behavior of warnings by setting the global option `warn`.
In particular, `warn` offers three levels of support:
1. `options(warn = 0)` will suppress all warnings until the top-level function has completed.
2. `options(warn = 1)` will display warnings as they occur but not halt execution.
3. `options(warn = 2)` converts warnings into errors that will halt execution.
For debugging purposes, you may wish to use `options(warn = 2)` to catch
issues early in your code.
#### `stop()`
The `stop()` function allows you to halt execution and display an error message.
This can be useful for catching unexpected conditions in your code.
```{r}
#| error: true
calculate_mean_with_stop <- function(data) {
if (length(data) == 0) {
stop("Data is empty")
}
result <- mean(data)
return(result)
}
calculate_mean_with_stop(numeric())
```
## Using `browser()`
The `browser()` function is a powerful tool for interactive debugging. When R
encounters a `browser()` statement, it pauses execution and allows you to
inspect the current environment.
```{r}
calculate_mean_browser <- function(data) {
browser()
result <- mean(data)
return(result)
}
calculate_mean_browser(1:5)
```
When you run this function, R will pause at the `browser()` call, allowing you
to examine and manipulate variables. This is particularly useful for
understanding the state of your program at a specific point in time.
When you're in the browser, you can navigate through the code using the
following commands:
- `ls()` to list the objects in the current environment.
- `print(object)` to display the value of an object.
- `c` to continue execution by exiting the browser.
- `f` to finish the current loop or function.
- `n` to step to the next line.
- `s` to step into a function call.
- `where` to display the call stack.
- `r` to restart if you want to re-run the function.
- `Q` to quit the debugger.
We'll see in @sec-rstudio-debugging-tools that RStudio provides a more
user-friendly interface for this type of debugging through icons and breakpoints.
## Using an interactive debugger
The `debug()` function allows you to step through a function line by line. We
can use `debug()` to immediately start debugging a function when it is
called. For example:
```{r}
debug(calculate_mean)
calculate_mean(c(1, 2, 3, 4, 5))
```
Notice, the function `calculate_mean` is now in debug mode. When you run the
function, R will pause at the first line of the function and allow you to step
through the code. You can use similar commands as we saw with `browser()` to
inspect variables and control the flow of the program.
If we want to remove the debugger, we would use:
```{r}
undebug(calculate_mean)
```
Though, if we only want to debug the next call, we can use `debugonce()` instead.
This will only debug the next call to the function without needing to remove the
debugger afterwards.
## Using `traceback()`
`traceback()` shows you the sequence of function calls that led to an error. It's particularly useful for understanding errors in complex, nested function calls.
```{r}
#| error: true
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x + "a"
f(10) # This will cause an error
```
```r
traceback()
# 3. h(x)
# 2. g(x)
# 1. f(10)
```
## Implementing Error Handling
:::todo
Add a section describing fail-fast principles
:::
Proper error handling can make debugging easier. Use `try()`, `tryCatch()`, and custom error messages to manage potential issues.
```{r}
safe_mean <- function(x) {
tryCatch(
mean(x),
error = function(e) {
message("An error occurred: ", e$message)
return(NA)
}
)
}
safe_mean("bad input")
```
In this example, `safe_mean()` attempts to calculate the mean of a vector.
If an error occurs, it prints a message and returns `NA` instead of halting
execution.
## Using RStudio's Debugging Tools {#sec-rstudio-debugging-tools}
If you're using RStudio, take advantage of its built-in debugging tools:
- Breakpoints: Click next to a line number to set a breakpoint.
- Environment pane: Inspect variable values during debugging.
- Debug toolbar: Step through code, examine the call stack, and more.
:::todo
Add screenshot of the debugger
:::
## Leveraging Package-Specific Debugging Tools
Some R packages provide their own debugging tools. For example, the `debugr`
package offers a more streamlined print debugging experience.
:::todo
Add an example of using debugr
:::
## Common Debugging Pitfalls
Be aware of common issues that can complicate debugging:
- Scope issues: Ensure you're looking at the correct environment.
- Data type mismatches: Check that your functions are receiving the expected data types.
- Missing values: Be cautious of `NA` values affecting your calculations.
## Best Practices for Debuggable Code
Write code that's easier to debug:
- Use meaningful variable names
- Break complex operations into smaller, testable functions
- Comment your code thoroughly
- Use version control to track changes
## Debugging in Quarto
:::todo
Add a section on debugging with Quarto
:::
## Summary
Through mastering these debugging techniques and tools, you'll be well-equipped to
tackle even the most challenging coding issues in R. Remember, effective
debugging is not just about fixing errors—it's about understanding your code
more deeply and writing more robust programs.