-
Notifications
You must be signed in to change notification settings - Fork 1
/
04-errors.qmd
148 lines (88 loc) · 7.79 KB
/
04-errors.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
# Error checking and debugging
In this chapter, you'll learn how to use AI to help you identify and fix coding errors. I'd argue this is probably the best use case of AI - using it to help and debug code you've written yourself - because it's human expertise that wrote the code, but with the AIs attention to detail and ability to spot missing commas.
AI is generally good at this task, although the more complicated your code, the more likely it is that it will run into trouble. This chapter will give a few examples to help you with your prompt engineering.
## Activity 1: Set-up {#sec-setup-errors}
So that you can reproduce the same errors, let's create a reproducible example and load some packages and a dataset. Open an Rmd (not a script, we'll use the Rmd in Activity 3) and run the below (you may need to install the package `palmerpenguins` if you don't have it already).
```{r warning=FALSE, message=FALSE}
library(tidyverse)
library(palmerpenguins)
data("penguins")
```
## Activity 2: Simple errors {#sec-simple-errors}
Unlike the other chapters you don't need to do any set-up, in fact, you can often just copy and paste the code and error in and it will figure out that you want it to fix it without even needing to explicitly ask.
Here's a simple error where we have given it the wrong function name:
```{r error=TRUE}
ggplot(penguins, aes(x = species)) +
geom_barchart()
```
* Give your AI of choice both the code and the error. One without the other is likely to result in a poor or incomplete answer (whether you ask a human or an AI).
Both AIs identified, explained, and fixed this error. Copilot also gives you some follow up questions you could ask to understand more about the code, which is nice.
```{r img-simple, echo=FALSE, fig.cap="Fixing a simple error"}
knitr::include_graphics("include/images/04-error/simple.png")
```
## Activity 3: Contextual errors
Something that catches many people out is when the error is actually being caused by code further up your script rather than the bit that is results in the error.
For example, in this code, what we intended to do was to create a dataset that just has penguins from Biscoe Island and then calculate their mean body mass. This code will run, but it produces `NaN` as the value.
```{r}
biscoe_penguins <- penguins %>%
filter(island == "biscoe")
biscoe_penguins %>%
summarise(mean_mass = mean(body_mass_g))
```
If you just give an AI the code and the table and ask it to explain what's happening, it will do its best but without knowing the dataset or what code has preceded it, it won't give you the exact answer, although in this case it hints at it.
```{r img-context1, echo=FALSE, fig.cap = "ChatGPT giving its best guess"}
knitr::include_graphics("include/images/04-error/context1.png")
```
There's a couple of things you can do at this point:
* Give the AI all the code you've used so far
* Give the AI more information about the dataset.
You can manually type out a description but there's some functions you can use that can automate this.
`summary()` is useful because it provides a list of all variables with some descriptive statistics so that the AI has a sense of the type and range of data:
```{r}
summary(penguins)
```
`str()` is also useful because it lists the variables, their data type, and the initial values for each variable. However, that means that you are giving it at least some of the raw data so you have to be very careful if you have sensitive / confidential data and you must ensure that any use of AI is in line with your data management plan. Using Copilot Enterprise means the data won't be stored and used to train the AI further so it's potentially the best option (which is not to say it's safe or problem free, please be careful and critical!).
```{r}
str(penguins)
```
Finally, `ls()` provides a list of all the variables in a given object. It doesn't provide any info on the variable type or sample, but that does mean it's the most secure and depending on the task, this might be all the info you really need to give the AI. I would suggest starting with `ls()` and only scaling up if necessary (and your data isn't sensitive):
```{r}
ls(penguins)
```
* Run `summary(biscoe_penguins)` and give the AI the output so that it better understands the structure and contents of the datasets
If you haven't spotted it by now, the error is that in the filter `biscoe` should be `Biscoe` with a capital B. It still doesn't have the information it needs to tell you this explicitly, but it will get you very close. **There is no shortcut for knowing your data**.
```{r img-context2, echo=FALSE, fig.cap="Copilot getting very close"}
knitr::include_graphics("include/images/04-error/context2.png")
```
## Activity 4: Incorrect (but functional) code
Sometimes (often) when we write code, the issue isn't that our code doesn't work, but that it doesn't do what we intended to do and we can't figure out why.
For example, let's say that we want to calculate the average body_mass_g for each species by sex. We're feeling a bit lazy and we copy and paste in the following from a previous script we have:
```{r message=FALSE}
penguins %>%
group_by(sex, species) %>%
summarise(mean_body_mass = sd(body_mass_g, na.rm = TRUE))
```
We know something isn't right here. Because we're responsible researchers, we've taken time to understand our dataset and what plausible values should be and we know there's no way that the average body mass of a penguin is 269 grams (unless the penguin is made of chocolate). But the code is running fine, we know it's worked before, and we can't see what we've done wrong.
You can ask the AI to help you but you can't just give it the code and output, you also need to tell it what you intended to do. In this case, all three AIs correctly identified that I had used `sd` instead of `mean`. The more complex your code, the more information you will need to give it in order for it to help you find the error.
```{r img-functional, echo=FALSE, fig.cap="Fixing a functional error"}
knitr::include_graphics("include/images/04-error/functional_error.png")
```
This is a good example of why **there is no AI tool that allows you to skip understanding the data you're working with and knowing what it is you're trying to do**.
## Activity 5: Rmd errors
If you're working in R Markdown or Quarto, sometimes the errors will stem from your code chunk settings or YAML.
In your Rmd file, create a new code chunk and copy and paste the following:
```{r eval = FALSE}
penguins %>%
count()
```
But then delete one of the final back ticks (`) from the code chunk.
The code is fine, it provides a simple count of the number of observations in the dataset. But if you try and knit the file, you'll get a long `! attempt to use zero-length variable name`. Copilot wasn't that helpful because all of its suggestions relate to checking your code:
```{r img-rmd1, echo=FALSE, fig.cap="Missing the mark"}
knitr::include_graphics("include/images/04-error/rmd1.png")
```
::: {.callout-important}
I know I might be starting to sound like a broken record but **please** remember that artificial intelligence is not actually intelligent. It's not thinking, it's not making conscious decisions, it has no expert subject matter knowledge. No matter how helpful it is, you must always check the output of the code it gives you.
:::
::: {.callout-caution}
This book was written in Spring 2024 and should be considered a **living document**. The functionality and capability of AI is changing rapidly and the most recent advances may not reflect what is described in this book. Given the brave new world in which we now live, all constructive feedback and suggestions are welcome! If you have any feedback or suggestions, please provide it [via Forms](https://forms.office.com/Pages/ResponsePage.aspx?id=KVxybjp2UE-B8i4lTwEzyKAHhjrab3lLkx60RR1iKjNUM0VCMUUxUUFLMTdNM0JTS09PSDg2SFQ3US4u).
:::