-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04-workflow_code_style.Rmd
351 lines (267 loc) · 8.88 KB
/
04-workflow_code_style.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
# Workflow: code style
> "Good coding style is like correct punctuation: you can \ \ manage without it, butitsuremakesthingseasiertoread." — Tidyverse Style Guide
**Learning objectives:**
- Use the styler package to apply style rules.
- Select good variable names.
- Appropriately utilize spaces ` ` between mathematical operators.
- Apply the `|>` operator to make our code more readable.
## Use styler {-}
{styler} 📦 by Lorenz Walthert helps quickly apply styles
- Install with `install.packages("styler")`
- Use via RStudio’s command palette
- **Cmd/Ctrl + Shift + P** to open pallette
- Type **“styler”** to see all styler shortcuts
## Names {-}
Names should use only:
1. lowercase letters
2. numbers
3. underscore
## Use snake_case for names {-}
`snake_case` = preferred style
```{r eval=FALSE}
# Strive for:
short_flights <- flights |> filter(air_time < 60)
# Avoid:
SHORTFLIGHTS <- flights |> filter(air_time < 60)
```
## Names should be descriptive {-}
- Good: Long, descriptive names
- Easy to understand
- Bad: Concise names
- Fast to type, but RStudio autocompletes long names anyway
Be consistent with related names!
## Spaces {-}
Spaces around:
- Math operators (except `^`)
- Assignment operator (`<-`)
```{r eval=FALSE}
# Strive for
z <- (a + b)^2 / d
# Avoid
z<-( a + b ) ^ 2/d
```
## Spaces in function calls {-}
- No spaces inside/outside parentheses
- Spaces after commas (like regular English)
```{r eval=FALSE}
# Strive for
mean(x, na.rm = TRUE)
# Avoid
mean (x ,na.rm=TRUE )
```
## Use extra spaces for alignment {-}
```{r eval=FALSE}
flights |>
mutate(
speed = air_time / distance,
dep_hour = dep_time %/% 100,
dep_minute = dep_time %% 100
)
```
## Pipes: Spacing {-}
```{r eval=FALSE}
# Strive for space before |>, end line with |>
flights |>
filter(!is.na(arr_delay), !is.na(tailnum)) |>
count(dest)
# Avoid
flights|>filter(!is.na(arr_delay), !is.na(tailnum))|>count(dest)
```
Easier to:
1. add new steps
2. rearrange existing steps
3. modify elements within a step
4. to get a 50,000 ft view by skimming the verbs on the left-hand side
## Pipes: Newlines {-}
- Put named arguments on newline
- Put unnamed arguments on one line if they fit
```{r eval=FALSE}
# Strive for
flights |>
group_by(tailnum) |>
summarize(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
# Avoid
flights |>
group_by(
tailnum
) |>
summarize(delay = mean(arr_delay, na.rm = TRUE), n = n())
```
## Pipes: Indentation {-}
```{r eval=FALSE}
# Strive for
flights |>
group_by(tailnum) |>
summarize(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
# Avoid
flights|>
group_by(tailnum) |>
summarize(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
flights|>
group_by(tailnum) |>
summarize(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
```
## Pipes: Breaking the rules {-}
OK to shirk some rules to fit on one line
```{r eval = FALSE}
# This fits compactly on one line
df |> mutate(y = x + 1)
# While this takes up 4x as many lines, it's easily extended to
# more variables and more steps in the future
df |>
mutate(
y = x + 1
)
```
## Pipe length {-}
- Limit to 10-15 lines
- Break up into smaller sub-tasks
- Give each task an informative name
## ggplot2 spacing {-}
Same rules for `+` as for `|>`
```{r eval=FALSE}
flights |>
group_by(month) |>
summarize(
delay = mean(arr_delay, na.rm = TRUE)
) |>
ggplot(aes(month, delay)) +
geom_point() +
geom_line()
```
## ggplot2 spacing 2 {-}
```{r eval=FALSE}
flights |>
group_by(dest) |>
summarize(
distance = mean(distance),
speed = mean(air_time / distance, na.rm = TRUE)
) |>
ggplot(aes(distance, speed)) +
geom_smooth(
method = "loess",
span = 0.5,
se = FALSE,
color = "white",
size = 4
) +
geom_point()
```
## Sectioning comments {-}
Use sectioning comments (**Cmd/Ctrl + Shift + R**) to break up long files
```{r load-data-comment}
# Load data --------------------------------------
# Plot data --------------------------------------
```
## Exercises {-}
1. Restyle the following pipelines following the guidelines above.
```{r exercises, eval=FALSE}
flights|>filter(dest=="IAH")|>group_by(year,month,day)|>summarize(n=n(),
delay=mean(arr_delay,na.rm=TRUE))|>filter(n>10)
flights|>filter(carrier=="UA",dest%in%c("IAH","HOU"),sched_dep_time>
0900,sched_arr_time<2000)|>group_by(flight)|>summarize(delay=mean(
arr_delay,na.rm=TRUE),cancelled=sum(is.na(arr_delay)),n=n())|>filter(n>10)
```
Solution
```{r exercises solution, eval=FALSE}
flights |> # space before/ after pipe and the last thing on a line
filter(dest == "IAH") |> # indent twice, space between operators
group_by(year, month, day) |> # space after comma
summarize( # nam
n = n(), # add spaces so that all the = line up
delay = mean(arr_delay, na.rm = TRUE)
) |>
filter(n > 10)
flights |>
filter(
carrier == "UA",
dest %in% c("IAH","HOU"),
sched_dep_time > 900,
sched_arr_time < 2000
) |>
group_by(flight) |>
summarize(
delay = mean(arr_delay, na.rm=TRUE),
cancelled = sum(is.na(arr_delay)),
n = n()
) |>
filter(n > 10)
```
## Meeting Videos {-}
### Cohort 5 {-}
`r knitr::include_url("https://www.youtube.com/embed/JW1DilkJ3-0")`
<details>
<summary> Meeting chat log </summary>
```
00:12:57 Jon Harmon (jonthegeek): https://en.wikipedia.org/wiki/The_Treachery_of_Images
00:16:47 Jon Harmon (jonthegeek): https://cran.r-project.org/package=magrittr
00:20:53 Jon Harmon (jonthegeek): https://smile.amazon.com/Mathematical-Notation-Guide-Engineers-Scientists/dp/1466230525/ref=sr_1_2?keywords=Mathematical+Notation&qid=1641665745&sr=8-2
00:22:16 Jon Harmon (jonthegeek): |>
00:23:04 Ryan Metcalf: And, is this the “Walrus” operator? `:=`
00:23:19 Jon Harmon (jonthegeek): %>%
00:23:28 Jon Harmon (jonthegeek): |>
00:24:14 Jon Harmon (jonthegeek): a |> f()
00:24:23 Jon Harmon (jonthegeek): f(a)
00:41:01 Becki R. (she/her): Ryan, can you put a link to the video in the chat?
00:42:27 Jon Harmon (jonthegeek): a %>% myfun(a = 10, b = .)
00:42:59 Jon Harmon (jonthegeek): a |> myfun(a = 10, b = .)
00:43:00 Ryan Metcalf: Jenny Bryant Youtube Video: https://youtu.be/4MfUCX_KpdE
00:43:07 Becki R. (she/her): Thanks
00:44:57 Jon Harmon (jonthegeek): a %>% myfun(arg1 = 10, arg2 = .)
00:45:48 Jon Harmon (jonthegeek): a %>% myfun(arg2 = .)
00:47:40 Federica Gazzelloni: https://www.r-bloggers.com/2021/05/the-new-r-pipe/
00:50:00 Jon Harmon (jonthegeek): https://twitter.com/hadleywickham/status/1359852563726819332
00:51:58 Federica Gazzelloni: https://community.rstudio.com/t/why-is-r-core-creating-a-new-pipe-operator/89020
01:03:35 Federica Gazzelloni: you can use: set.seed(123) with some numbers inside to recall the same random set
01:03:56 Federica Gazzelloni: set.seed(123)
01:04:06 Federica Gazzelloni: rnorm(100)…
01:08:46 Jon Harmon (jonthegeek): a <- a %>%
a %<>%
01:10:41 Jon Harmon (jonthegeek): mtcars <- head(mtcars)
01:10:50 Jon Harmon (jonthegeek): mtcars <- mtcars %>% head()
01:10:57 Jon Harmon (jonthegeek): mtcars %<>% head()
01:13:56 Becki R. (she/her): Where is the signup list for presenting chapters?
```
</details>
### Cohort 6 {-}
`r knitr::include_url("https://www.youtube.com/embed/jwnQkHtUiOY")`
<details>
<summary> Meeting chat log </summary>
```
00:09:32 Marielena Soilemezidi: =
00:26:49 Daniel Adereti: Yes, I agree. It appears the dot allows to insert any argument
00:52:38 Daniel Adereti: Thanks Marielena, also share Shannon's thoughts, pipes had always been taught like a must use operator, glad to see it is meant to be selective
00:55:20 Daniel Adereti: works for me too, it is quite a large chapter
00:55:39 Daniel Adereti: Sorry, my mic seems problematic, hence my typing
```
</details>
### Cohort 7 {-}
`r knitr::include_url("https://www.youtube.com/embed/wlYh2KewQcI")`
<details>
<summary> Meeting chat log </summary>
```
00:21:34 Oluwafemi Oyedele: https://towardsdatascience.com/understanding-the-native-r-pipe-98dea6d8b61b
00:22:15 Christine (she/her): Thanks, Oluwafemi!
00:22:51 Oluwafemi Oyedele: https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-new-native-pipe-and-the-magrittr-pipe
00:37:22 Oluwafemi Oyedele: https://github.com/daranzolin/ViewPipeSteps
00:49:42 timnewby: https://towardsdatascience.com/an-introduction-to-the-pipe-in-r-823090760d64 Talks a bit about when pipes work well and when they might not
00:50:13 Oluwafemi Oyedele: http://127.0.0.1:22120/session/Rvig.11e05a207606.html
01:03:47 Dolleen Osundwa: thank you Kumar
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/LHYUEAXaYoM")`
### Cohort 8 {-}
`r knitr::include_url("https://www.youtube.com/embed/uEYKR3xs4Y0")`
`r knitr::include_url("https://www.youtube.com/embed/MvzrbyCuFLw")`