---
title: "Dynamic reports with knitr"
author: "Jeff Oliver"
date: "`r format(Sys.time(), '%d %B, %Y')`"
---
```{r library-check, echo = FALSE, results = "hide"}
#| output: false
libs <- c("knitr")
libs_missing <- libs[!(libs %in% installed.packages()[, "Package"])]
if (length(libs_missing) > 0) {
  for (l in libs_missing) {
    install.packages(l)
  }
}
```
An introduction to using the `knitr` package in R to produce reproducible,
dynamic reports.
#### Learning objectives
1. Install and use third-party packages for R
2. Become familiar with R Markdown syntax
3. Write dynamic reports including code and visualizations
## Literate programming
What if there was a way to include text _and_ all the code we used for our
analyses in a single document? What about a report that includes not only data
visualization, but the actual code used to produce those visualizations? With
literate programming, we write text and code in a single document - this way,
we can update reports and manuscripts with new data or corrections with minimal
effort.
***
## Getting started
To create these reports, we will make heavy use of the `knitr` package for R.
So if you have not already installed it, run this command in your R console:
```{r install-knitr, eval = FALSE}
install.packages("knitr")
```
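If you are not sure whether `knitr` is already installed, one option is to check before installing. The sketch below is one possible approach; `missing_packages()` is a hypothetical helper name introduced here for illustration and is not part of `knitr` or base R.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Character vector of packages that still need to be installed
to_install <- missing_packages(c("knitr"))
to_install
```

Any names in `to_install` can then be passed to `install.packages()`.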
To make these reports, which are ultimately output in HTML, PDF, or Word
format, we use a text format called R Markdown. The concept is to use pure text
to indicate formatting like **bold**, _italics_, and ^superscripts^, and to
combine this formatting with code that can be executed, with its output displayed.
More on how we do that later. For now, let's start by creating a new R Markdown
file via File > New File > R Markdown... You should then be prompted with a
window like:
![](images/r-markdown-dialog.png)
<br />
For the title, enter "knitr lesson" and add your name to the author field.
Leave the default output format as HTML.
At the top of the file is the header section, which includes basic information
about your document. The only field that is absolutely required is the `output`
field, but it is best to include the title, author, and date information, too.
Note that immediately below this header is a chunk of code:
<pre>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
</pre>
Followed by text:
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for
authoring HTML, PDF, and MS Word documents. For more details on using R
Markdown see <http://rmarkdown.rstudio.com>.
```
We will start with formatting in R Markdown syntax, followed by how to include
R code in your document.
## R Markdown
To try out these formatting examples, start by deleting everything after the
header section, so your document only includes:
```
---
title: "Knitr lesson"
author: "Jeff Oliver"
date: "April 27, 2017"
output: html_document
---
```
Below the header, add this text to your file:
```
# Introduction to knitr
This is my first knitr document.
Bulleted lists
+ Regular font
+ **bold font**
+ _italic font_
Numbered lists
1. one
2. two
2. three
```
And the output file is created when we press the **Knit** button in the
top-left part of the screen (or by pressing Shift-Ctrl-K or Shift-Cmd-K):
> # Knitr lesson
> #### _Jeff Oliver_
> #### _April 27, 2017_
>
> # Introduction to knitr
>
> This is my first knitr document.
>
> Bulleted lists
>
> + Regular font
> + **bold font**
> + _italic font_
>
> Numbered lists
>
> 1. one
> 2. two
> 2. three
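The Knit button is the usual route in RStudio, but a document can also be knit from the R console with `rmarkdown::render()`. The sketch below assumes the `rmarkdown` package and pandoc are available; the temporary file and its contents are placeholders for illustration, so substitute the path to your own file.

```r
# Write a tiny R Markdown file to a temporary location (placeholder content)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c("---",
             "title: \"Knitr lesson\"",
             "output: html_document",
             "---",
             "",
             "# Introduction to knitr",
             "",
             "This is my first knitr document."),
           rmd_file)

# Knit from the console, but only if rmarkdown and pandoc can be found
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() returns the path to the output file
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```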
Notice the large font of "Introduction to knitr". Because we used a single
pound sign (#) at the start of the line, this text is formatted as a level 1
header. To format lower headers, we add pound signs:
```
# header 1
## header 2
### header 3
#### header 4
```
Which are rendered as:
> # header 1
> ## header 2
> ### header 3
> #### header 4
We can also add hyperlinks to our document, using this syntax:
`[text we want to link](url address)`. So to create a link to the University of
Arizona homepage, we write `[University of Arizona](http://www.arizona.edu)`.
When we run Knit, this is displayed in our document as [University of Arizona](http://www.arizona.edu).
Images are also supported, whether they are local files or images on the web.
The syntax is almost identical to that for hyperlinks, but in the case of
images, we prefix the statement with an exclamation point (!): `![Caption for image](filename)`
Here I use an image that I downloaded from
[Wikimedia](https://commons.wikimedia.org) into the folder "images" and include
a caption:
```
![The white rhinoceros (_Ceratotherium simum_) (photo by Rob
Hooft)](images/640px-Rhinoceros_male_2003.jpg)
```
> ![The white rhinoceros (_Ceratotherium simum_) (photo by Rob Hooft)](images/640px-Rhinoceros_male_2003.jpg)
Subscripts and superscripts are also supported by wrapping the text in tildes
(~) and carets (^), respectively:
```
Subscript: log~10~
Superscript: r^2^
```
> Subscript: log~10~
> Superscript: r^2^
Now what happened there? Why aren't those two on separate lines? When the R
Markdown file is interpreted, it assumes adjacent lines should all be part of
the same paragraph, unless you indicate otherwise. The way we do this is by
adding two blank spaces at the end of a line to indicate a line break:
```
Subscript: log~10~ <!-- Two spaces at end of line -->
Superscript: r^2^
```
> Subscript: log~10~
> Superscript: r^2^
And the last thing to mention about formatting is that if you want to include
equations, you can use [LaTeX](http://www.latex-project.org/) syntax,
surrounded by dollar signs (\$). Use single dollar signs for in-line equations,
`$E = mc^2$` is rendered as $E = mc^2$. Equations in double dollar signs are
displayed on their own line, so `$$E = mc^2$$` shows up as
$$E = mc^2$$
You can also write more complex equations (remembering to bracket your
LaTeX code with double dollar signs):
```
$$
\begin{aligned}
\begin{array}{l}
\displaystyle \int 1 = x + C\\
\displaystyle \int x = \frac{x^2}{2} + C \\
\displaystyle \int x^2 = \frac{x^3}{3} + C
\end{array}
\end{aligned}
$$
```
$$
\begin{aligned}
\begin{array}{l}
\displaystyle \int 1 = x + C\\
\displaystyle \int x = \frac{x^2}{2} + C \\
\displaystyle \int x^2 = \frac{x^3}{3} + C
\end{array}
\end{aligned}
$$
***
## What about code?
The best part of knitr is the ability to include code and the output of that
code. Let's start by making a new R Markdown file via File > New File > R
Markdown... and give it the title "Iris shape analyses". If the author field is
blank, add your name to the author field.
```
---
title: "Iris shape analyses"
author: "Jeff Oliver"
date: "April 27, 2017"
output: html_document
---
```
Start with a brief description of what this report is about:
```
## Introduction
In this report we test for a relationship between different parts of morphology
in _Iris_ flowers.
```
> ## Introduction
> In this report we test for a relationship between different parts of
morphology in _Iris_ flowers.
Next we can add information about the data we will be using; in this case it is
the built-in `iris` dataset. Add this to your R Markdown file:
```
## Materials & methods
Analyses are based on built-in data for three _Iris_ species. We used linear
regression to test for relationships.
```
> ## Materials & methods
> Analyses are based on built-in data for three _Iris_ species. We used linear
regression to test for relationships.
Now let's actually do some R. We start by plotting the relationship between
petal width and petal length. To write an R code block, we use triple-backticks
(`` ``` ``) and braces to indicate the language (R in our case, but other
languages such as python and bash can also be supported). Here we add code to
indicate the start of the Results section, as well as a plot of the two
variables:
## Results
```{r}`r ''`
plot(x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal length (cm)",
ylab = "Petal width (cm)")
```
<!-- An alternative approach to get backticks to show up in pdf and html formats
````
## Results
```{r}`r ''`
plot(x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal length (cm)",
ylab = "Petal width (cm)")
```
````
-->
When we knit our document, the above code is rendered as:
> ## Results
```{r}
plot(x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal length (cm)",
ylab = "Petal width (cm)")
```
***
We can also do analyses, and reference the output with in-line code. For this
example, let's find the coefficient of determination (r^2^) for the relationship
between petal width and petal length. We can then reference this in the text of
our report.
The R code for a linear model is:
```{r lm, echo = FALSE}
iris_model <- lm(Petal.Width ~ Petal.Length, data = iris)
iris_model_summary <- summary(iris_model)
r_squared <- iris_model_summary$r.squared
```
```{r echo = FALSE}`r ''`
iris_model <- lm(Petal.Width ~ Petal.Length, data = iris)
iris_model_summary <- summary(iris_model)
r_squared <- iris_model_summary$r.squared
```
Note that for this purpose, we added the qualifier `echo = FALSE`, which tells
knitr _not_ to include the actual code in the output. The code still runs, but
because it produces no output, we won't see any changes to our document. Save
and Knit the document to see this for yourself.
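Because the chunk is hidden, it can help to run the same code in the console and look at what the summary object contains. The following is a small sketch for exploration, not something that needs to go in the report; the comparison with `cor()` is an added check that applies to a model with a single predictor.

```r
# Fit the same model as in the report
iris_model <- lm(Petal.Width ~ Petal.Length, data = iris)
iris_model_summary <- summary(iris_model)

# List the components stored in the summary; "r.squared" is one of them
names(iris_model_summary)

# Extract r-squared (note the dot, not an underscore, in the name)
r_squared <- iris_model_summary$r.squared

# With one predictor, r-squared equals the squared Pearson correlation
all.equal(r_squared, cor(iris$Petal.Width, iris$Petal.Length)^2)
```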
You can also control code chunks in a number of other ways:
+ `eval = FALSE` to show code but not to execute it
+ `results = "hide"` to suppress any results from being included in output
+ `warning = FALSE` and `message = FALSE` to suppress warnings and messages,
respectively, from being shown
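Options that should apply to every chunk can be set once, near the top of the document, with `knitr::opts_chunk$set()`, the same function used in the default `setup` chunk shown earlier. A minimal sketch follows; the particular defaults chosen here are only an illustration.

```r
# Set document-wide defaults; individual chunks can still override them
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

  # Look up the current default for the echo option
  knitr::opts_chunk$get("echo")
}
```

With defaults like these, a chunk whose code should be shown would set `echo = TRUE` in its own header.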
Even though we set `echo = FALSE`, the code is still executed and we can
reference products of that code through in-line code chunks. In this case, we
want to reference the value stored in our `r_squared` variable in our document
text. We use in-line code to do so. In-line code is wrapped in single backticks
(`) and we skip the braces, as opposed to triple backticks and braces we used
for separate code blocks. So we add this to our Results section:
```
Petal width and petal length were highly correlated (r^2^ = `r '\x60r r_squared\x60'`).
```
> Petal width and petal length were highly correlated (r^2^ = `r r_squared`).
Hmmm...maybe we don't need r^2^ to seven digits, so update the code to only
include two digits (using R's `round()` function):
```
Petal width and petal length were highly correlated (r^2^ = `r '\x60r round(r_squared, 2)\x60'`).
```
> Petal width and petal length were highly correlated (r^2^ = `r round(r_squared, 2)`).
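As a side note, `round()` returns a number, so trailing zeros are dropped (a value of 0.90 would be displayed as 0.9). If a fixed number of decimal places matters, `sprintf()` is one alternative. A small sketch, recomputing `r_squared` so the example stands alone:

```r
# Recompute r-squared from the same linear model used in the report
r_squared <- summary(lm(Petal.Width ~ Petal.Length, data = iris))$r.squared

# Numeric result, rounded to two decimal places
rounded <- round(r_squared, 2)

# Character result, always formatted with exactly two decimal places
formatted <- sprintf("%.2f", r_squared)

rounded
formatted
```

Either expression can be used inside in-line code in place of `round(r_squared, 2)`.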
We can also include tables in our report, either by creating them manually or
by using the `kable()` function of knitr. In this lesson, I will just show the
manual creation, and leave it to you to look into the `kable()` function later
(see [Additional resources](#additional-resources) below for a link to learn more about `kable()`).
We use pipes (`|`) and minus signs (`-`) to create tables; we generally start
with a header row, followed by a separator row of minus signs, then add rows of
data:
| Column 1 | Column 2 |
|------------|------------|
| Row 1 data | Row 1 data |
| Row 2 data | Row 2 data |
| Row 3 data | Row 3 data |
Which will show up in your report as:
| Column 1 | Column 2 |
|------------|------------|
| Row 1 data | Row 1 data |
| Row 2 data | Row 2 data |
| Row 3 data | Row 3 data |
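For comparison with the manual approach, here is a minimal sketch of `kable()`, which builds the same kind of pipe table from a data frame. It assumes `knitr` is installed; showing the first three rows of `iris` is an arbitrary choice for illustration.

```r
# Build a Markdown table from the first three rows of the iris data
if (requireNamespace("knitr", quietly = TRUE)) {
  iris_table <- knitr::kable(head(iris, n = 3))

  # Printing shows the pipe-and-dash text that kable() generates
  print(iris_table)
}
```

Inside a code chunk of an R Markdown document, the `kable()` call alone is enough for the table to appear in the knit output.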
***
Let's add one more thing to this report. Since these data were collected
by the botanist Edgar Anderson, let's provide a link to the Wikipedia page with
information about the data set. So go back to the Materials & Methods section
and update it with a link:
Change:
```
Analyses are based on built-in data for three _Iris_ species. We used linear
regression to test for relationships.
```
To:
```
Analyses are based on data for three _Iris_ species collected by
[Edgar Anderson](https://en.wikipedia.org/wiki/Iris_flower_data_set). We used
linear regression to test for relationships.
```
And the markdown for the Materials & methods section will be rendered as:
> ## Materials & methods
> Analyses are based on data for three _Iris_ species collected by
[Edgar Anderson](https://en.wikipedia.org/wiki/Iris_flower_data_set). We used
linear regression to test for relationships.
Our R Markdown file should look like this:
````
---
title: "Iris shape analyses"
author: "Jeff Oliver"
date: "April 27, 2017"
output: html_document
---
## Introduction
In this report we test for a relationship between different parts of morphology
in _Iris_ flowers.
## Materials & methods
Analyses are based on data for three _Iris_ species collected by
[Edgar Anderson](https://en.wikipedia.org/wiki/Iris_flower_data_set). We
used linear regression to test for relationships.
## Results
```{r}`r ''`
plot(x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal length (cm)",
ylab = "Petal width (cm)")
```
```{r echo = FALSE}`r ''`
iris_model <- lm(Petal.Width ~ Petal.Length, data = iris)
iris_model_summary <- summary(iris_model)
r_squared <- iris_model_summary$r.squared
```
Petal width and petal length were highly correlated (r^2^ = `r '\x60r round(r_squared, 2)\x60'`).
````
***
Which produces the following, albeit short, report:
> # Iris shape analyses
> #### _Jeff Oliver_
> #### _April 27, 2017_
>
> ## Introduction
> In this report we test for a relationship between different parts of morphology in _Iris_ flowers.
>
> ## Materials & methods
> Analyses are based on data for three _Iris_ species collected by
[Edgar Anderson](https://en.wikipedia.org/wiki/Iris_flower_data_set). We used
linear regression to test for relationships.
>
## Results
> ```{r}
plot(x = iris$Petal.Length,
y = iris$Petal.Width,
xlab = "Petal length (cm)",
ylab = "Petal width (cm)")
> ```
>
> ```{r echo = FALSE}
>iris_model <- lm(Petal.Width ~ Petal.Length, data = iris)
>iris_model_summary <- summary(iris_model)
>r_squared <- iris_model_summary$r.squared
> ```
>
> Petal width and petal length were highly correlated (r^2^ = `r round(r_squared, 2)`).
There is a lot more one can do with R Markdown. Check out the
[additional resources](#additional-resources) listed below for more
information.
***
## Other formats
These HTML reports are great (as a matter of fact, all these lessons are
written in R Markdown and converted to HTML with the `knitr` package), but what
about other formats? The other two commonly used formats are documents for word
processing (i.e. Word .docx files) and PDF files. These other formats require
additional software to be installed on your machine:
+ For Word documents, you will need Word or another piece of software that can
  interpret .docx files (e.g. LibreOffice or OpenOffice).
    + If you want the resulting document to have styles other than the default
      styles produced by `knitr`, first create a .docx file with the styles you
      want to apply to your output document, then refer to that file in the
      header (see example below and links in [Additional resources](#additional-resources)).
+ For PDF documents, the requirement is dependent on your operating system:
    + Windows: [Tex for Windows](http://miktex.org/2.9/setup)
    + Mac OS X: [Tex for Mac](http://tug.org/mactex)
    + Linux/Unix: Most likely you will need [pandoc](http://pandoc.org/installing.html);
      if you try to Knit an R Markdown file into a PDF and get error messages,
      they should indicate which additional software may be necessary.
To change the output format, you can update the header information, changing
the value of the `output` field from `html_document` to `word_document` or
`pdf_document`. You can also use the triangle next to the Knit button to open a
drop-down menu and select the format you want.
A Word document header:
```
---
title: "Iris shape analyses"
author: "Jeff Oliver"
date: "April 27, 2017"
output:
  word_document:
    reference_docx: docs/word-template.docx
---
```
A PDF document header:
```
---
title: "Iris shape analyses"
author: "Jeff Oliver"
date: "April 27, 2017"
output: pdf_document
---
```
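The output format can also be chosen at render time from the R console, without editing the header, through the `output_format` argument of `rmarkdown::render()`. The sketch below assumes the `rmarkdown` package is installed and that the report was saved under the hypothetical file name `iris-shape-analyses.Rmd`; the PDF format additionally needs the TeX software described above.

```r
# Hypothetical file name; replace with the path to your own report
rmd_file <- "iris-shape-analyses.Rmd"

# Formats to produce, named as they would appear in the header's output field
output_formats <- c("word_document", "pdf_document")

# Render once per format, skipping quietly if the file or package is missing
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (fmt in output_formats) {
    rmarkdown::render(rmd_file, output_format = fmt)
  }
}
```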
***
## Additional resources
+ [knitr documentation](https://yihui.name/knitr/)
+ [R Markdown documentation](http://rmarkdown.rstudio.com/)
+ An awesome [cookbook of R Markdown solutions](https://holtzy.github.io/Pimp-my-rmd/)
+ The [documentation for the `kable()` function](https://bookdown.org/yihui/rmarkdown-cookbook/kable.html)
+ Creating [Word templates](http://rmarkdown.rstudio.com/articles_docx.html) to
apply to knitr documents
+ A handy [cheatsheet for R Markdown syntax](http://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf)
+ Guide to writing [bibliography sections](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html) in R Markdown documents
+ Software Carpentry's [knitr lesson](http://swcarpentry.github.io/r-novice-gapminder/15-knitr-markdown/)
+ A [PDF version](https://jcoliver.github.io/learn-r/005-intro-knitr.pdf) of
this lesson