# Creating your own functions {.unnumbered}
To write your own function, you use the function `function()`. In the brackets `()` you specify the argument(s) that the function will use (with or without defaults), and in curly brackets `{ }` you specify what the function should do, referring to the argument(s).
The general form is:
```{r}
myFun <- function(arg1, arg2) {
  ## Here you type expressions that use the arguments
}
```
### Example of custom function
Each line inside the function can be an object assignment, a function call, a subsetting operation, a conditional (if/else) statement, a for loop, and so on: basically anything you have learned to do in R that you want the function to do!
Below is an easy example that calculates the mean of two values (`x` and `y`):
```{r}
mean_xy <- function(x, y){
  (x + y)/2
}
```
You call this function the same way as the functions you have used before:
```{r}
mean_xy(2,6)
```
Or:
```{r}
mean_xy(x = 2, y = 6)
```
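Naming the arguments also lets you pass them in any order, which matters when the result depends on which value is which. The sketch below uses a hypothetical function `ratio_xy()`, which is not part of the examples above:

```{r}
# Hypothetical helper for illustration: divides x by y
ratio_xy <- function(x, y){
  x / y
}

ratio_xy(6, 2)          # matched by position: x = 6, y = 2
ratio_xy(y = 2, x = 6)  # matched by name, so the order does not matter
ratio_xy(2, 6)          # matched by position again: now x = 2, y = 6
```

The first two calls give the same result; the third does not, because the values are swapped.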
### Use return() in a function
To have a function output something, it must return something. A function returns either the value of the last evaluated expression (as in `mean_xy`) or the value you pass explicitly to `return()`.
```{r}
mean_xy <- function(x, y){
  z <- (x + y)/2
  return(z)
}
mean_xy(x = 2, y = 6)
```
Here are a few other examples.
```{r}
mean_xy_2 <- function(x, y){
  z <- (x + y)/2
  x
  z
}
mean_xy_2(x = 1, y = 3)
```
Note that `x` is not returned. Only the value of the last expression (`z`) is returned.
```{r}
mean_xy_3 <- function(x, y){
  z <- x + y
  return(x)
  z
}
mean_xy_3(x = 1, y = 3)
```
Note that `z` is not returned. If a `return()` statement is encountered in the function, anything after that statement is [not executed]{.underline}.
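This behaviour makes an early `return()` useful as a guard at the top of a function. Here is a minimal sketch, assuming a hypothetical function `mean_or_na()` that should give `NA` for an empty vector:

```{r}
# Hypothetical example: stop early when there is nothing to average
mean_or_na <- function(x){
  if (length(x) == 0) {
    return(NA)          # the function ends here for empty input
  }
  sum(x) / length(x)    # only reached when x has at least one value
}

mean_or_na(c(2, 6))
mean_or_na(numeric(0))
```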
You can create functions with a variable number of arguments using `...`. For example, here’s a function that returns the mean of an arbitrary number of values:
```{r}
mean_vector <- function(...){
  z <- mean(c(...))
  return(z)
}
mean_vector(1,2,3)
mean_vector(1,2,3,4,5,6,7,8,9,10)
```
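`...` can also be combined with ordinary named arguments. The sketch below is a hypothetical variant, `mean_vector_narm()`, which passes an `na.rm` setting on to `mean()`. Arguments that come after `...` have to be given by their full name:

```{r}
# Hypothetical variant of mean_vector(): the values go through `...`,
# while na.rm is an ordinary named argument with a default
mean_vector_narm <- function(..., na.rm = FALSE){
  values <- c(...)             # collect all unnamed arguments in one vector
  mean(values, na.rm = na.rm)  # pass na.rm on to mean()
}

mean_vector_narm(1, 2, NA)                # NA, because of the missing value
mean_vector_narm(1, 2, NA, na.rm = TRUE)  # missing value is dropped first
```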
The arguments of a function do not have to be single values. A function can also take a whole vector as an argument:
```{r}
mean_x <- function(x){
  z <- mean(x)
  return(z)
}
x <- c(1,2,3,4,5)
mean_x(x)
```
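Because arithmetic in R works element by element, a function can also return a vector of the same length as its input. A small sketch, using a hypothetical function `center_x()`:

```{r}
# Hypothetical example: subtract the mean from every element
center_x <- function(values){
  values - mean(values)   # element-wise subtraction, returns a vector
}

center_x(c(1, 2, 3, 4, 5))
```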
Custom functions in R are useful if you have a set of commands that you need to run multiple times. By combining them in a function you 1) save time, 2) keep your code concise, and 3) make fewer coding mistakes.
In the next example, a function called `my_descriptives` calculates the mean of only the non-negative values of a vector.
### Build a custom function
```{r}
my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- mean(x.trim)
  return(out)
}
```
The first line inside the function takes a subset of the vector, `x.trim`, containing only the values `>=0`. The second line takes the mean of `x.trim`.
This function is useful for describing a vector in a data set that contains negative values where only non-negative values are allowed.
```{r, echo=FALSE}
PatientID <- 1:25
Ages <- round(c(rnorm(15, mean = 45, sd = 10), -50, rnorm(9, mean = 45, sd = 10) ) )
data <- data.frame(PatientID = PatientID, Ages = Ages)
data$Sex <- as.factor(c(rbinom(24, 1, 0.5),-1))
```
In this data set, there is a variable `Ages`:
```{r}
data$Ages
```
There is one value, `-50`, that is clearly an error.
```{r}
my_descriptives(data$Ages)
```
Compare the output with using the standard function `mean()`:
```{r}
mean(data$Ages)
```
In the standard `mean` function, the negative outlier is included and influences the mean!
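One caveat when subsetting this way: if the vector contains missing values, `x[x>=0]` keeps them as `NA`, so the mean becomes `NA` as well. A possible variant that also drops missing values is sketched below. The name `my_descriptives_na()` and the input values are made up for illustration:

```{r}
# Hypothetical variant: drop missing values as well as negative ones
my_descriptives_na <- function(x){
  keep <- !is.na(x) & x >= 0   # TRUE only for non-missing, non-negative values
  mean(x[keep])
}

ages_example <- c(40, 50, -50, NA)   # made-up values for illustration
my_descriptives_na(ages_example)
```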
The output of a function does not need to be a scalar. This version of the function `my_descriptives()` provides the whole summary of the variable, instead of only the mean.
```{r}
my_descriptives <- function(x){
  x.trim <- x[x>=0]
  out <- summary(x.trim)
  return(out)
}
```
```{r}
my_descriptives(data$Ages)
```
Again, let's compare the output to the standard `summary()` function.
```{r}
summary(data$Ages)
```
If you have multiple objects to return, you have to put them in an object container, like a list, vector, array or data.frame. It is not possible to return multiple individual objects like this:
`return(x,y)`
but it is possible to return them in a vector or list like this:
`return(c(x,y))`
`return(list(x,y))`
Here is an example of the function with multiple outputs:
```{r}
my_descriptives2 <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(list(below0, meanX))
}
```
The function additionally returns how many values were negative.
```{r}
my_descriptives2(data$Ages)
```
There was `1` value below zero, as provided by the first element in the list.
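Naming the elements of the returned list makes the output easier to read and lets you extract each element with `$`. The sketch below is a hypothetical variant, `my_descriptives3()`, applied to made-up values:

```{r}
# Hypothetical variant of my_descriptives2() that names its outputs
my_descriptives3 <- function(x){
  x.trim <- x[x > 0]
  list(below0 = sum(x < 0), meanX = mean(x.trim))
}

res <- my_descriptives3(c(40, 50, -50))  # made-up values for illustration
res$below0   # number of negative values
res$meanX    # mean of the positive values
```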
Notice how the function gives an error if you do not put the items in a list:
```{r, error=TRUE}
my_descriptives2_wrong <- function(x){
  x.trim <- x[x>0]
  below0 <- sum(x<0)
  meanX <- mean(x.trim)
  return(below0, meanX)
}
my_descriptives2_wrong(data$Ages)
```
You specify a default value for an argument by assigning it in the `function()` call. Here is an example of a function with a default argument (`y = 2`).
```{r}
calc4 <- function(x, y = 2){
  z1 <- x + y
  z2 <- x * y
  return(c(z1, z2))
}
calc4(x = 1) ## uses the default y = 2
calc4(x = 1, y = 3) ## overrides the default value of y
```
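Defaults are handy for settings that rarely change. As a sketch, the cutoff used for trimming earlier could become an argument with a default. The function `mean_above()` and its `cutoff` argument are hypothetical names:

```{r}
# Hypothetical example: the cutoff defaults to 0 but can be changed
mean_above <- function(x, cutoff = 0){
  mean(x[x >= cutoff])
}

mean_above(c(-50, 10, 30))               # uses cutoff = 0
mean_above(c(-50, 10, 30), cutoff = 20)  # keeps only values >= 20
```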
## Function environments
Each function, whether built-in or user-defined, has an associated environment, which can be thought of as a container that holds all of the objects present at the time the function is created.
When a function is created on the command line, its environment is the so-called "Global Environment":
```{r}
w <- 2
f <- function(y) {
  d <- 3
  return(d * (w + y))
}
environment(f)
```
The function `objects()` (or `ls()`), when called from the command line, lists the objects in the Global Environment:
```{r}
objects()
```
### Global and Local Variables
In the function `f()` defined above, the variable `w` is said to be global to `f()` and the variable `d`, because it's created within `f()`, is said to be local to `f()`. Global variables (like `w`) are visible from within a function, but local variables (like `d`) aren't visible from outside the function. In fact, local variables are temporary, and disappear when the function call is completed:
```{r, eval=FALSE}
f(y = 1)
d
```
You get an error: [Error in eval(expr, envir, enclos) : object 'd' not found]{style="color: red"}, indicating that the variable d does not exist in the 'Global Environment'.
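To see which variables are local, you can call `ls()` inside a function: with no arguments it lists the objects in the function's own environment. The sketch below uses a hypothetical function `show_locals()` and assumes that no object called `d_local` exists in the Global Environment:

```{r}
# Hypothetical example: ls() called inside a function lists its local objects
show_locals <- function(y){
  d_local <- 3
  ls()
}

show_locals(y = 1)   # names of the local objects: "d_local" and "y"
exists("d_local")    # FALSE: d_local is not visible outside the function
```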
When a global and local variable share the same name, the local variable is used:
```{r}
w <- 2
d <- 4
f <- function(y) {
  d <- 3
  return(d * (w + y))
}
f(y = 1)
```
Note also that when an assignment takes place within a function, and the local variable shares its name with an existing global variable, only the local variable is affected:
```{r}
w <- 2
d <- 4 # This value of d will remain unchanged.
f <- function(y) {
  d <- 3 # This doesn't affect the value of d in the global environment
  return(d * (w + y))
}
f(y = 1)
d
```
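If you do want a function to change a global variable, one straightforward option is to return the new value and assign it yourself. The sketch below uses the hypothetical names `total` and `add_one()`:

```{r}
# Hypothetical example: update a global variable by reassigning the result
total <- 4
add_one <- function(value){
  value <- value + 1   # changes only the local copy
  value
}

add_one(total)           # returns 5
total                    # still 4
total <- add_one(total)  # explicit reassignment in the calling environment
total                    # now 5
```

This keeps the change visible at the place where the function is called, rather than hidden inside the function body.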