---
title: "S4 Objects"
output: html_notebook
author: Ted Laderas
---
# Before you start
Make sure that you've clicked on the `oop_talk.rproj` file to open this folder as a project.
# What are S4 Objects?
S4 objects are like S3 objects, but stricter. An S3 object is usually just a list with a class attribute, and you can put anything in that list. S4 objects instead have *slots*, and each slot is declared to hold a specific data type, such as a `numeric` vector or a `data.frame`. You can also put additional restrictions on what goes into a slot.
In S3, you can literally change the class of any object by using the `class()` function:
```{r}
my_list <- list(a="blah", b=c(1,2,3))
class(my_list) <- "newclass"
class(my_list)
```
This flexibility can be a very bad thing when trying to get a data structure to work consistently across a bunch of packages. Things might not work like you expect them to, or worse, the loose definition of the object can cause compatibility issues down the line. So that's why you would want to be strict.
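As a small sketch of how this can go wrong, the example below relabels an ordinary list with a class it does not really satisfy. The generic `shape_area()` and the class name `circle` are hypothetical names made up for this illustration; they are not part of this project.

```r
# Hypothetical S3 generic and method, defined only for this illustration
shape_area <- function(shape, ...) UseMethod("shape_area")
shape_area.circle <- function(shape, ...) pi * shape$radius^2

# A list that really has a radius
real_circle <- structure(list(radius = 1), class = "circle")

# A list with no radius at all, relabelled as a circle; nothing checks this
fake_circle <- list(side = 2)
class(fake_circle) <- "circle"

real_area <- shape_area(real_circle)
# No error here: shape$radius is NULL, so we quietly get an empty result
fake_area <- shape_area(fake_circle)
length(fake_area)
```

The second call does not fail; it returns a zero-length vector, which is the kind of silent problem that strict slot definitions are meant to prevent.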
S4 also has *methods*, like S3, but you have to be strict about which arguments go into each method. S4 does this by first defining a generic function, and then specifying a method with strict typing.
# Properties of S4 Objects
## Loading an Object
We're going to load an object called `my_result_object`, which has the class `StatPackageResult`.
```{r}
source("R/classes-S4.R")
#we use here() to build file paths relative to the project root
library(here)
my_result_object <- readRDS(here("data/s4_stat_result.rds"))
```
## What class is my object?
```{r}
class(my_result_object)
```
Okay, we have an object, but how do we know it's an S4 object?
```{r}
isS4(my_result_object)
```
## Slots
Here we can see the `slots` of `my_result_object`. Slots hold different parts of the data object. We have to be explicit in what data types go in them.
```{r}
slotNames(my_result_object)
```
We can also ask about the S4 class - this gives us more information about what is in the individual slots:
```{r}
getSlots("StatPackageResult")
```
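To see what these two functions return without loading the project data, here is a minimal sketch on a toy class. The class `Reading` and its slots are hypothetical and exist only for this example.

```r
library(methods)

# Hypothetical toy class with two typed slots
Reading <- setClass("Reading",
                    slots = c(values = "numeric", label = "character"))

reading <- Reading(values = c(1.5, 2.5), label = "demo")

# slotNames() works on an object (or a class name) and returns the slot names
slot_names <- slotNames(reading)

# getSlots() takes the class name and returns a named character vector:
# the names are the slots, the values are the classes they must hold
slot_types <- getSlots("Reading")
slot_types
```

The named vector from `getSlots()` is handy when you want to check programmatically what type a slot expects.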
## We need more info
We can use the `getClass` function to get more info about the `StatPackageResult` class.
```{r}
getClass("StatPackageResult")
```
## Looking at the object's structure
We can use the `str()` function to look at the overall structure of our object. We can see information about the actual `data.frame`s in the `data` slot and the `statistics` slot.
```{r}
str(my_result_object)
```
## Accessing a slot directly
S4 uses the `@` sign instead of the `$` sign to access the data in a slot.
```{r}
#glimpse() comes from the dplyr package
dplyr::glimpse(my_result_object@data)
```
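The `@` operator also works for assignment, and the assigned value is checked against the slot's declared type. The sketch below uses a hypothetical toy class, `Measurement`, to show both reading and writing a slot.

```r
library(methods)

# Hypothetical toy class with a single numeric slot
Measurement <- setClass("Measurement", slots = c(values = "numeric"))
m <- Measurement(values = c(1, 2, 3))

# Two equivalent ways of reading a slot
via_at <- m@values
via_slot <- slot(m, "values")

# Assigning a value of the right type is fine
m@values <- c(10, 20)

# Assigning the wrong type is an error, which we catch here
assignment_failed <- tryCatch({
  m@values <- "ten"
  FALSE
}, error = function(e) TRUE)
assignment_failed
```

Because the failed assignment raises an error, the slot keeps its previous value.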
# Methods
We've defined two methods that work on our class: `get_statistics()` and `get_significant_results()`.
Let's try it out and see what `get_statistics()` does:
```{r}
get_statistics(my_result_object)
```
```{r}
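#print the generic function itself
get_statistics
#list the methods defined for this generic, by signature
showMethods("get_statistics")
#show the code of the method for the StatPackageResult class
getMethod(f="get_statistics", signature = signature("StatPackageResult"))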
```
How do we know which methods belong to a class? Hopefully, they are listed in the documentation for the class.
However, one big issue is that there is no easy way to find all the methods in other packages that act upon the class.
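For the generics that are visible in your current session, `showMethods()` can be asked about a class rather than a function. This is only a partial answer, since it cannot see packages you have not loaded. The class `Toy` and the generic `peek()` below are hypothetical.

```r
library(methods)

# Hypothetical toy class and generic, just for this illustration
Toy <- setClass("Toy", slots = c(x = "numeric"))
setGeneric("peek", function(object) standardGeneric("peek"))
setMethod("peek", "Toy", function(object) object@x)

# With no function name, showMethods() reports the methods whose
# signature includes the given class, among the generics it can see
showMethods(classes = "Toy")

# Checks for one specific generic and method
generic_exists <- isGeneric("peek")
method_exists <- existsMethod("peek", "Toy")
```

`existsMethod()` only looks for a method defined directly for that signature, so it is a useful check when you want to know whether a class has its own method or relies on an inherited one.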
# PROBLEM
There is another method for our `StatPackageResult` class: `get_significant_results()`. Try it out below:
```{r}
get_significant_results(my_result_object, cutoff = 0.1)
```
Show the method code for `get_significant_results()`. Hint: you'll have to find the signature for the method before you can use `getMethod()`
```{r}
##Space for your answer here
```
# Making a new `StatPackageResult` object/Type Checking
Now that you've seen the basics behind the `StatPackageResult` class, you can make a `StatPackageResult` object. You will probably do this in your package code when you're returning a result.
The `StatPackageResult()` function is called a *constructor*. We need to give it values for each of the slots to make a valid object. (There is also a way to make a new object with the `new()` function, but it's currently not recommended.)
```{r}
#first define the statistics slot
stat_frame <- data.frame(group=c("setosa", "virginica", "versicolor"), pvalue=c(0.5, 0.2, 0.001))
my_s4_object_new <- StatPackageResult(data = iris, statistics = stat_frame)
```
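To make the relationship between the constructor and `new()` concrete, here is a minimal sketch with a hypothetical toy class, `Pair`. The function returned by `setClass()` passes its arguments on to `new()`, so both routes should give the same object.

```r
library(methods)

# Hypothetical toy class; setClass() returns the constructor function
Pair <- setClass("Pair", slots = c(left = "numeric", right = "numeric"))

# Route 1: the constructor returned by setClass()
from_constructor <- Pair(left = 1, right = 2)

# Route 2: calling new() with the class name
from_new <- new("Pair", left = 1, right = 2)

same_object <- identical(from_constructor, from_new)
same_object
```

The constructor is the nicer interface for users of your package, because they don't need to know the class name as a string.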
We know that the required slots are for `data`, which needs to be a `data.frame`, and `statistics`, which also needs to be a `data.frame`. What if we make a new object where `statistics` is a `vector`?
```{r}
cw <- as.data.frame(ChickWeight)
new_obj <- StatPackageResult(data=cw, statistics=c(1,1,1))
```
What happens when you only provide a `data` argument?
```{r}
cw <- data.frame(ChickWeight)
new_obj2 <- StatPackageResult(data=cw)
```
We also require that the `data.frame` in the `statistics` slot has a column named `pvalue`. We'll see how we did this in the `Validity` section below.
```{r}
new_obj3 <- StatPackageResult(data=cw, statistics=cw)
```
# How did we define the `StatPackageResult` Class?
Take a look at `R/classes-S4.R`. The first thing we did was define the class with `setClass`, where we can specify the data types of each of the slots. Note that this also defines what's called the *constructor*, or the function that lets us make new objects.
```
StatPackageResult <- setClass("StatPackageResult",
                              slots = c(data = "data.frame",
                                        statistics = "data.frame"),
                              prototype = c(data=NA, statistics=NA)
)
```
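As a separate sketch of what the `prototype` argument is for, the toy class below gives its slots default values. The class `Settings` is hypothetical, and passing the defaults as a `list()` is the form used in Advanced R; it is not necessarily how `R/classes-S4.R` is written.

```r
library(methods)

# Hypothetical toy class with default slot values supplied via prototype
Settings <- setClass("Settings",
                     slots = c(name = "character", cutoff = "numeric"),
                     prototype = list(name = NA_character_, cutoff = 0.05))

# No arguments: every slot takes its prototype value
defaults <- Settings()

# Supplying one slot overrides just that default
custom <- Settings(cutoff = 0.01)

c(defaults@cutoff, custom@cutoff)
```

Note that the prototype values need to match the declared slot types, which is why the missing name is `NA_character_` rather than a plain `NA`.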
# Defining methods for `StatPackageResult`
S4 methods are designed to be flexible. The downside to the flexibility is that they are a little harder to program with and set up. Whereas in S3, we could define a method for our S3 class by just naming a function with a dot (such as `print.stats_result`), S4 requires us to first establish a `generic` function.
The `generic` function is designed to use different code based on the class of the object, much like S3 generics. Take a look at the generic function `show`, which has methods for many different classes. In the listing of its methods, an entry such as `object="Module"` is called a *signature*, and the signature is how the generic function decides which method code to dispatch.
Here are all the S4 methods for the `show` generic function based on their signature:
```{r}
showMethods("show")
```
Because a generic can dispatch to many methods with different signatures, you first have to define the generic itself using `setGeneric()`:
```
#always need to define the generic function first
setGeneric(name = "get_statistics",
           def = function(object) {
             standardGeneric("get_statistics")
           }
)
```
This creates a *generic* function called `get_statistics`, which has a single argument: `object`. Note that `object` isn't limited to our `StatPackageResult` class. We could also define a `get_statistics` method for other S4 objects.
Once the generic is created, you can define a method for your class.
```
#define the method
setMethod(f = "get_statistics",
          #the signature says which class this method is dispatched on
          signature = signature("StatPackageResult"),
          definition = function(object){
            return(object@statistics)
          })
```
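Here is a self-contained sketch of the same generic-then-method pattern, with one generic serving two unrelated classes. The classes `Circle` and `Square` and the generic `shape_perimeter()` are hypothetical names invented for this example.

```r
library(methods)

# Two hypothetical toy classes
Circle <- setClass("Circle", slots = c(radius = "numeric"))
Square <- setClass("Square", slots = c(side = "numeric"))

# Step 1: the generic, which only says "dispatch on the class of shape"
setGeneric("shape_perimeter",
           function(shape) standardGeneric("shape_perimeter"))

# Step 2: one method per class, each with its own signature
setMethod("shape_perimeter", signature("Circle"),
          function(shape) 2 * pi * shape@radius)
setMethod("shape_perimeter", signature("Square"),
          function(shape) 4 * shape@side)

circle_perimeter <- shape_perimeter(Circle(radius = 1))
square_perimeter <- shape_perimeter(Square(side = 2))

# A class with no method is an error rather than a silent fallback
no_method <- tryCatch(shape_perimeter(3), error = function(e) "no method")
no_method
```

The last call is where the strictness shows up: there is no method for a plain number, so S4 stops instead of guessing.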
# Validity
Say you need to have another requirement for the `data.frame` in the `statistics` slot, such as requiring a column called `pvalue`. We can do this by adding a validity function using `setValidity`:
```
check_stats_object <- function(object){
  #check whether statistics data.frame contains "pvalue"
  #as column name
  if("pvalue" %in% colnames(get_statistics(object))) {
    TRUE
  } else {
    "statistics slot needs a pvalue column"
  }
}
setValidity("StatPackageResult", check_stats_object)
```
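The sketch below shows a validity function in action on a hypothetical toy class, `PValueTable`. It also shows one thing to watch out for: validity is checked when the object is created, but not when you assign to a slot with `@`, so you need `validObject()` to re-check.

```r
library(methods)

# Hypothetical toy class with a single data.frame slot
PValueTable <- setClass("PValueTable", slots = c(statistics = "data.frame"))

# Return TRUE when valid, otherwise a string describing the problem
setValidity("PValueTable", function(object) {
  if ("pvalue" %in% colnames(object@statistics)) {
    TRUE
  } else {
    "statistics slot needs a pvalue column"
  }
})

# Valid: the data.frame has a pvalue column
good <- PValueTable(statistics = data.frame(pvalue = c(0.2, 0.01)))

# Invalid: the constructor fails and reports our message
bad_message <- tryCatch(
  PValueTable(statistics = data.frame(p = 0.2)),
  error = function(e) conditionMessage(e)
)

# Slot assignment only checks the type, so this slips through...
good@statistics <- data.frame(p = 0.2)
# ...until we ask for the validity check explicitly
recheck <- tryCatch(validObject(good), error = function(e) "invalid")
```

If your package modifies slots after construction, calling `validObject()` before returning the object is a cheap safeguard.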
# Inheritance
The best thing about Object Oriented Programming is that you don't have to reinvent the wheel. You can *extend* a current class definition to make a new class. Here we're making a new class `AnovaResult` that is based on our `StatPackageResult`. You can see we use the `contains` argument to say we're extending `StatPackageResult`.
Also, note we're adding a `groups` slot. The other slots, `data` and `statistics`, are *inherited* from `StatPackageResult`.
```
#Inheritance: making a new class from our old class
AnovaResult <- setClass("AnovaResult",
                        contains = "StatPackageResult",
                        slots = c(groups = "character"))
```
```{r}
stat_frame <- data.frame(group=c("setosa", "virginica", "versicolor"), pvalue=c(0.5, 0.2, 0.001))
groups <- c("setosa", "virginica", "versicolor")
anova_result_s4 <- AnovaResult(data = iris, statistics = stat_frame, groups=groups)
#get the class definition
getClass("AnovaResult")
```
Show the new structure of our inherited object.
```{r}
str(anova_result_s4)
```
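You can also check the inheritance relationship from code. This is a minimal sketch with two hypothetical toy classes, `BaseResult` and `ChildResult`, that mirror the parent/child setup above.

```r
library(methods)

# Hypothetical parent and child classes
BaseResult <- setClass("BaseResult", slots = c(statistics = "data.frame"))
ChildResult <- setClass("ChildResult",
                        contains = "BaseResult",
                        slots = c(groups = "character"))

child <- ChildResult(statistics = data.frame(pvalue = 0.1), groups = "a")

# The child counts as an instance of its parent class
child_is_base <- is(child, "BaseResult")

# With one argument, is() lists the class and everything it extends
lineage <- is(child)

# The child has its own slot plus the inherited one
child_slots <- slotNames(child)

# as() converts the child back to a plain parent object
as_parent <- as(child, "BaseResult")
```

This is why methods written for the parent class keep working on the child: as far as dispatch is concerned, a `ChildResult` is also a `BaseResult`.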
# Extending methods from a superclass
Methods are class specific, but they also inherit from the *super* class that our new class is derived from. This means you can add extra code to a method by *overriding* the method for the new class and using `callNextMethod()` to execute the code belonging to the superclass (which is `StatPackageResult`).
```
#Overriding the get_statistics method for AnovaResult
setMethod("get_statistics",
          signature = signature(object = "AnovaResult"),
          definition = function(object){
            print(object@groups)
            #now run the code that belongs to the method
            #for the superclass ("StatPackageResult")
            callNextMethod(object)
          }
)
```
Try running `get_statistics()` on `anova_result_s4`. It will first print out the `groups` slot and then return the `statistics` slot.
```{r}
get_statistics(anova_result_s4)
```
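The same override-and-extend pattern can be sketched on a tiny standalone example. The classes `Animal` and `Dog` and the generic `describe_pet()` are hypothetical; the point is only to show the child method reusing the parent's result.

```r
library(methods)

# Hypothetical parent and child classes
Animal <- setClass("Animal", slots = c(name = "character"))
Dog <- setClass("Dog", contains = "Animal")

setGeneric("describe_pet", function(object) standardGeneric("describe_pet"))

# Method for the superclass
setMethod("describe_pet", "Animal", function(object) {
  paste("Animal:", object@name)
})

# Method for the subclass: run the superclass code, then add to it
setMethod("describe_pet", "Dog", function(object) {
  parent_text <- callNextMethod()
  paste(parent_text, "(a dog)")
})

described <- describe_pet(Dog(name = "Rex"))
described
```

Calling `callNextMethod()` with no arguments passes along the arguments of the current call, so the superclass method sees the same `object`.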
# What we didn't cover today:
- Updating old objects
- Getting S3 and S4 to work together
- Multiple Inheritance and Dispatch
    - The ability of a class to inherit from multiple classes
    - Hadley covers this much better than I could: https://adv-r.hadley.nz/s4.html#s4-dispatch
# Resources
Portions of this notebook were gratefully adapted from:
- S4 Classes and Methods: https://kasperdanielhansen.github.io/genbioconductor/html/R_S4.html
- A practical Tutorial on S4 Programming: https://www.bioconductor.org/help/course-materials/2013/CSAMA2013/friday/afternoon/S4-tutorial.pdf
- https://www.cyclismo.org/tutorial/R/s4Classes.html
- Advanced R: https://adv-r.hadley.nz/