---
title: "6: Stratifying by time"
subtitle: "R 4 ASME"
author: Author – Andrea Mazzella [(GitHub)](https://github.com/andreamazzella)
output: html_notebook
---
-------------------------------------------------------------------------------
## Contents
* Create a "survival object" to analyse cohort studies
* Calculate incidence rates from a cohort study
* Stratify rates
  * by a categorical variable
  * by a time variable (by doing a Lexis expansion)
## Acknowledgements
Thank you Prof Ruth Keogh @LSHTM, for pointing me in the right direction with regard to `survSplit()` and `pyears()`.
-------------------------------------------------------------------------------
## 0. Packages and options
```{r message=FALSE, warning=FALSE}
# Load packages
library("haven")
library("magrittr")
library("survival")
library("tidyverse")
# Limit significant digits to 3, reduce scientific notation
options(digits = 3, scipen = 9)
```
## 1. Data management
Import and explore `whitehal.dta`. These are the variables of interest:
* *Outcome*: death from cardiac event (`chd`)
* *Exposure*: job grade (`grade4`, `grade`)
* *Dates*
  * Date of birth: `timebth`
  * Date of entry: `timein`
  * Date of exit: `timeout`
* *Confounder*: smoking status (`smok`)
```{r}
# Import the dataset
whitehall <- read_stata("whitehal.dta")
# Preview
whitehall
```
```{r include=FALSE}
# Explore data types
glimpse(whitehall)
```
As you can see, there are no value labels, so I'll add them using the dataset help file.
```{r data_management, include=FALSE}
# Rename and factorise variables, label values
whitehall %<>%
  mutate(
    id = as.integer(id),
    all = as.integer(all), # NB outcomes must remain numerical
    chd = as.integer(chd),
    grade4 = factor(
      grade4,
      levels = c(1, 2, 3, 4),
      labels = c("admin", "profess", "clerical", "other")
    ),
    smok = factor(
      smok,
      levels = c(1, 2, 3, 4, 5),
      labels = c("never", "ex", "1-14/day", "15-24/day", "25+/day")
    ),
    grade = factor(grade,
                   levels = c(1, 2),
                   labels = c("higher", "lower")),
    cholgrp = factor(cholgrp),
    sbpgrp = factor(sbpgrp))
# Check it worked ok
glimpse(whitehall)
# Summarise
summary(whitehall)
```
-------------------------------------------------------------------------------
## 2. Calculating rates and RR
Calculate rates stratified by exposure (the two grades of employment: `grade`).
You create a survival object with `Surv()`; it contains the duration of follow-up and the status at the end of follow-up [equivalent to `stset` in Stata].
You then calculate stratified rates with `pyears()`: in the formula, put the survival object you have just created on the left and the stratifying variable on the right; then pipe the result into `summary()` [equivalent to `strate` in Stata].
(NB: `pyears()` automatically scales from days to years - if you don't want this to happen, for example because your Surv object is already set in years, you need to indicate it with argument `scale = 1`.)
```{r}
# Create survival object
surv_white <- whitehall %$% Surv(time = as.numeric(timein) / 365.25,
                                 time2 = as.numeric(timeout) / 365.25,
                                 event = chd)
# Calculate rates
pyears(surv_white ~ grade, data = whitehall, scale = 1) %>%
  summary(n = F, rate = T, ci.r = T, scale = 1000)
```
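To make the note about `scale` concrete, here is a minimal sketch on a made-up two-person dataset (`toy_days` and its columns are hypothetical, not part of `whitehal.dta`). It only illustrates that the default `scale = 365.25` converts follow-up recorded in days into years, whereas `scale = 1` leaves the time unit untouched.

```{r}
library(survival)

# Hypothetical toy data: follow-up recorded in days
toy_days <- data.frame(
  days = c(365.25, 730.5),
  died = c(1, 0),
  grp  = factor(c("a", "b"))
)

# Default scale = 365.25: person-time is reported in years
toy_py_default <- pyears(Surv(days, died) ~ grp, data = toy_days)
# scale = 1: person-time stays in the original unit (days here)
toy_py_raw <- pyears(Surv(days, died) ~ grp, data = toy_days, scale = 1)

toy_py_default$pyears
toy_py_raw$pyears
```

Because the survival objects in this notebook are already divided by 365.25, `scale = 1` is the appropriate choice throughout.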
Calculate the cardiac mortality rate ratio in these two job groups:
```{r}
8.8 / 4.4
```
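Instead of typing the rates by hand, the ratio can probably be computed from the components that `pyears()` returns (`event` and `pyears`). The sketch below uses a made-up cohort (`toy_cohort`, `futime`, `died` and `grp` are hypothetical names) so that it runs on its own; the same idea should carry over to the Whitehall object.

```{r}
library(survival)

# Hypothetical toy cohort: follow-up in years, death indicator, two-level exposure
toy_cohort <- data.frame(
  futime = c(2, 4, 6, 8, 3, 5, 7, 5),
  died   = c(1, 0, 0, 1, 1, 1, 0, 1),
  grp    = factor(rep(c("higher", "lower"), each = 4))
)

toy_py <- pyears(Surv(futime, died) ~ grp, data = toy_cohort, scale = 1)

# Rates = events / person-years, one per exposure level
toy_rates <- toy_py$event / toy_py$pyears
# Rate ratio, lower vs higher grade
toy_rr <- unname(toy_rates["lower"] / toy_rates["higher"])
toy_rr
```

This avoids rounding error from copying printed rates and keeps the ratio in sync if the data change.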
-------------------------------------------------------------------------------
## 3. Age as timescale
To change the timescale to current age, we set the `origin` argument of `Surv()` to the date of birth.
```{r}
# Create survival object
surv_white_age <- whitehall %$% Surv(time = as.numeric(timein) / 365.25,
                                     time2 = as.numeric(timeout) / 365.25,
                                     event = chd,
                                     origin = as.numeric(timebth) / 365.25) ###
# Check rates haven't changed
pyears(surv_white_age ~ grade, data = whitehall, scale = 1) %>%
  summary(n = F, rate = T, ci.r = T, scale = 1000)
```
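As a quick illustration of what `origin` does, the sketch below (with made-up dates expressed in calendar years; `toy_birth`, `toy_entry`, `toy_exit` and `toy_died` are hypothetical) compares a survival object built with `origin` to one where the date of birth is subtracted by hand. On this toy input the two give the same entry and exit ages.

```{r}
library(survival)

# Hypothetical dates in calendar years for two people
toy_birth <- c(1930.5, 1925.0)
toy_entry <- c(1968.0, 1969.5)
toy_exit  <- c(1980.0, 1975.5)
toy_died  <- c(0, 1)

# Age timescale via origin
toy_s_origin <- Surv(time = toy_entry, time2 = toy_exit,
                     event = toy_died, origin = toy_birth)
# Same thing, subtracting the date of birth manually
toy_s_manual <- Surv(time = toy_entry - toy_birth,
                     time2 = toy_exit - toy_birth,
                     event = toy_died)

toy_s_origin
toy_s_manual
```

Total person-time and events are unaffected by the shift, which is why the rates by grade stay the same.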
-------------------------------------------------------------------------------
## 4-7. Lexis expansion
Now let's split the follow-up times into intervals that are specific to different age bands.
To check what R is doing, we'll check record 5001 before and after splitting.
```{r}
whitehall %>% filter(id == 5001) %>% select(-2, -(4:7), -(9:11))
```
Use the `survSplit()` function to create 5-year groups of current age between age 50 and 80, and 10-year groups for the youngest and oldest groups [equivalent to `stsplit` in Stata].
```{r}
# Split
white_split <- survSplit(surv_white_age ~ .,
                         data = whitehall,
                         cut = c(40, seq(50, 80, 5), 90),
                         episode = "ageband")
```
What happened to person record 5001?
It has been expanded into five, and two new columns have been added, indicating the ageband.
```{r}
white_split %>% filter(id == 5001) %>% select(-2, -(4:7), -(9:11))
```
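To see the splitting on something small enough to check by eye, here is a sketch with a single made-up person (`toy_person`, `age_in`, `age_out` and `died` are hypothetical) followed from age 47 to 58 and split at ages 50 and 55. Here the `Surv()` call is written directly in the formula, which differs slightly from the pre-built object used above.

```{r}
library(survival)

# Hypothetical person: enters at age 47, dies at age 58
toy_person <- data.frame(id = 1, age_in = 47, age_out = 58, died = 1)

toy_split <- survSplit(Surv(age_in, age_out, died) ~ .,
                       data = toy_person,
                       cut = c(50, 55),
                       episode = "ageband")
toy_split

# Total follow-up and number of events are unchanged by the split
sum(toy_split$age_out - toy_split$age_in)
sum(toy_split$died)
```

The single record becomes one row per age band crossed, and the event is attached only to the last row.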
The Lexis expansion splits each person's follow-up into several rows but does not change the total number of events or person-years, so the rates by grade should be the same as before.
```{r}
# Stratify by grade
pyears(surv_white_age ~ grade,
       data = white_split,
       scale = 1) %>%
  summary(n = F, rate = T, ci.r = T, scale = 1000)
```
-------------------------------------------------------------------------------
## 8. Stratifying by age band
Now we can use this newly created variable to stratify the rates by age band.
What is the effect of age on cardiac-related mortality?
```{r}
# Stratify by ageband
pyears(surv_white_age ~ ageband,
       data = white_split,
       scale = 1) %>%
  summary(n = F, rate = T, ci.r = T)
```
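The `ageband` variable is just an episode number, which makes the output hard to read. One possible way to get readable labels is to derive them from the start of each interval with `cut()`. This is a sketch on made-up data (`toy_people`, `toy_cuts` and the label text are hypothetical), not something tested on the Whitehall split.

```{r}
library(survival)

# Hypothetical cut points and cohort (ages in years)
toy_cuts <- c(50, 55)
toy_people <- data.frame(id = 1:2,
                         age_in = c(47, 52),
                         age_out = c(58, 54),
                         died = c(1, 0))

toy_bands <- survSplit(Surv(age_in, age_out, died) ~ .,
                       data = toy_people,
                       cut = toy_cuts,
                       episode = "ageband")

# Label each row by the age band in which its interval starts
toy_bands$ageband_lab <- cut(toy_bands$age_in,
                             breaks = c(-Inf, toy_cuts, Inf),
                             right = FALSE,
                             labels = c("<50", "50-54", "55+"))
table(toy_bands$ageband_lab)
```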
-------------------------------------------------------------------------------
## 9. Further stratification
You can stratify by another categorical variable by adding it after a `+` on the right-hand side of the formula in `pyears()`. You then change the options in `summary()` to hide events and person-time. There are options to calculate rate ratios instead of showing the rates, but I can't get them to work.
```{r}
# Calculate rates stratified by age and grade
rates_age_grade <- pyears(surv_white_age ~ ageband + grade,
                          data = white_split,
                          scale = 1)
summary(rates_age_grade, n = F, event = F, pyears = F, rate = T, scale = 1000)
# does not work
# summary(rates_age_grade, rr = T, ci.rr = T, n = F, event = F, pyears = F)
# Calculate rate ratios manually
tribble(
~age_band, ~RR,
"2", 2.13 / 0,
"3", 3.86 / 1.64,
"4", 3.95 / 2.53,
"5", 10.17 / 4.67,
"6", 13.36 / 7.24,
"7", 8.70 / 14.21,
"8", 12.72 / 20.44,
"9", 39.35 / 23.70
)
```
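The manual table above could probably be replaced by arithmetic on the arrays stored in the `pyears()` result. The sketch below uses made-up data already in person-time form (`toy_strata`, `band` and `grp` are hypothetical) and takes the lower grade as the numerator, which is an assumption about the direction of the ratio.

```{r}
library(survival)

# Hypothetical toy data: follow-up in years within two age bands
toy_strata <- data.frame(
  futime = c(10, 5, 5, 10, 10, 5, 5),
  died   = c(1, 1, 1, 1, 0, 1, 1),
  band   = factor(c("A", "A", "A", "B", "B", "B", "B")),
  grp    = factor(c("higher", "lower", "lower", "higher", "higher", "lower", "lower"))
)

toy_py2 <- pyears(Surv(futime, died) ~ band + grp, data = toy_strata, scale = 1)

# Matrix of rates: rows are age bands, columns are grades
toy_rate_mat <- toy_py2$event / toy_py2$pyears
# Stratum-specific rate ratios, lower vs higher
toy_rr_band <- toy_rate_mat[, "lower"] / toy_rate_mat[, "higher"]
toy_rr_band
```

A stratum with zero events in the denominator group gives `Inf`, just as `2.13 / 0` does in the table above.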
*Issue* I also don't know how to calculate an overall MH rate ratio, MH χ², and test for interaction [stuff that Stata's `stmh` does as part of the same command]
`epiR::epi.2by2()` apparently can calculate MH rate ratios by setting the option "method" to "cohort.time". The downside: it requires a very specific input, a 3-way table containing cases and person-years stratified by ageband and the other categorical variable, and I'm not sure how to convert the output from `pyears()` into this very specific format without doing it manually.
```{r does not work}
#test <- summary(py_grade_age, n = F)
#epiR::epi.2by2(test)
```
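In the meantime, the Mantel-Haenszel rate ratio for person-time data can be computed by hand from the `pyears()` arrays, using the standard formula RR_MH = Σ(a_i × T0_i / T_i) / Σ(b_i × T1_i / T_i), where a_i and b_i are the events in the exposed and unexposed groups of stratum i, T1_i and T0_i their person-years, and T_i the stratum total. This is only a sketch on made-up data (`toy_mh` and its columns are hypothetical), with the lower grade treated as "exposed"; it gives neither a confidence interval nor the MH χ² test.

```{r}
library(survival)

# Hypothetical toy data: follow-up in years within two age bands
toy_mh <- data.frame(
  futime = c(10, 5, 5, 10, 10, 5, 5),
  died   = c(1, 1, 1, 1, 0, 1, 1),
  band   = factor(c("A", "A", "A", "B", "B", "B", "B")),
  grp    = factor(c("higher", "lower", "lower", "higher", "higher", "lower", "lower"))
)

toy_mh_py <- pyears(Surv(futime, died) ~ band + grp, data = toy_mh, scale = 1)
toy_ev <- toy_mh_py$event    # events by band (rows) and grade (columns)
toy_pt <- toy_mh_py$pyears   # person-years by band and grade
toy_pt_total <- rowSums(toy_pt)

# Mantel-Haenszel rate ratio, lower vs higher grade, pooled over bands
toy_rr_mh <- sum(toy_ev[, "lower"] * toy_pt[, "higher"] / toy_pt_total) /
  sum(toy_ev[, "higher"] * toy_pt[, "lower"] / toy_pt_total)
toy_rr_mh
```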
*Workaround*: use regression methods instead
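A minimal sketch of what the regression workaround might look like: a Poisson model on events with log person-years as the offset, adjusting for age band. The data are made up (`toy_pois` and its columns are hypothetical), and the aggregation is done with base `aggregate()` rather than from a `pyears()` object.

```{r}
# Hypothetical toy data: follow-up in years within two age bands
toy_pois <- data.frame(
  futime = c(10, 5, 5, 10, 10, 5, 5),
  died   = c(1, 1, 1, 1, 0, 1, 1),
  band   = factor(c("A", "A", "A", "B", "B", "B", "B")),
  grp    = factor(c("higher", "lower", "lower", "higher", "higher", "lower", "lower"))
)

# One row per band x grade cell, with total events and person-years
toy_agg <- aggregate(cbind(died, futime) ~ band + grp, data = toy_pois, FUN = sum)

# Poisson regression: log(rate) = intercept + grade effect + age band effect
toy_fit <- glm(died ~ grp + band + offset(log(futime)),
               family = poisson, data = toy_agg)

# Age-adjusted rate ratio, lower vs higher grade
toy_rr_adj <- exp(coef(toy_fit))[["grplower"]]
toy_rr_adj
```

Adding a `grp:band` interaction term and comparing the two models with a likelihood ratio test would be one way to test for interaction.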
-------------------------------------------------------------------------------
# Optional exercises
## 10. Smoking
Examine the effect of smoking on cardiac-related mortality.
First, recode the smoking variable into three levels: never, ex and current smokers.
```{r}
# Check levels
whitehall %$% table(smok, useNA = "ifany")
# Recode
whitehall %<>%
  mutate(smok3 = as.factor(case_when(smok == "never" ~ "never",
                                     smok == "ex" ~ "ex",
                                     smok == "1-14/day" ~ "current",
                                     smok == "15-24/day" ~ "current",
                                     smok == "25+/day" ~ "current")))
# Order levels
whitehall$smok3 <- fct_relevel(whitehall$smok3, "never", "ex", "current")
# Check it worked
whitehall %$% table(smok3, smok)
```
```{r}
# Rates stratified by smoking
pyears(surv_white_age ~ smok3, data = whitehall, scale = 1) %>%
  summary(n = F, rate = T, ci.r = T, scale = 1000)
```
The mortality rate is higher in smokers than in never-smokers.
Does smoking confound the relationship between job grade and cardiac-related mortality?
```{r}
# Create a new survival object
surv_white_age2 <- whitehall %$% Surv(time = as.numeric(timein) / 365.25,
                                      time2 = as.numeric(timeout) / 365.25,
                                      event = chd,
                                      origin = as.numeric(timebth) / 365.25)
# Split
white_split2 <- survSplit(surv_white_age2 ~ .,
                          data = whitehall,
                          cut = c(40, seq(50, 80, 5), 90),
                          episode = "ageband")
# Stratified rates
pyears(surv_white_age2 ~ smok3 + grade,
       data = white_split2,
       scale = 1) %>%
  summary(n = F, event = F, pyears = F, rate = T, scale = 1000)
5.1 / 1.3
7.4 / 4.5
10.6 / 6.3
```
-------------------------------------------------------------------------------
## 11. Stratifying on three variables
Examine the effect of job grade on cardiac mortality, adjusting for both age and smoking at the same time. What can you conclude?
*ERROR*: the code below breaks, and I'm not sure why.
```{r}
# Stratified rates
# pyears(surv_white_age2 ~ ageband + grade + smok3,
# data = white_split2,
# scale = 1) %>%
# summary(n = F, event = F, pyears = F, rate = T, scale = 1000)
```
-------------------------------------------------------------------------------
## 12. Standardised mortality rate
Not sure how to do this
```{r}
```
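If this exercise is read as asking for a standardised mortality ratio (SMR, indirect standardisation), a rough sketch would be: multiply the cohort's person-years in each age band by a reference rate to get expected deaths, then divide observed by expected. Everything below is made up for illustration (`toy_smr`, its counts and the reference rates are hypothetical), so it shows the arithmetic only, not a Whitehall result.

```{r}
# Hypothetical cohort totals by age band, with made-up reference rates
toy_smr <- data.frame(
  ageband           = c("50-59", "60-69", "70-79"),
  events            = c(4, 10, 16),
  pyears            = c(2000, 1500, 1000),
  ref_rate_per_1000 = c(1, 4, 10)
)

# Expected deaths if the cohort had experienced the reference rates
toy_expected <- sum(toy_smr$pyears * toy_smr$ref_rate_per_1000 / 1000)
toy_observed <- sum(toy_smr$events)

# SMR = observed / expected
toy_smr_value <- toy_observed / toy_expected
toy_smr_value
```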
-------------------------------------------------------------------------------