-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path1RHC.Rmd
279 lines (216 loc) · 9.32 KB
/
1RHC.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
# RHC data description
```{r setup01, include=FALSE}
require(tableone)
require(Publish)
require(MatchIt)
require(cobalt)
```
There is a widespread belief among cardiologists that the right heart catheterization (RHC hereafter; a monitoring device for measurement of cardiac function) is helpful in managing critically ill patients in the intensive care unit. @connors1996effectiveness examined the association of
- *RHC use* during the first 24 hours of care in the intensive care unit and
- a number of health-outcomes such as *length of stay* (hospital).
## Data download
```{block, type='rmdcomment'}
Data is freely available from [Vanderbilt Biostatistics](https://hbiostat.org/data/).
```
```{r, cache=TRUE}
# load the dataset
ObsData <- read.csv("https://hbiostat.org/data/repo/rhc.csv", header = TRUE)
saveRDS(ObsData, file = "data/rhc.RDS")
```
## Analytic data
Below we show the process of creating the analytic data (optional).
```{r, warning=FALSE}
# add column for outcome Y: length of stay
# Y = date of discharge - study admission date
# Y = date of death - study admission date if date of discharge not available
ObsData$Y <- ObsData$dschdte - ObsData$sadmdte
ObsData$Y[is.na(ObsData$Y)] <- ObsData$dthdte[is.na(ObsData$Y)] -
ObsData$sadmdte[is.na(ObsData$Y)]
# remove outcomes we are not examining in this example
ObsData <- dplyr::select(ObsData,
!c(dthdte, lstctdte, dschdte, death, t3d30, dth30, surv2md1))
# remove unnecessary and problematic variables
ObsData <- dplyr::select(ObsData,
!c(sadmdte, ptid, X, adld3p, urin1, cat2))
# convert all categorical variables to factors
factors <- c("cat1", "ca", "cardiohx", "chfhx", "dementhx", "psychhx",
"chrpulhx", "renalhx", "liverhx", "gibledhx", "malighx",
"immunhx", "transhx", "amihx", "sex", "dnr1", "ninsclas",
"resp", "card", "neuro", "gastr", "renal", "meta", "hema",
"seps", "trauma", "ortho", "race", "income")
ObsData[factors] <- lapply(ObsData[factors], as.factor)
# convert our treatment A (RHC vs. No RHC) to a binary variable
ObsData$A <- ifelse(ObsData$swang1 == "RHC", 1, 0)
ObsData <- dplyr::select(ObsData, !swang1)
# Categorize the variables to match with the original paper
ObsData$age <- cut(ObsData$age,breaks=c(-Inf, 50, 60, 70, 80, Inf),right=FALSE)
ObsData$race <- factor(ObsData$race, levels=c("white","black","other"))
ObsData$sex <- as.factor(ObsData$sex)
ObsData$sex <- relevel(ObsData$sex, ref = "Male")
ObsData$cat1 <- as.factor(ObsData$cat1)
levels(ObsData$cat1) <- c("ARF","CHF","Other","Other","Other",
"Other","Other","MOSF","MOSF")
ObsData$ca <- as.factor(ObsData$ca)
levels(ObsData$ca) <- c("Metastatic","None","Localized (Yes)")
ObsData$ca <- factor(ObsData$ca, levels=c("None",
"Localized (Yes)","Metastatic"))
# Rename variables
names(ObsData) <- c("Disease.category", "Cancer", "Cardiovascular",
"Congestive.HF", "Dementia", "Psychiatric", "Pulmonary",
"Renal", "Hepatic", "GI.Bleed", "Tumor",
"Immunosupperssion", "Transfer.hx", "MI", "age", "sex",
"edu", "DASIndex", "APACHE.score", "Glasgow.Coma.Score",
"blood.pressure", "WBC", "Heart.rate", "Respiratory.rate",
"Temperature", "PaO2vs.FIO2", "Albumin", "Hematocrit",
"Bilirubin", "Creatinine", "Sodium", "Potassium", "PaCo2",
"PH", "Weight", "DNR.status", "Medical.insurance",
"Respiratory.Diag", "Cardiovascular.Diag",
"Neurological.Diag", "Gastrointestinal.Diag", "Renal.Diag",
"Metabolic.Diag", "Hematologic.Diag", "Sepsis.Diag",
"Trauma.Diag", "Orthopedic.Diag", "race", "income",
"Y", "A")
saveRDS(ObsData, file = "data/rhcAnalytic.RDS")
```
## Notations
|Notations| Example in RHC study|
|---|---|
|$A$: Exposure status | RHC |
|$Y$: Observed outcome | length of stay |
|$L$: Covariates | See below |
## Variables
```{r vars, cache=TRUE, echo = TRUE}
baselinevars <- names(dplyr::select(ObsData,
!c(A,Y)))
baselinevars
```
## Table 1 stratified by RHC exposure
```{block, type='rmdcomment'}
Only for some demographic and co-morbidity variables; match with Table 1 in @connors1996effectiveness.
```
```{r tab0, cache=TRUE, echo = TRUE}
require(tableone)
tab0 <- CreateTableOne(vars = c("age", "sex", "race", "Disease.category", "Cancer"),
data = ObsData,
strata = "A",
test = FALSE)
print(tab0, showAllLevels = FALSE, )
```
```{block, type='rmdcomment'}
Only outcome variable (Length of stay); slightly different than Table 2 in @connors1996effectiveness (means 20.5 vs. 25.7; and medians 13 vs. 17).
```
```{r tab1, cache=TRUE, echo = TRUE}
tab1 <- CreateTableOne(vars = c("Y"),
data = ObsData,
strata = "A",
test = FALSE)
print(tab1, showAllLevels = FALSE, )
median(ObsData$Y[ObsData$A==0]); median(ObsData$Y[ObsData$A==1])
```
## Basic regression analysis
### Crude analysis
```{r reg1, cache=TRUE, echo = TRUE, results='hide'}
# adjust the exposure variable (primary interest)
fit0 <- lm(Y~A, data = ObsData)
require(Publish)
crude.fit <- publish(fit0, digits=1)$regressionTable[2,]
```
```{r reg1c, cache=TRUE, echo = TRUE}
crude.fit
```
### Adjusted analysis
```{r reg2, cache=TRUE, echo = TRUE, results='hide'}
# adjust the exposure variable (primary interest) + covariates
out.formula <- as.formula(paste("Y~ A +",
paste(baselinevars,
collapse = "+")))
fit1 <- lm(out.formula, data = ObsData)
adj.fit <- publish(fit1, digits=1)$regressionTable[2,]
```
```{r, cache=TRUE, echo = TRUE}
saveRDS(fit1, file = "data/adjreg.RDS")
```
```{r reg2a, cache=TRUE, echo = TRUE}
out.formula
adj.fit
```
### Regression diagnostics
```{r reg2a578, cache=TRUE, echo = TRUE}
plot(fit1)
```
```{block, type='rmdcomment'}
Diagnostics do not necessarily look so good.
```
## Comparison with literature
```{block, type='rmdcomment'}
@connors1996effectiveness conducted a propensity score matching analysis.
```
Table 5 in @connors1996effectiveness showed that, after propensity score pair (1-to-1) matching, means of length of stay ($Y$), when stratified by RHC ($A$) were not significantly different ($p = 0.14$).
### PSM in RHC data
We also conduct propensity score pair matching analysis, as follows.
```{block, type='rmdcomment'}
**Note**: In this workshop, we will not cover Propensity Score Matching (PSM). If you want to learn more about this, feel free to check out this other workshop: [Understanding Propensity Score Matching](https://ehsanx.github.io/psw/) and the [video recording](https://www.youtube.com/watch?v=u4Nl7gnDEAY) on youtube.
```
```{r ps16854, cache=TRUE, echo = TRUE}
set.seed(111)
require(MatchIt)
ps.formula <- as.formula(paste("A~",
paste(baselinevars, collapse = "+")))
PS.fit <- glm(ps.formula,family="binomial",
data=ObsData)
ObsData$PS <- predict(PS.fit,
newdata = ObsData, type="response")
```
```{r ps2, cache=TRUE, echo = TRUE}
logitPS <- -log(1/ObsData$PS - 1)
match.obj <- matchit(ps.formula, data =ObsData,
distance = ObsData$PS,
method = "nearest", replace=FALSE,
ratio = 1,
caliper = .2*sd(logitPS))
```
#### PSM diagnostics
```{r ps2x, cache=TRUE, echo = TRUE}
require(cobalt)
bal.plot(match.obj,
var.name = "distance",
which = "both",
type = "histogram",
mirror = TRUE)
bal.tab(match.obj, un = TRUE,
thresholds = c(m = .1))
```
```{r ps2b, cache=TRUE, echo = TRUE, fig.height=10, fig.width=5}
love.plot(match.obj, binary = "std",
thresholds = c(m = .1))
```
The love plot suggests satisfactory propensity score matching (all SMD < 0.1).
#### PSM results
##### p-value
```{r ps3, cache=TRUE, echo = TRUE}
matched.data <- match.data(match.obj)
tab1y <- CreateTableOne(vars = c("Y"),
data = matched.data, strata = "A",
test = TRUE)
print(tab1y, showAllLevels = FALSE,
test = TRUE)
```
```{block, type='rmdcomment'}
Our conclusion based on propensity score pair matched data ($p \lt 0.001$) is different than Table 5 in @connors1996effectiveness ($p = 0.14$). Variability in results for 1-to-1 matching is possible, and modelling choices may be different (we used caliper option here).
```
##### Treatment effect
- We can also estimate the effect of `RHC` on `length of stay` using propensity score-matched sample:
```{r ps12ryy, cache=TRUE, echo = TRUE}
fit.matched <- glm(Y~A,
family=gaussian,
data = matched.data)
publish(fit.matched)
```
```{r, cache=TRUE, echo = TRUE}
saveRDS(fit.matched, file = "data/match.RDS")
```
### TMLE in RHC data
There are other papers that have used RHC data [@keele2021comparing;@keele2018pre].
```{block, type='rmdcomment'}
@keele2021comparing used TMLE (with super learner SL) method in estimating the impact of RHC on length of stay, and found point estimate $2.01 (95\% CI: 0.6-3.41)$.
```
In today's workshop, we will learn about this **TMLE-SL** methods.