---
title: "Depression and Anxiety in Medical Students: Analyses Report"
author: Juan C. López Tavera
output:
  html_document:
    keep_md: yes
    theme: cosmo
---
```{r settings, include=FALSE, warning=FALSE}
source('~/github/medstudents-depression/data/processed/medstudents-depression.R')
source('~/github/medstudents-depression/R/cleane.R')
source('~/github/medstudents-depression/R/loader.R')
loader("VIM", "ggplot2", "readr", "ggthemes", "knitr", "DiagrammeR")
```
# Background and Objectives
There is no health without mental health: mental illnesses are one of the leading causes, if not the leading cause, of years lived with disability across the globe. Yet the burden of mental illness is often underestimated.
Obesity and overweight are also a heavy burden on public health; they are important risk factors for a considerable number of chronic illnesses --notably cardiovascular diseases-- and are a leading cause of disability as well.
Given how common and relevant these health issues are, we were motivated to look deeper into them, and we asked ourselves: if someone has obesity or overweight, are they more likely to suffer from depression or anxiety? How do obesity and overweight interact with mental health issues?
Our null hypothesis was that a higher body-mass index (BMI) does not predict the presence of depressive or anxiety disorders. Rejecting it would make it easier to identify individuals who might need further psychiatric assistance. To test this hypothesis, we designed an observational study measuring demographic, BMI and mental health variables of Mexican medical students.
In this report, I'm focusing on communicating what we found, which is the basis of a more formal publication (in the making).
# Methods
We designed and ran an observational research study to assess the relation between clinically detectable depressive and anxiety disorders and body-mass index.
To achieve this goal, we randomly selected 1250 undergrad medical students from all grades, from first through fourth. Grouped by grade enrollment, the number of sampled students was proportional to the actual number of students enrolled in each grade.
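As a rough illustration of proportional stratified sampling, the sketch below allocates a target of 1250 across grades in proportion to enrolment and then draws a simple random sample within each grade. The enrolment figures are the per-grade `N` values from this report's sample table; the sampling frame, the seed and all object names are hypothetical. Strictly proportional allocation gives 355, 351, 290 and 254, which need not match the planned per-grade sizes used in the study.

```r
# Enrolment per grade (the N values from this report's sample table).
enrolled <- c(grade1 = 1299, grade2 = 1285, grade3 = 1061, grade4 = 930)
target_n <- 1250

# Proportional allocation: each grade's share of the sample equals
# its share of total enrolment.
allocation <- round(target_n * enrolled / sum(enrolled))

# Hypothetical sampling frame: one row per enrolled student.
set.seed(2013)  # arbitrary seed, only to make the sketch reproducible
frame <- data.frame(
  id = seq_len(sum(enrolled)),
  grade = rep(names(enrolled), times = enrolled)
)

# Simple random sample without replacement within each grade.
sampled <- do.call(rbind, lapply(names(enrolled), function(g) {
  stratum <- frame[frame$grade == g, ]
  stratum[sample(nrow(stratum), allocation[[g]]), ]
}))

table(sampled$grade)
```

Rounding happens to sum to exactly 1250 here; with other figures a largest-remainder correction may be needed to hit the target.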
In November 2013, in _sunny_ Guadalajara, we personally distributed 1250 surveys to undergraduate medical students during a four-day window and collected 783 of them (from school years 1 through 4). We discarded the `r nrow(raw_data) - sum(complete.cases(raw_data))` incomplete surveys to get a final sample size of `r nrow(data)`. [Open question: should we keep them?]
With a response rate of `r round(nrow(raw_data) / 1250 * 100, 1)`%, we received `r nrow(raw_data)` surveys. The following table shows the progression from the enrolled population to the final sample.
```{r table1, echo=FALSE}
Grade <- 1:4
N <- c(1299, 1285,1061,930)
Intended_n <- c(339, 379, 296, 236)
Sampled_n <- tapply(X = raw_data$year, INDEX = raw_data$year, length) %>% as.numeric
Complete_n <- tapply(X = data$year, INDEX = data$year, length) %>% as.numeric
df <- cbind.data.frame(Grade, N, Intended_n, Sampled_n, Complete_n)
df <- rbind(df,colSums(df))
df$Grade <- c(Grade, "total")
write_csv(x = df, path = "~/github/medstudents-depression/data/processed/sample.csv")
kable(df, format = "html")
```
The following diagram depicts how we arrived at the final sample size.
```{r sample size flowchart, echo=FALSE}
mermaid("
graph LR
A(4575 students)-->B(1250 sampled students)
B-->C(783 returned surveys)
C-->D(757 complete surveys)
"
)
```
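The attrition shown in the flowchart can be summarized as rates. This is a minimal sketch using the counts from the diagram as literals; the variable names are illustrative and do not come from the analysis scripts.

```r
# Counts taken from the flowchart above.
distributed <- 1250
returned <- 783
complete <- 757

# Multiply by 100 before rounding so that one decimal place survives.
response_rate <- round(returned / distributed * 100, 1)    # returned / distributed
completion_rate <- round(complete / returned * 100, 1)     # complete / returned
overall_yield <- round(complete / distributed * 100, 1)    # complete / distributed

c(response = response_rate,
  completion = completion_rate,
  overall = overall_yield)
```

With these counts the rates come out at 62.6%, 96.7% and 60.6% respectively.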
Distinguish prespecified from exploratory analyses, including subgroup analyses.
## Ethics Statement
## Variables
Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable
* Socio-demographic questions and self-reported body measurements: age, binary gender, height and weight.
* Patient Health Questionnaire 9 (PHQ-9): Self-reported questionnaire to spot cases of clinically detectable depressive disorders.
* Generalized Anxiety Disorder 7 (GAD-7): Self-reported questionnaire to detect cases of clinical anxiety.
* Epworth Sleepiness Scale: To assess levels of self-reported daytime sleepiness.
* Clinical attention history: Have you ever been diagnosed with X? Have you ever been prescribed treatment for X?
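To make the questionnaire scoring concrete, here is a hedged sketch of how total scores and severity bands could be derived. Each PHQ-9 and GAD-7 item is scored 0 to 3, so totals range from 0 to 27 and 0 to 21. The bands below are the commonly published cut-offs; whether this study used exactly these bands is an assumption, and the function names and toy responses are hypothetical.

```r
# Commonly published PHQ-9 severity bands (assumed, not confirmed by this report).
phq9_severity <- function(score) {
  cut(score,
      breaks = c(-1, 4, 9, 14, 19, 27),  # right-closed: 0-4, 5-9, 10-14, 15-19, 20-27
      labels = c("Minimal", "Mild", "Moderate", "Moderately severe", "Severe"))
}

# Commonly published GAD-7 severity bands (assumed, not confirmed by this report).
gad7_severity <- function(score) {
  cut(score,
      breaks = c(-1, 4, 9, 14, 21),      # right-closed: 0-4, 5-9, 10-14, 15-21
      labels = c("Minimal", "Mild", "Moderate", "Severe"))
}

# Toy item-level responses: one row per respondent, one column per item.
phq_items <- rbind(c(0, 1, 0, 0, 1, 0, 0, 1, 0),
                   c(2, 2, 1, 3, 2, 1, 2, 1, 0),
                   c(3, 3, 3, 2, 3, 3, 2, 3, 1))
phq_score <- rowSums(phq_items)

data.frame(phq_score, phq_scale = phq9_severity(phq_score))
```

Using `cut()` with explicit breaks keeps the category boundaries visible in the code, which also makes them easy to report.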
## Data sources and measurement
About BMI, PHQ, GAD and Epworth
See the [reprint of the original survey](medstudents-depression/reference/Cuestionario final español - Google Forms.pdf) in Spanish.
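Since BMI is derived from self-reported height and weight, a small sketch of the calculation may help. It assumes weight in kilograms and height in metres (the units used in the survey are not stated here) and uses the usual WHO adult categories; both function names are hypothetical.

```r
# BMI = weight (kg) divided by height (m) squared.
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# WHO adult categories; right = FALSE makes each interval include its lower
# bound, e.g. [18.5, 25) is "Normal weight".
bmi_category <- function(x) {
  cut(x,
      breaks = c(0, 18.5, 25, 30, Inf),
      right = FALSE,
      labels = c("Underweight", "Normal weight", "Overweight", "Obesity"))
}

# Toy self-reported measurements.
weight <- c(50, 70, 90, 100)
height <- c(1.80, 1.75, 1.75, 1.70)

data.frame(bmi = round(bmi(weight, height), 1),
           category = bmi_category(bmi(weight, height)))
```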
## Bias
The main limitation of this study was [survivorship bias](https://en.wikipedia.org/wiki/Survivorship_bias): what happened to the `r 1250 - nrow(data)` missing observations?
Another potential limitation is the length of the survey, which could have increased the number of missing responses. However, as can be seen in the right-hand plot, the number of NAs in the data set is still low.
```{r NA distribution, echo=FALSE, cache=TRUE, warning=FALSE, fig.align='center', fig.width = 13, fig.height= 8}
par(mar = c(2,2,2,2), lty = 0)
nas_plot <- VIM::aggr(
  x = raw_data,
  col = c('powderblue', 'tomato1'),
  numbers = TRUE,
  sortVars = FALSE,
  labels = names(raw_data),
  cex.axis = .5,
  gap = 3,
  ylab = c("Histogram of missing data", "Pattern of missing data"),
  only.miss = TRUE
)
```
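The same information can be tabulated with base R alone. The sketch below uses a small made-up data frame rather than `raw_data`, so the column names and values are purely illustrative.

```r
# Toy data frame standing in for the survey responses.
toy <- data.frame(
  age    = c(19, 21, NA, 22, 20),
  weight = c(60, NA, 72, 80, 55),
  phq    = c(3, 10, 7, NA, 0)
)

# Number and percentage of missing values per variable.
na_per_variable <- colSums(is.na(toy))
na_share <- round(colMeans(is.na(toy)) * 100, 1)

# Rows with no missing value in any variable (complete-case analysis).
complete_rows <- sum(complete.cases(toy))
toy_complete <- toy[complete.cases(toy), ]

rbind(n_missing = na_per_variable, pct_missing = na_share)
```

Dropping incomplete rows, as in `toy_complete`, is the complete-case approach; it is only one of several ways to handle missing data.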
## Statistical methods
For observational studies, authors are required to clearly specify (a) What specific hypotheses the researchers intended to test, and the analytical methods by which they planned to test them; (b) What analyses they actually performed; and (c) When reported analyses differ from those that were planned, authors must provide transparent explanations for differences that affect the reliability of the study's results
(a) Describe all statistical methods, including those used to control for confounding
(b) Describe any methods used to examine subgroups and interactions
(c) Explain how missing data were addressed
(d) Cross-sectional study: if applicable, describe analytical methods taking account of sampling strategy
(e) Describe any sensitivity analyses
I used `r R.Version()$version.string` -- `r R.Version()$nickname` on an x86_64-apple-darwin13.4.0 (64-bit) platform. The packages loaded for this report are VIM, ggplot2, readr, ggthemes, knitr and DiagrammeR.
# Results
We randomly selected 1250 medical students out of the 4575 enrolled at the time. We stratified this target population by school year and weighted the subsampling by the size of each stratum, which made each subsample size proportional to its subpopulation size. The resulting sample is summarized in Table 1.
The Results section should include all primary and secondary outcome measures analyzed. The section may be divided into subsections, each with a concise subheading. Tables and figures central to the study should be included in the main paper. The Results section should be written in past tense.
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
Large data sets, including raw data, may be deposited in an appropriate public repository. [See our list of recommended repositories.](http://journals.plos.org/plosmedicine/s/data-availability#loc-recommended-repositories)
For smaller data sets and certain data types, authors may provide their data within [Supporting Information files](http://journals.plos.org/plosmedicine/s/supporting-information) accompanying the manuscript. Authors should take care to maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets or flat files should be provided rather than PDFs when providing tabulated data).
*Give numeric results not only as derivatives (for example, percentages) but also as the absolute numbers from which the derivatives were calculated, and specify the statistical significance attached to them, if any. Restrict tables and figures to those needed to explain the argument of the paper and to assess supporting data. Use graphs as an alternative to tables with many entries; do not duplicate data in graphs and tables. Avoid nontechnical uses of technical terms in statistics, such as "random" (which implies a randomizing device), “normal,” “significant,” “correlations,” and “sample.”*
## Participants
(a) Report numbers of individuals at each stage of study—eg numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed
(b) Give reasons for non-participation at each stage
(c) Consider use of a flow diagram
## Descriptive data
(a) Give characteristics of study participants (eg demographic, clinical, social) and information on exposures and potential confounders
(b) Indicate number of participants with missing data for each variable of interest
## Depressive disorders data
```{r Depression density scaled by severity, echo=FALSE, cache=TRUE, warning=FALSE, fig.align='center', fig.width = 13, fig.height= 8}
# Density of PHQ-9 total scores, one curve per severity category.
ddp <-
  ggplot(
    data = data,
    mapping = aes(x = phq.score, fill = phq.scale, colour = phq.scale)
  )
ddp <- ddp +
  geom_density(alpha = 0.5, adjust = 1.5) +
  theme_base() +
  scale_fill_ptol(name = "Depression Severity") +
  scale_color_ptol(name = "Depression Severity") +
  # Labels are passed as named arguments, the form documented for labs().
  labs(
    title = "Distribution of PHQ-9 Scores by Depression Severity",
    x = "PHQ-9 Score",
    y = "Density"
  ) +
  # Show the full PHQ-9 range (0 to 27) on the x axis.
  expand_limits(x = c(0, 27))
ddp
```
## Anxiety disorders data
```{r Anxiety density scaled by severity, echo=FALSE, cache=TRUE, warning=FALSE, fig.align='center', fig.width = 13, fig.height= 8}
# Density of GAD-7 total scores, one curve per severity category.
gdp <- ggplot(
  data = data,
  mapping = aes(x = gad.score, fill = gad.scale, colour = gad.scale)
)
gdp <- gdp +
  geom_density(alpha = 0.5, adjust = 1.5) +
  theme_base() +
  scale_fill_ptol(name = "Anxiety Severity") +
  scale_color_ptol(name = "Anxiety Severity") +
  # Labels are passed as named arguments, the form documented for labs().
  labs(
    title = "Distribution of GAD-7 Scores by Anxiety Severity",
    x = "GAD-7 Score",
    y = "Density"
  ) +
  # Show the full GAD-7 range (0 to 21) on the x axis.
  expand_limits(x = c(0, 21))
gdp
```
## Main results
(a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (eg, 95% confidence interval). Make clear which confounders were adjusted for and why they were included
(b) Report category boundaries when continuous variables were categorized
(c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period
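One possible way to produce unadjusted and confounder-adjusted estimates with 95% confidence intervals is logistic regression. This is only a sketch: the report does not state which model was used, the data are simulated with a made-up association, and every variable and function name here is hypothetical.

```r
# Simulated data; the effect sizes below are arbitrary and NOT study results.
set.seed(42)
n <- 500
sim <- data.frame(
  bmi = rnorm(n, mean = 25, sd = 4),
  age = sample(18:26, n, replace = TRUE),
  female = rbinom(n, 1, 0.5)
)
sim$depressed <- rbinom(n, 1, plogis(-3 + 0.08 * sim$bmi + 0.3 * sim$female))

# Odds ratios with Wald 95% intervals: estimate and interval are computed on
# the log-odds scale and then exponentiated.
or_table <- function(fit) {
  round(exp(cbind(OR = coef(fit), confint.default(fit))), 2)
}

# Unadjusted: BMI only. Adjusted: BMI plus age and gender as confounders.
unadjusted <- glm(depressed ~ bmi, data = sim, family = binomial)
adjusted <- glm(depressed ~ bmi + age + female, data = sim, family = binomial)

or_table(unadjusted)
or_table(adjusted)
```

Reporting both tables side by side shows how much the BMI estimate moves once age and gender are accounted for.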
## Other analyses
Report other analyses done—eg analyses of subgroups and interactions, and sensitivity analyses
# Conclusions
Interpretation of the results with any important recommendations for future research.