---
title: "Proportions Notebook"
output:
  md_document:
    toc: yes
  html_document:
    highlight: tango
    theme: cerulean
    toc: yes
    toc_depth: 4
    toc_float: yes
---
This code was taken from *Statistical Analysis and Reporting in R* by Jacob O. Wobbrock, Ph.D.,
available here: http://depts.washington.edu/acelab/proj/Rstats/index.html

Source file: `Proportions.R` (Tests of Proportion & Association)

## One Sample
### Binomial test
When to use this test?
*Notes taken from https://www.youtube.com/watch?v=WOoS7nVkfDk by Matthew E. Clapham*
**Type of Data**: Categorical, with only two categories in a single sample
**Purpose**: To test whether the category counts match the expected counts (goodness of fit).
The **goodness of fit** of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.
```{r setup}
# pull in the csv
df <- read.csv("data/Proportions/0F0LBs_binomial.csv")
head(df)
```
##### Create factors
Subject id is nominal and Y is categorical
```{r factors}
df$S = factor(df$S) # Subject id is nominal (unused)
df$Y = factor(df$Y) # Y is an outcome of exactly two categories
```
Column Y is made up of two values: `r unique(df$Y)`
```{r xtabs}
xt = xtabs( ~ Y, data=df) # make counts
```
`r xt`
##### binom.test
**Exact Binomial Test**
Performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment.
https://www.rdocumentation.org/packages/stats/versions/3.1.1/topics/binom.test
```{r binomial_test}
binom.test(xt, p=0.5, alternative="two.sided")
```
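
To make the exact test less of a black box, here is a minimal sketch that rebuilds the two-sided p-value from `dbinom`. The counts (14 successes in 20 trials) are hypothetical and not taken from the notebook's CSV. The names `successes`, `trials`, `p_null` and `p_manual` are illustrative only.

```r
# Hypothetical counts: 14 successes out of 20 trials (not from the notebook's CSV).
successes <- 14
trials <- 20
p_null <- 0.5

# Probability of every possible outcome 0..trials under the null hypothesis.
outcome_probs <- dbinom(0:trials, size = trials, prob = p_null)
observed_prob <- dbinom(successes, size = trials, prob = p_null)

# Two-sided exact p-value: total probability of all outcomes that are
# no more likely than the observed one. The small relative tolerance
# guards against floating-point ties.
p_manual <- sum(outcome_probs[outcome_probs <= observed_prob * (1 + 1e-7)])

# Compare with the built-in test.
p_builtin <- binom.test(successes, trials, p = p_null,
                        alternative = "two.sided")$p.value
all.equal(p_manual, p_builtin)
```

With `p = 0.5` the distribution is symmetric, so the two-sided p-value is simply twice the upper tail.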
### Multinomial test
**Type of Data**: Categorical, with 3+ categories in a single sample
**Purpose**: To test whether the category counts match the expected counts (goodness of fit).
We will use the XNomial library
https://cran.r-project.org/web/packages/XNomial/index.html
```{r Multinomial_library}
# df is a long-format data table w/columns for subject (S) and N-category outcome (Y)
if (!require("XNomial")) install.packages("XNomial");
library(XNomial) # for xmulti
df_multi <- read.csv("data/Proportions/0F0LBs_multinomial.csv")
head(df_multi)
```
```{r Multinomial_factor}
df_multi$S = factor(df_multi$S) # Subject id is nominal (unused)
df_multi$Y = factor(df_multi$Y) # Y is an outcome of ≥2 categories
xt = xtabs( ~ Y, data=df_multi) # make counts
xt # counts for the 3 categories of Y
```
```{r Multinomial_test}
xmulti(xt, rep(1/length(xt), length(xt)), statName="Prob")
```
```{r Multinomial_test2}
# the following gives the equivalent result
if (!require("RVAideMemoire")) install.packages("RVAideMemoire");
library(RVAideMemoire) # for multinomial.test
multinomial.test(df_multi$Y)
```
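
As a rough illustration of what an exact multinomial test computes, the sketch below enumerates every possible table for a small sample and sums the probabilities of the tables that are no more likely than the observed one. As far as I understand, this is the idea behind the `"Prob"` statistic, but the sketch is not the `XNomial` implementation. The counts are hypothetical, and brute-force enumeration is only practical for small samples.

```r
# Hypothetical counts for three categories (not the notebook's data).
observed <- c(x = 7, y = 2, z = 3)
n <- sum(observed)
k <- length(observed)
p_null <- rep(1 / k, k)  # equal expected proportions

# Enumerate every way to split n observations across 3 categories.
grid <- expand.grid(a = 0:n, b = 0:n)
grid <- grid[grid$a + grid$b <= n, ]
grid$c <- n - grid$a - grid$b

# Multinomial probability of each possible table under the null.
table_probs <- apply(grid, 1, function(counts) dmultinom(counts, prob = p_null))
observed_prob <- dmultinom(observed, prob = p_null)

# Exact p-value: total probability of tables no more likely than the observed one.
p_exact <- sum(table_probs[table_probs <= observed_prob * (1 + 1e-7)])
p_exact
```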
##### Multinomial post hoc test
```{r Multinomial_post_hoc}
# xt is a table of counts for each category of Y
library(RVAideMemoire) # for multinomial.multcomp
multinomial.multcomp(xt, p.method="holm") # xt shows levels
```
### Chi-squared test
*Notes taken from http://www.sthda.com/english/wiki/chi-square-goodness-of-fit-test-in-r*
The **chi-square goodness of fit test** is used to compare an observed distribution to an expected distribution when the data are discrete and fall into two or more categories. In other words, it compares multiple observed proportions to expected probabilities.
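
The comparison of observed and expected counts can be sketched by hand. The example below uses hypothetical counts (not the notebook's data) and assumes equal expected proportions, which is also the default of `chisq.test` when it is given a vector of counts.

```r
# Hypothetical counts for three categories (not the notebook's data).
observed <- c(x = 30, y = 12, z = 18)
expected <- rep(sum(observed) / length(observed), length(observed))  # equal split

# Pearson statistic: squared deviations scaled by the expected counts.
chi_sq <- sum((observed - expected)^2 / expected)
dof <- length(observed) - 1
p_manual <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# Compare with the built-in test.
builtin <- chisq.test(observed)
all.equal(chi_sq, unname(builtin$statistic))
all.equal(p_manual, builtin$p.value)
```
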
```{r Multinomial_chi_data}
## One-Sample Chi-Squared test
# df is a long-format data table w/columns for subject (S) and N-category outcome (Y)
df <- read.csv("data/Proportions/0F0LBs_multinomial.csv")
head(df)
```
```{r Multinomial_tab}
df$S = factor(df$S) # Subject id is nominal (unused)
df$Y = factor(df$Y) # Y is an outcome from two or more categories
xt = xtabs( ~ Y, data=df) # make counts
xt
```
```{r Multinomial_chi}
result <- chisq.test(xt)
result
```
The p-value of the test is 7.869e-05, which is less than the significance level alpha = 0.05. We can conclude that the outcomes are not evenly distributed across the categories.
```{r chi_expected}
# access the expected counts
result$expected
```
```{r chi_p_value}
# access the p-value
result$p.value
```
```{r chi_mult}
## Chi-Squared post hoc test
# xt is a table of counts for each category of Y
library(RVAideMemoire) # for chisq.multcomp
chisq.multcomp(xt, p.method="holm") # xt shows levels
# for the Chi-Squared values, use qchisq(1-p, df=1), where p is the pairwise p-value.
```
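
The `p.method="holm"` argument and the `qchisq` hint in the chunk above can be illustrated with base R alone. This is a sketch on hypothetical pairwise p-values, not output from the notebook. It assumes the conversion back to a chi-squared statistic is applied to the unadjusted p-value of each 1-df pairwise test.

```r
# Hypothetical unadjusted pairwise p-values (not produced by the notebook).
raw_p <- c(x_vs_y = 0.004, x_vs_z = 0.030, y_vs_z = 0.200)

# Holm's step-down correction, computed by hand: sort ascending,
# multiply the i-th smallest by (m - i + 1), enforce monotonicity, cap at 1.
m <- length(raw_p)
ord <- order(raw_p)
holm_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[ord]))
holm_manual <- holm_sorted[order(ord)]  # restore the original order

# Compare with the built-in adjustment.
holm_builtin <- p.adjust(raw_p, method = "holm")
all.equal(unname(holm_manual), unname(holm_builtin))

# Convert each unadjusted p-value back to a 1-df chi-squared statistic.
chi_from_p <- qchisq(1 - raw_p, df = 1)
chi_from_p
```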
**One-sample post hoc tests**
A different kind of post hoc test for one sample. For Y's response categories (x,y,z), test each proportion against chance.
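
One possible way to carry this out is sketched below: one exact binomial test per category against chance (1/3 for three categories), followed by a Holm correction. The `outcomes` vector is hypothetical stand-in data, not the notebook's CSV, and the category labels follow the (x,y,z) naming used above.

```r
# Hypothetical long-format outcome with categories "x", "y", "z".
outcomes <- factor(c(rep("x", 30), rep("y", 12), rep("z", 18)))
chance <- 1 / nlevels(outcomes)

# One exact binomial test per category: does its proportion differ from chance?
raw_p <- sapply(levels(outcomes), function(level) {
  binom.test(sum(outcomes == level), length(outcomes), p = chance)$p.value
})

# Holm correction across the three tests.
adjusted_p <- p.adjust(raw_p, method = "holm")
adjusted_p
```
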

## Two Samples
### Fisher's exact test
*Notes from https://www.youtube.com/watch?v=WTDWk4eJIw0 by Matthew E. Clapham*
A method for checking independence, generally used with small samples and small contingency tables.
**Types of data**: Categorical, in a 2x2 contingency table. Technically, both row and column marginals should be fixed prior to data collection.
**Purpose**: To test for association between the two categorical variables.
**Null hypothesis**: Category frequencies are independent of one another between the samples (i.e., there is no association among categories).
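
For the 2x2 case, the fixed-marginals idea can be made concrete: once the row and column totals are fixed, the top-left cell follows a hypergeometric distribution. The sketch below rebuilds the two-sided p-value from `dhyper` and checks it against `fisher.test`. The table is hypothetical, not the notebook's data, and the sketch does not cover larger m × n tables.

```r
# Hypothetical 2x2 table: rows are samples, columns are outcomes.
tab <- matrix(c(8, 2,
                1, 9), nrow = 2, byrow = TRUE,
              dimnames = list(X = c("a", "b"), Y = c("yes", "no")))

row1 <- sum(tab[1, ])   # size of the first sample
row2 <- sum(tab[2, ])   # size of the second sample
col1 <- sum(tab[, 1])   # total "yes" outcomes

# With all margins fixed, the top-left cell is hypergeometric.
possible <- max(0, col1 - row2):min(col1, row1)
cell_probs <- dhyper(possible, m = row1, n = row2, k = col1)
observed_prob <- dhyper(tab[1, 1], m = row1, n = row2, k = col1)

# Two-sided p-value: total probability of tables no more likely than the observed one.
p_manual <- sum(cell_probs[cell_probs <= observed_prob * (1 + 1e-7)])

all.equal(p_manual, fisher.test(tab)$p.value)
```
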
```{r fishers_df}
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/Proportions/1F2LBs_multinomial.csv")
head(df)
```
```{r fishers_df_factor}
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
```
```{r fishers_levels}
unique(df$X)
unique(df$Y)
```
```{r fishers_tabs}
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
```
```{r fishers_test}
res <- fisher.test(xt)
res
```
```{r fishers_post}
## Fisher's post hoc test
# xt is an m × n crosstabs with categories X and Y
library(RVAideMemoire) # for fisher.multcomp
fisher.multcomp(xt, p.method="holm")
```
### G-test
```{r g_test_data}
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
library(RVAideMemoire) # for G.test
df <- read.csv("data/Proportions/1F2LBs_multinomial.csv")
head(df)
```
```{r g_test_factor}
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
```
```{r g_test_tabs}
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
```
```{r g_test}
G.test(xt)
```
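
For reference, the likelihood-ratio statistic behind a G-test of independence can be computed in a few lines of base R. This is a sketch on a hypothetical crosstab with no continuity correction. It is not the `RVAideMemoire` implementation, so small differences from `G.test` output are possible.

```r
# Hypothetical 2x3 crosstab (not the notebook's data).
tab <- matrix(c(12,  5,  8,
                 4, 11, 10), nrow = 2, byrow = TRUE,
              dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# G statistic: 2 * sum(O * log(O / E)); this sketch assumes no zero cells.
g_stat <- 2 * sum(tab * log(tab / expected))
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(g_stat, df = dof, lower.tail = FALSE)

c(G = g_stat, df = dof, p = p_value)
```
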
References:
http://www.sthda.com/english/wiki/comparing-proportions-in-r
http://www.sthda.com/english/wiki/chi-square-goodness-of-fit-test-in-r#what-is-chi-square-goodness-of-fit-test