---
output:
  md_document:
    variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# MCHT

*Version `r packageVersion("MCHT")`*

[![GitHub issues](https://img.shields.io/github/issues/ntguardian/MCHT.svg)](https://github.com/ntguardian/MCHT/issues)
[![GitHub forks](https://img.shields.io/github/forks/ntguardian/MCHT.svg)](https://github.com/ntguardian/MCHT/network)
[![GitHub stars](https://img.shields.io/github/stars/ntguardian/MCHT.svg)](https://github.com/ntguardian/MCHT/stargazers)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Github All Releases](https://img.shields.io/github/downloads/ntguardian/MCHT/total.svg)](https://github.com/ntguardian/MCHT)

**MCHT** is a package implementing an interface for creating and using Monte
Carlo tests. The primary function of the package is `MCHTest()`, which creates
functions with S3 class `MCHTest` that perform a Monte Carlo test.

## Installation

**MCHT** is not presently available on CRAN. You can download and install
**MCHT** from GitHub using [**devtools**](https://github.com/r-lib/devtools) via
the R command `devtools::install_github("ntguardian/MCHT")`.

## Monte Carlo Hypothesis Testing

Monte Carlo testing is a form of hypothesis testing where the $p$-values are
computed using the empirical distribution of the test statistic computed from
data simulated under the null hypothesis. These tests are used when the
distribution of the test statistic under the null hypothesis is intractable or
difficult to compute, or as an exact test (that is, a test where the
distribution used to compute $p$-values is appropriate for any sample size, not
just large sample sizes).

Suppose that $s_n$ is the observed value of the test statistic and large values
of $s_n$ are evidence against the null hypothesis. Normally, $p$-values would
be computed as $1 - F(s_n)$, where $F(s_n) = P(S_n \leq s_n)$ is the cumulative
distribution function and $S_n$ is the random variable version of $s_n$.
Suppose, though, that we cannot use $F$: perhaps it is intractable, or the $F$
provided is only appropriate for large sample sizes.

Instead of using $F$ we will use $\hat{F}_N$, the empirical CDF of the same
test statistic computed from simulated data following the distribution
prescribed by the null hypothesis of the test. For the sake of simplicity in
this presentation, assume that $S_n$ is a continuous random variable. Our
$p$-value is then $1 - \hat{F}_N(s_n)$, where $\hat{F}_N(s_n) =
\frac{1}{N}\sum_{j = 1}^{N}I_{\{\tilde{S}_{n,j} \leq s_n\}}$, $I$ is the
indicator function, and $\tilde{S}_{n,j}$ is an independent random copy of
$S_n$ computed from simulated data with a sample size of $n$.
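
To make this concrete, here is a minimal sketch of the computation (base R
only, not **MCHT**; the data, sample size, and replication count are
hypothetical choices for illustration):

```{r, eval = FALSE}
# A bare-bones Monte Carlo p-value for a one-sample test of the mean
set.seed(20181001)
n <- 10; N <- 10000
dat <- rnorm(n)                       # "observed" data
s_n <- sqrt(n) * mean(dat) / sd(dat)  # observed test statistic
# N independent copies of the statistic, each computed from a dataset
# simulated under the null hypothesis (here, standard normal data)
S_tilde <- replicate(N, {x <- rnorm(n); sqrt(n) * mean(x) / sd(x)})
mean(S_tilde > s_n)                   # p-value: 1 - F_hat_N(s_n)
```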

The power of these tests increases with $N$ (see [1]), but modern computers can
simulate large $N$ quickly, so this is rarely an issue. The procedure above
also assumes that there are no nuisance parameters and that the distribution of
$S_n$ is effectively known precisely when the null hypothesis is true (and all
other conditions of the test are met, such as distributional assumptions). A
different procedure needs to be applied when nuisance parameters are not
explicitly stated under the null hypothesis. [2] suggests a procedure using
optimization techniques (recommending simulated annealing specifically) to
adversarially select values for nuisance parameters valid under the null
hypothesis that maximize the $p$-value computed from the simulated data. This
procedure is often called *maximized Monte Carlo* (MMC) testing, and it is the
procedure employed here. (In fact, the tests created by `MCHTest()` are the
tests described in [2].) Unfortunately, MMC, while conservative and exact, has
much less power than if the unknown parameters were known, perhaps due to the
behavior of samples under distributions with parameter values distant from the
true parameter values (see [3]).
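
To illustrate the MMC idea (a toy sketch, not how `MCHTest()` handles nuisance
parameters internally), suppose we test $H_0: \mu = 0$ for normal data using
the unstudentized statistic $\sqrt{n}\bar{x}$, whose null distribution depends
on the unknown standard deviation $\sigma$; the $p$-value is then maximized
over $\sigma$ while the underlying random draws are held fixed:

```{r, eval = FALSE}
set.seed(20181001)
n <- 10; N <- 1000
dat <- rnorm(n, sd = 2)              # hypothetical data
s_n <- sqrt(n) * mean(dat)           # observed statistic

Z <- matrix(rnorm(n * N), nrow = n)  # fixed draws, reused for every sigma

# Monte Carlo p-value as a function of the nuisance parameter sigma
mc_pval <- function(sigma) mean(sqrt(n) * colMeans(sigma * Z) > s_n)

# Adversarially maximize the p-value over sigma; optim()'s "SANN" method is a
# simulated annealing variant, in the spirit of the approach of [2]
-optim(1, function(s) -mc_pval(abs(s)), method = "SANN")$value
```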

## Bootstrap Hypothesis Testing

Bootstrap statistical testing is very similar to Monte Carlo testing; the key
difference is that bootstrap testing uses information from the sample. For
example, a parametric bootstrap test would estimate the parameters of the
distribution the data is assumed to follow and generate datasets from that
distribution using those estimates as the actual parameter values. A
permutation test (like Fisher's permutation test; see [4]) would use the
original dataset's values but randomly shuffle the labels (stating which sample
an observation belongs to) to generate new datasets and thus new simulated test
statistics. $p$-values are essentially computed the same way.
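
As a quick sketch of the permutation idea (hypothetical data, base R only):

```{r, eval = FALSE}
set.seed(20181001)
x <- c(1.2, 2.4, 0.3, 1.9, 2.8)  # hypothetical sample 1
y <- c(0.4, 1.1, -0.5, 0.9)      # hypothetical sample 2
s_n <- mean(x) - mean(y)         # observed test statistic
pooled <- c(x, y)
S_tilde <- replicate(1000, {
  shuffled <- sample(pooled)     # randomly reassign the group labels
  mean(shuffled[seq_along(x)]) - mean(shuffled[-seq_along(x)])
})
mean(abs(S_tilde) >= abs(s_n))   # two-sided p-value
```
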
Unlike Monte Carlo tests and MMC, these tests are not exact tests. That said,
they often have good finite sample properties. (See [3].)

See the documentation for more details and references.

## Examples

`MCHTest()` is the main function of the package and can create functions with S3
class `MCHTest` that perform Monte Carlo hypothesis tests.

For example, the code below creates the Monte Carlo equivalent of a $t$-test.

```{r}
library(MCHT)
library(doParallel)

registerDoParallel(detectCores())  # Necessary for parallelization; if this is
                                   # not done, the resulting function will
                                   # complain on first use

# Test statistic: the usual one-sample t statistic
ts <- function(x, mu = 0) {
  sqrt(length(x)) * (mean(x) - mu) / sd(x)
}

# Statistic generator: shift simulated N(0, 1) data so that the null
# hypothesis (mean mu) holds, then compute the test statistic
sg <- function(x, mu = 0) {
  x <- x + mu
  ts(x, mu = mu)
}

rg <- rnorm  # generates datasets under the null

mc.t.test <- MCHTest(ts, sg, rg, seed = 20181001, test_params = "mu",
                     lock_alternative = FALSE,
                     method = "Monte Carlo One Sample t-Test")
```

The object `mc.t.test` is a callable function with S3 class `MCHTest`.

```{r}
class(mc.t.test)
```

`print()` will print relevant information about the construction of the test.

```{r}
mc.t.test
```

Once this object is created, we can use it for performing hypothesis tests.

```{r}
dat <- c(2.3, -0.13, 1.42, 1.51, 3.43, -0.96, 0.59, 0.62, 1.28, 4.07)
t.test(dat, mu = 1, alternative = "two.sided") # For reference
mc.t.test(dat, mu = 1)
mc.t.test(dat, mu = 1, alternative = "two.sided")
```

This is the simplest example; `MCHTest()` can create more involved Monte Carlo
tests. See other documentation for details.

## Planned Future Features

* A function for making diagnostic-type plots for tests, such as a function
  creating a plot for the rejection probability function (RPF) as described in
  [5]
* A function that accepts an `MCHTest`-class object and returns a function
  that, rather than returning an `htest`-class object, returns a list
  containing the test statistic, the simulated test statistics, and the
  $p$-value; this could be useful for diagnostic work

## References

1. A. C. A. Hope, *A simplified Monte Carlo significance test procedure*,
   Journal of the Royal Statistical Society, Series B, vol. 30 (1968)
   pp. 582-598
2. J-M Dufour, *Monte Carlo tests with nuisance parameters: A general approach
   to finite-sample inference and nonstandard asymptotics*, Journal of
   Econometrics, vol. 133 no. 2 (2006) pp. 443-477
3. J. G. MacKinnon, *Bootstrap hypothesis testing* in *Handbook of computational
   econometrics* (2009) pp. 183-213
4. R. A. Fisher, *The design of experiments* (1935)
5. R. Davidson and J. G. MacKinnon, *The size distortion of bootstrap tests*,
   Econometric Theory, vol. 15 (1999) pp. 361-376