-
Notifications
You must be signed in to change notification settings - Fork 75
/
Copy pathreal-life-example.qmd
60 lines (36 loc) · 3.13 KB
/
real-life-example.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
# Real-life example
This is a walk through one relatively simple simulation written to check whether the following two models would provide the same results:
* A generalized linear model based on a contingency table of counts (Poisson distribution).
* A generalized linear model with one line per observation and the occurrence of the variable of interest coded as 'Yes' or 'No' (binomial distribution).
I created this code while preparing my preregistration for a simple behavioural ecology experiment about methods for independently manipulating palatability and colour in small insect prey ([article](https://doi.org/10.1371/journal.pone.0231205), [OSF preregistration](https://osf.io/f8uk9?view_only=3943e7bb9c5f4effbf119ca5b062fe80)).
The R script screenshot below, `glm_Freq_vs_YN.R`, can be found in the folder [Ihle2020](https://github.com/lmu-osc/Introduction-Simulations-in-R/tree/main/Ihle2020).
This walkthrough will use the steps as defined on the page '[General structure](./general-structure.qmd)'.
1. **Define sample sizes** (within a dataset and number of repetitions), **experimental design** (fixed dataset structure, e.g. treatment groups, factors), and **parameters** that will need to vary (here, the strength of the effect).
<img src="../assets/define.png" width="1000">
<br/>
2. **Generate data** (here, using `sample()` and the probabilities defined in step 1) and format it in two different ways to accommodate the two statistical tests to be compared.
<img src="../assets/generate.png" width="1000">
<br/>
3. **Run the statistical test and save the parameter estimate of interest for that iteration**. Here, this is done for both statistical tests to be compared.
<img src="../assets/test.png" width="1000">
<br/>
4. **Repeat** steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.
Definition of the function at the beginning:
<br/>
<img src="../assets/replicate1.png" width="800">
<br/>
Output returned from the function at the end:
<br/>
<img src="../assets/replicate2.png" width="1000">
<br/>
Repeat the function `nrep` times. Here, `pbreplicate()` is used to provide a bar of progress for R to run this command.
<br/>
<img src="../assets/replicate3.png" width="1000">
<br/>
5. **Explore the parameter space**. Here, vary the probabilities of sampling between 0 and 1 depending on the treatment group category.
<img src="../assets/explore.png" width="1000">
<br/>
6. **Analyse and interpret the combined results of many simulation repetitions**. In this case, the results of the two models were qualitatively the same (comparison of results for a few different parameter values), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%.
<img src="../assets/conclude.png" width="1000">
<br/>
***