# Introduction to Bayesian Estimation
In this chapter, you will learn about the Bayesian approach to estimation by fitting regression models using the `r pkg("brms")` package [@burkner_brms_2017]. This is the most flexible approach to modelling as you can select your relevant outcome and predictors rather than relying on out-of-the-box statistical tests. We will be focusing on estimation and exploring the posterior of your model to make inferences. You will build on the skills you learnt in chapter 9, extending them to more flexible priors and statistical models. We are mainly going to focus on simple and multiple linear regression in this chapter, but the final section outlines further resources to learn about more advanced distribution families and models.
## Learning objectives
By the end of this chapter, you should be able to:
1. Understand the steps involved in fitting and exploring Bayesian regression models.
2. Apply these steps to [simple linear regression](#simpleregression).
3. Apply these steps to [multiple linear regression](#multipleregression).
4. Create data visualisations to graphically communicate the results of your Bayesian regression models.
To follow along to this chapter and try the code yourself, please download the data files we will be using in [this zip file](data/10_data.zip).
In this chapter, we need a few extra packages. The one most likely to cause trouble is the main `r pkg("brms")` package since it uses Stan and you need a C++ compiler. See the [installing R appendix](#installing-r) for guidance. If you are really struggling or it's very slow on your computer, `r pkg("brms")` is available on the RStudio server. See the course overview page for a link if you have never used it before.
```{r packages, warning=FALSE, message=FALSE}
library(brms) # fitting Bayesian models
library(bayestestR) # helper functions for plotting and understanding the models
library(tidybayes) # helper functions for combining plotting and tidy data from models
library(tidyverse)
library(see) # helper functions for plotting objects from bayestestR
library(emmeans) # Handy function for calculating (marginal) effect sizes
library(patchwork) # Combine multiple plots
```
## Simple Linear Regression {#simpleregression}
### Guided example (Schroeder & Epley, 2015)
For this guided activity, we will use data from the study by @schroeder_sound_2015. We used this data set for the independent activity in chapter 9, so we will explore it as the guided example in this chapter to see how we can refit it as a Bayesian regression model.
As a reminder, the aim of the study was to investigate whether delivering a short speech to a potential employer would be more effective at landing you a job than writing the speech down and the employer reading it themselves. Thirty-nine professional recruiters were randomly assigned to receive a job application speech as either a transcript for them to read or an audio recording of them reading the speech.
The recruiters then rated the applicants on perceived intellect, their impression of the applicant, and whether they would recommend hiring the candidate. All ratings were originally on a Likert scale ranging from 0 (low intellect, impression etc.) to 10 (high impression, recommendation etc.), with the final value representing the mean across several items.
For this example, we will focus on the hire rating (variable `r hl("Hire_Rating")`) to see whether the audio condition would lead to higher ratings than the transcript condition (variable `r hl("CONDITION")`).
Remember the key steps of Bayesian modelling from lecture 10 [@heino_bayesian_2018]:
1. Identify data relevant to the research question
2. Define a descriptive model, whose parameters capture the research question
3. Specify prior probability distributions on parameters in the model
4. Update the prior to a posterior distribution using Bayesian inference
5. Check your model against data, and identify potential problems
#### Identify data
For this example, we have the data from Schroeder and Epley with one outcome and one categorical predictor. The data are coded 0 for those in the transcript group and 1 for those in the audio group.
```{r Schroeder data, warning=FALSE, message=FALSE}
Schroeder_data <- read_csv("data/Schroeder_hiring.csv") %>%
mutate(CONDITION = as.factor(CONDITION))
```
#### Define a descriptive model
The next step is to define a descriptive model. In chapter 9, we used the `r pkg("BayesFactor")` package to use out-of-the-box tests like a t-test, but as we saw in the lecture via the <a href="https://lindeloev.github.io/tests-as-linear/" target="_blank">Lindelöv (2019) blog post</a>, common statistical models are just different expressions of linear models. So, we can express the same t-test as a linear model, using `r hl("CONDITION")` as a single categorical predictor of `r hl("Hire_Rating")` as our outcome. You can enter this directly in the `r hl(brm())` function below, but it's normally a good idea to clearly outline each component.
```{r Schroeder model}
Schroeder_model1 <- bf(Hire_Rating ~ CONDITION)
```
#### Specify prior probability of parameters
Once you get used to the `r pkg("brms")` package, you start to learn which priors you need for simple cases, but now we have stated a model, we can see which parameters can be assigned a prior.
```{r}
get_prior(Schroeder_model1, # Model we defined above
data = Schroeder_data) # Which data frame are we using?
```
This tells us which priors we can set and what the default settings are. For each parameter, we see the prior distribution, the class of prior, the relevant coefficient, and the source, which will all be default for now. For example, there are flat uninformative priors on the coefficients. When we set priors, we can either set one prior for a whole class or one specific to each coefficient, as sketched below. With one predictor, there is only one coefficient prior to set, so it makes no difference, but it becomes more useful when you have multiple predictors like later in this chapter.
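As a minimal sketch of the difference, using the quoted `r hl(set_prior())` form we use later in this chapter (the coefficient name `CONDITION1` comes from the `coef` column of the `r hl(get_prior())` output above):

```{r, eval=FALSE}
# One prior applied to the whole class of coefficients...
set_prior("normal(0, 1)", class = "b")

# ...or a prior for one named coefficient only
set_prior("normal(0, 1)", class = "b", coef = "CONDITION1")
```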
Coefficients are assigned flat priors, meaning anything is possible between minus infinity and infinity. We can visualise the priors to see what they expect one-by-one. You will see how you can plot the priors yourself shortly.
```{r default flat prior, echo=FALSE}
# Visualise default flat prior on coefficient
prior <- prior(normal(0, 10000), class = b) # Set prior and class
prior %>%
parse_dist() %>% # Function from tidybayes/ggdist to turn prior into a dataframe
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) + # Fill in details from prior and add fill
stat_slab(normalize = "panels") + # ggdist layer to visualise distributions
scale_fill_viridis_d(option = "plasma", end = 0.9) + # Add colour scheme
scale_x_continuous(breaks = c(-100, 0, 100), labels = c(expression(-infinity), 0, expression(infinity))) +
guides(fill = "none") + # Remove legend for fill
labs(x = "Value", y = "Density", subtitle = "b: flat") +
theme_classic() +
coord_cartesian(xlim = c(-100, 100)) # Trick to show flat prior - across a huge range but limit to smaller scale
```
The intercept and sigma are assigned Student t distributions as priors: a full Student t for the intercept and a half Student t for sigma. These are both quite weak priors chosen to have minimal influence on the model, but they do not factor in your knowledge about the parameters. The default prior for the intercept peaks a little above 0 (at 4 here) with most of its mass between roughly -5 and 15.
```{r plot default intercept prior, echo=FALSE}
prior <- c(prior(student_t(3, 4, 3), class = Intercept))
prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_viridis_d(option = "plasma", end = 0.9) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic()
```
The default prior for sigma is a half Student t distribution which peaks at 0. This plot demonstrates the full Student t distribution, but sigma cannot be smaller than 0, so the prior is truncated at 0 and only extends across positive values.
```{r plot default sigma prior, echo=FALSE}
prior <- c(prior(student_t(3, 0, 3), class = sigma))
prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_viridis_d(option = "plasma", end = 0.9) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic()
```
For our example, we can define our own informative priors using information from Schroeder and Epley. Their paper contains four studies and our data set focuses on the fourth where they apply their findings to professional recruiters. Study 1 preceded this and used students, so we can pretend we are the researchers and use this as a source of our priors for the "later" study.
Focusing on hire rating, they found (pg. 881):
> "Evaluators who heard pitches also reported being significantly more likely to hire the candidates (*M* = 4.34, *SD* = 2.26) than did evaluators who read exactly the same pitches (*M* = 3.06, *SD* = 3.15), *t*(156) = 2.49, *p* = .01, 95% CI of the difference = [0.22, 2.34], *d* = 0.40 (see Fig. 1)".
So, for our intercept and reference group, we can set a normally distributed prior around a mean of 3 and SD of 3 for the transcript group. Note the rounded values since these are approximations for what we expect about the measures and manipulations. We are factoring in what we know about the parameters from our topic and method knowledge.
It is normally a good idea to visualise this process to check the numbers you enter match your expectations. For the intercept, a mean and SD of 3 look like this when generating the numbers from a normal distribution:
```{r plot SE intercept prior, echo=FALSE}
prior <- c(prior(normal(3, 3), class = Intercept)) # Set prior and class
prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_viridis_d(option = "plasma", end = 0.9) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic()
```
This turns out to be quite a weak prior since the distribution extends below 0 (which is not possible for this scale) all the way to 10 which is the upper limit of this scale. It covers pretty much the entire measurement scale with the peak around 3, so it represents a lenient estimate of what we expect the reference group to be.
We can set something more informative for the sigma prior knowing what we do about standard deviations. A common prior for the standard deviation is an exponential distribution as it cannot be lower than 0. This means the largest density is around zero and the density decreases across more positive values. An exponential distribution takes only one value: the rate parameter. Rates closer to zero spread the prior over a wider range, while larger rates concentrate it over a smaller range. A rate of 1 means the density peaks at 0 and has largely dropped off by values of 2 and beyond.
```{r user sigma prior, echo=FALSE}
prior <- c(prior(exponential(1), class = sigma)) # Set prior and class
prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_viridis_d(option = "plasma", end = 0.9) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic()
```
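To get a numerical feel for the rate parameter, it helps to remember the mean of an exponential distribution is 1 / rate, and you can query the quantiles directly:

```{r, eval=FALSE}
# With a rate of 1, 95% of the prior mass for sigma sits below about 3
qexp(0.95, rate = 1)

# A smaller rate spreads the prior over a wider range of values
qexp(0.95, rate = 0.5)
```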
**Note on the visualisation**: Credit for the visualisation method goes to Andrew Heiss who shared some <a href="https://gist.github.com/andrewheiss/a4e0c0ab2d735625ac17ec8a081f0f32" target="_blank">code on a GitHub Gist</a> to visualise different priors. I adapted the code to use here to help you visualise the priors you enter. You can adapt the code to show any kind of prior used in brms models. All you need to do is specify the distribution family and parameters. Like the original code, you can even present a bunch of options to compare side by side.
For the coefficient, the mean difference was around 1 (calculated manually by subtracting one mean from the other) and the 95% confidence interval was quite wide from 0.22 to 2.34. As we are working out what prior would best fit our knowledge, we can compare some different options side by side. We can compare a stronger prior (*SD* = 0.5) vs a weaker prior (*SD* = 1).
```{r plot coefficient priors}
priors <- c(prior(normal(1, 0.5), class = b),
prior(normal(1, 1), class = b)) # Set prior and class
priors %>%
parse_dist() %>% # Function from tidybayes/ggdist to turn prior into a dataframe
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) + # Fill in details from prior and add fill
stat_slab(normalize = "panels") + # ggdist layer to visualise distributions
scale_fill_viridis_d(option = "plasma", end = 0.9) + # Add colour scheme
guides(fill = "none") + # Remove legend for fill
facet_wrap(~prior) + # Split into a different panel for each prior
labs(x = "Value", y = "Density") +
theme_classic()
```
The stronger prior on the left shows we are expecting mainly positive effects with a peak over 1 but ranges between around -0.5 (transcript to be higher than audio) and 2 (audio to be higher than transcript). The weaker prior on the right shows we are still expecting the peak over 1, but it could span from -1.5 to around 3.5.
Let's say we think both positive and negative effects are plausible but we expect the most likely outcome to be similar to study 1 from Schroeder and Epley. So, for this example we will go with the weaker prior. Now we have our priors, we can save them to a new object:
```{r Schroeder prior}
priors <- set_prior("normal(1, 1)", class = "b") +
set_prior("normal(3, 3)", class = "Intercept") +
set_prior("exponential(1)", class = "sigma")
```
::: {.info data-latex=""}
Remember it is important to check the sensitivity of the results to the choice of prior. So, once we're finished, we will check how stable the results are under uninformative priors, keeping the defaults. Normally it is done the other way around, starting with uninformative priors, but I did not want to put off thinking about the priors.
:::
#### Update the prior to the posterior
This is going to be the longest section as we are going to fit the `brms` model and then explore the posterior.
As the process relies on sampling using MCMC, it is important to set a seed within the function for reproducibility, so the semi-random numbers have a consistent starting point. Fitting might take a while depending on your computer, and you will then get a bunch of output for fitting the model and sampling from the MCMC chains.
```{r Schroeder fit, eval=FALSE, message=FALSE, warning=FALSE}
Schroeder_fit <- brm(
formula = Schroeder_model1, # formula we defined above
data = Schroeder_data, # Data frame we're using
family = gaussian(), # What distribution family do we want for the likelihood function? Many examples we use in psychology are Gaussian, but check the documentation for options
prior = priors, # priors we stated above
sample_prior = TRUE, # Setting this to true includes the prior in the object, so we can include it on plots later
seed = 1908,
file = "Models/Schroeder_model1" #Save the model as a .rds file
)
```
::: {.info data-latex=""}
When you have lots of data or complicated models, the fitting process can take a long time. This means it's normally a good idea to save your fitted model so you can look at it again quickly. In the brm function, there is an argument called `file`. You supply a character string with any further file directory and the name you want to save the model under. Models are saved as a .rds file - R's own data file format for saving objects. Behind the scenes, this book reruns its code every time we update it, so all the models you see are based on reading the .rds files back in after we first fitted the models. If you save the objects, remember to refit them if you change anything like the priors, model, or data. If the file already exists though, it will not be overwritten unless you use the `file_refit` argument.
:::
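As a sketch of that last point, assuming you changed the priors and wanted the saved model overwritten, the `file_refit` argument controls when `r hl(brm())` refits (the default is `"never"`):

```{r, eval=FALSE}
# Sketch: refit and overwrite the saved .rds when anything changes
Schroeder_fit <- brm(
  formula = Schroeder_model1,
  data = Schroeder_data,
  family = gaussian(),
  prior = priors,
  sample_prior = TRUE,
  seed = 1908,
  file = "Models/Schroeder_model1",
  file_refit = "on_change" # or "always"
)
```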
If you save the model as a .rds file, you can load them again using the `r hl(read_rds())` function from `r pkg("readr")` in the tidyverse.
```{r load Schroeder model 1}
Schroeder_fit <- read_rds("Models/Schroeder_model1.rds")
```
There will be a lot of output here to explain the fitting and sampling process. For a longer explanation of how MCMC sampling works, see @van_ravenzwaaij_simple_2018, but for a quick overview, we want to sample from the posterior distribution based on the data and model. The default of `brms` is to sample from four chains, with each chain containing 2000 iterations (1000 of which are warm up / burn in iterations). If you get warning messages about model fit or convergence issues, you can increase the number of iterations. This becomes more important with more complex models, so all the defaults should be fine for the relatively simple models we fit in this chapter. We will return to chains and convergence when we see the trace plots later.
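If you do hit convergence warnings, a minimal sketch of the standard `r hl(brm())` arguments you might increase (the values here are illustrative):

```{r, eval=FALSE}
# Sketch: more iterations per chain if the defaults are not enough
Schroeder_fit <- brm(
  formula = Schroeder_model1,
  data = Schroeder_data,
  family = gaussian(),
  prior = priors,
  chains = 4,    # number of MCMC chains (the default)
  iter = 4000,   # total iterations per chain (default 2000)
  warmup = 2000, # warm-up iterations discarded per chain (default iter / 2)
  seed = 1908
)
```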
Now we have fitted the model, we can also double check the priors you set are what you wanted. You will see the source for the priors you set switched from default to user.
```{r check user priors}
prior_summary(Schroeder_fit)
```
Now we have our model, we can get a model summary like any old linear model in R.
```{r Schroeder posterior}
summary(Schroeder_fit)
```
At the top, we have information on the model fitting process, like the family, data, and draws from the posterior summarising the chain iterations.
Population-level effects is our main area of interest. This is where we have the posterior probability distribution summary statistics. We will look at the whole distribution soon, but for now, we can see the median point-estimate for the intercept is 3.01 with a 95% credible interval between 2.09 and 3.94. This is what we expect the mean of the reference group to be, i.e., the transcript group.
We then have the median coefficient of 1.57 with a 95% credible interval between 0.46 and 2.66. This means our best guess for the mean difference / slope is an increase of 1.57 for the audio group. Note, you might get subtly different values to the output here since it is based on a semi-random sampling process, but the qualitative conclusions should be the same.
For convergence issues, if Rhat is different from 1, it can suggest there are problems with the model fitting process. You can also look at the effective sample size statistics (the columns ending in ESS). These should be in the thousands, or at the very least in the hundreds, for both the bulk and tail [@flores_beforeafter_2022]. We will return to a final indicator of model fitting soon when we check the trace plots.
For a tidier summary of the parameters, we can also use the handy `r hl(describe_posterior())` function from `r pkg("bayestestR")`.
```{r Schroeder describe posterior}
describe_posterior(Schroeder_fit)
```
We can use this to create ROPE regions for the effects, and it tells us useful things like the probability of direction for the effect (how much of the posterior is above or below zero).
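If you want the summary under your own settings rather than the defaults, a sketch of the relevant `r hl(describe_posterior())` arguments:

```{r, eval=FALSE}
# Sketch: make the summary choices explicit
describe_posterior(Schroeder_fit,
                   centrality = "median",  # point estimate to report
                   ci_method = "hdi",      # highest density interval
                   rope_range = c(-1, 1))  # smallest effects of interest
```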
##### Plotting the posterior distributions
Until now, we have focused on point-estimates and intervals of the posterior, but the main strength of Bayesian statistics is summarising the parameters as a whole posterior probability distribution, so we will now turn to the various plotting options.
The first plot is useful for seeing the posterior of each parameter and the trace plots to check on any convergence issues.
```{r Schroeder parameters and trace, warning=FALSE, message=FALSE}
plot(Schroeder_fit)
```
For this model, we have three plots: one for the intercept, one for the coefficient/slope, and one for sigma. On the left, we have the posterior probability distributions for each. On the right, we have trace plots. By default, `brms` uses four chains - or series of samples using MCMC - and this shows how each chain moves around the parameter space. Essentially, we want the trace plots to look like fuzzy caterpillars with a random series of lines. If there are spikes that deviate massively from the rest, or the lines get stuck in one area, this suggests there are convergence issues.
These plots are useful for an initial feel of the parameter posteriors, but there is a great series of functions from the `r pkg("bayestestR")` package [@Makowski2019] which you can use on their own, or wrap in the `r hl(plot())` function after loading the `r pkg("see")` package [@Luedecke2021]. For example, we can see an overlay of the prior and posterior for the main parameters of interest. On its own, `r hl(p_direction())` tells you the probability of direction for each parameter, i.e., how much of the distribution is above or below 0. Wrapped in `r hl(plot())`, you can see the prior and posterior, with the posterior divided into areas above or below 0.
```{r Schroeder p direction, message=FALSE, warning=FALSE}
plot(p_direction(Schroeder_fit),
priors = TRUE)
```
::: {.warning data-latex=""}
For this to work, you must specify priors in `brms`. It does not work with the package default options for the coefficients.
:::
We can see the pretty wide prior in blue, then the posterior. Almost all of the posterior distribution is above zero, showing we're pretty confident that audio is associated with higher hire ratings than transcript.
The next useful plot is seeing the 95% HDI / credible interval. On its own, `r hl(hdi())` will show you the 95% HDI for your parameters. Wrapped in `r hl(plot())`, you can visualise the HDI compared to zero for your main parameters. If the HDI excludes zero, you can be confident in a positive or negative effect, at least conditional on these data and model. Remember, there is a difference between the small world and big world of models. This is not the absolute truth, just the most credible values conditioned on our data and model.
```{r Schroeder HDI, warning=FALSE, message=FALSE}
plot(bayestestR::hdi(Schroeder_fit)) # Specify package to avoid clash with ggdist
```
::: {.warning data-latex=""}
These plots are informative for you learning about your model and the inferences you can learn from it. However, they would not be immediately suitable to enter into a report. Fortunately, they are created using `r pkg("ggplot")`, so you can customise them in the same way by adding layers of additional functions.
:::
For this example, the 95% HDI excludes 0, so we can be confident the coefficient posterior is a positive effect, with the audio group leading to higher hire ratings than the transcript group.
Finally, we might not be interested in comparing the coefficients to a point-value of 0; we might have a stronger level of evidence in mind, where the coefficient must exclude a range of values in the ROPE process we explored in chapter 9. For example, maybe effects smaller than 1 unit difference are too small to be practically/theoretically meaningful.
::: {.info data-latex=""}
Remember this is potentially the most difficult decision to make, maybe more so than choosing priors. Many areas of psychology do not have clear guidelines/expectations for smallest effect sizes of interest, so it is down to you to explain and justify your approach based on your understanding of the topic area.
:::
```{r Schroeder ROPE, warning=FALSE, message=FALSE}
plot(rope(Schroeder_fit,
range = c(-1, 1))) # What is the ROPE range for your smallest effects of interest?
```
For this example, with a sample size of 39, we have pretty strong evidence in favour of a positive effect in the audio group. The 95% HDI excludes zero, but if we set a ROPE of 1 unit, we do not quite exclude it. This means if we wanted to be more confident that the effect exceeded the ROPE, we would need more data. This is just for demonstration purposes; I'm not sure if the original study would consider an effect of 1 as practically meaningful, or whether they would just be happy with any non-zero effect.
##### Hypothesis testing in `r pkg("brms")`
Following from chapter 9, we saw we can also use Bayesian statistics to test hypotheses. This works in a modelling approach as `brms` has a function to test hypotheses. We must provide the fitted model object and state a hypothesis to test. This relies on a character description of the parameter and test value. For a full explanation, see the <a href="https://paul-buerkner.github.io/brms/reference/hypothesis.html" target="_blank">brms documentation online</a> for the function. Here, we will test the coefficient/slope against a point-null of 0.
```{r Schroeder hypothesis}
hypothesis(Schroeder_fit, # brms model we fitted earlier
hypothesis = "CONDITION1 = 0")
```
::: {.info data-latex=""}
We must state a character hypothesis which requires you to select a parameter. Here, we focus on the `r hl("CONDITION")` parameter, i.e., our slope, which must match the name in the model. We can then state values to test against, like here against a point-null of 0 for a Bayes factor. Alternatively, you can test posterior odds where you compare masses of the posterior like CONDITION > 0.
:::
The key part of the output is the evidence ratio (`Evid.Ratio`), but we also have the estimate and 95% credible interval. As we are testing a point-null of 0, we are testing the null hypothesis against the alternative of a non-null effect. As the value is below 1, it suggests we have evidence in favour of the alternative compared to the null. I prefer to express things above 1 as it's easier to interpret. You can do this by dividing 1 by the ratio, which should provide a Bayes factor of 12.5 here.
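A sketch of that inversion, pulling the evidence ratio out of the object that `r hl(hypothesis())` returns:

```{r, eval=FALSE}
# Sketch: express the evidence ratio in favour of the alternative
h <- hypothesis(Schroeder_fit, hypothesis = "CONDITION1 = 0")
1 / h$hypothesis$Evid.Ratio # roughly 12.5 here
```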
Alternatively, you can calculate the posterior odds by stating regions of the posterior to test. For example, if we used "CONDITION1 > 0", this would provide a ratio of the posterior probability of positive effects above 0 to the posterior probability of negative effects below 0. For this example, this would be a posterior odds of 265.7 in favour of positive effects. Note, when all the posterior is above 0, you can get a result of Inf (infinity) as all the evidence is in favour of positive effects.
```{r Schroeder positive hypothesis}
hypothesis(Schroeder_fit, # brms model we fitted earlier
hypothesis = "CONDITION1 > 0")
```
##### Calculating and plotting conditional effects
For the final part of exploring the posterior, you might be interested in the estimates for each group or condition in your predictor. When you only have two groups, you can calculate the point estimate using the intercept and slope, but we can use the `r pkg("emmeans")` package [@Lenth2022] to calculate conditional effects on the posterior distribution.
```{r Schroeder marginal}
emmeans(Schroeder_fit, # add the model object
~ CONDITION) # What predictor do you want marginal means of?
```
This provides the median and 95% HDI values for the posterior for each group. The `r pkg("brms")` package also comes with a function called `r hl(conditional_effects())` which you can use to plot the conditional effects.
```{r Schroeder conditional effects plot}
conditional_effects(Schroeder_fit)
```
By default, it plots the median of the posterior for each group and the error bars represent the 95% HDI around the median. Behind the scenes, it uses ggplot, so you can customise the graphs to make them better suited for a report.
::: {.warning data-latex=""}
When you use the `conditional_effects()` function, the type of plot it produces will depend on the data type. All the way back when we read the data in, we turned CONDITION into a factor. If you left it numeric, all the modelling would work the same, but the plot here would be more of a scatterplot. There are additional arguments you can use, so [see the function help](http://paul-buerkner.github.io/brms/reference/conditional_effects.html) for further customisation options.
:::
```{r Conditional effects customisation}
conditional_plot <- conditional_effects(Schroeder_fit)
plot(conditional_plot,
plot = FALSE)[[1]] + # plot() returns a list of ggplot objects, so select the first to add layers
theme_classic() +
scale_y_continuous(limits = c(0, 10), breaks = seq(0, 10, 2)) +
scale_x_discrete(labels = c("Transcript", "Audio")) +
labs(x = "Speech Group", y = "Mean Hire Rating")
```
#### Model checking
Finally, we have our model checking procedure. We already looked at some information for this such as Rhat, effective sample size, and the trace plots. These suggested the model fitted OK. We also want to check the model reflects the properties of the data. This does not mean we want it to match exactly and overfit the data, but it should follow a similar pattern to show our model captures the features of the data.
Bayesian models are generative, which means once they are fitted, we can use them to sample values from the posterior and make predictions. One key process is called a posterior predictive check, which takes the model and uses it to generate new samples. This shows how you have conditioned the model and what it expects.
The plot below is a `r pkg("brms")` function for facilitating this. The thick blue line is your data for the outcome. The light blue lines are 100 samples from the posterior to show what the model expects about the outcome.
```{r Schroeder model check, warning=FALSE, message=FALSE}
pp_check(Schroeder_fit,
ndraws = 100) # How many draws from the posterior? Higher values means more lines
```
For this example, it does an OK job at capturing the pattern of data and the bulk of the observed data follows the generated curves. However, you can see the data are quite flat compared to the predicted values. As we assume a Gaussian distribution, the model will happily produce normal curves. The model also happily expects values beyond the range of data, as our scale is bound to 0 and 10. This is hugely common in psychological research, as we fit Gaussian models to bounded ordinal data. So, while this model does an OK job, we could potentially improve on it with an ordinal regression model to factor in the bounded nature of the measure, if we had the raw measures.
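As a purely hypothetical sketch of that last point, an ordinal model in `r pkg("brms")` mainly means swapping the distribution family. This would only make sense with the raw integer Likert responses, which we do not have here:

```{r, eval=FALSE}
# Hypothetical sketch only: needs raw ordinal responses, not item means
Schroeder_ordinal <- brm(
  formula = Hire_Rating ~ CONDITION,
  data = Schroeder_data,
  family = cumulative("probit"), # ordinal family instead of gaussian()
  seed = 1908
)
```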
##### Check model sensitivity to different priors
The final thing we will check for this model is how sensitive it is to the choice of prior. A justifiable informative prior is a key strength of Bayesian statistics, but it is important to check the model under at least two sets of priors. For this example, we will compare the model output under the default package priors and our user defined priors we used all along.
In the code below, we have omitted the prior argument, so we are fitting the exact same model as before but using the default package priors.
```{r Schroeder model 2, eval=FALSE, message=FALSE, warning=FALSE}
Schroeder_fit2 <- brm(
formula = Schroeder_model1,
data = Schroeder_data,
family = gaussian(),
seed = 1908,
file = "Models/Schroeder_model2" #Save the model as a .rds file
)
```
```{r load Schroeder 2, echo=FALSE}
Schroeder_fit2 <- read_rds("Models/Schroeder_model2.rds")
```
If we run the `r hl(summary())` function again, you can check the intercept and predictor coefficients to see how they differ to the first model we fitted. Ideally, they should provide us with similar inferences, such as a similar magnitude and in the same direction. It is never going to be exactly the same under different priors, but we want our conclusions robust to the choice of prior we use.
```{r summarise Schroeder 2}
summary(Schroeder_fit2)
```
To make it easier to compare, we can isolate the key information from each model and present them side by side. You can see below how there is little difference in the intercept between both models. The median is similar, both probability of direction values are 100%, and the 95% HDI ranges across similar values. For our user prior, the coefficient is a little more conservative, but the difference is also small here, showing how our results are robust to the choice of prior.
```{r side by side summary, echo=FALSE, warning=FALSE, message=FALSE}
model1 <- describe_posterior(Schroeder_fit) %>%
dplyr::mutate(Model = "User prior") %>%
dplyr::select(Model, Parameter, Median, CI_low, CI_high, pd)
model2 <- describe_posterior(Schroeder_fit2) %>%
dplyr::mutate(Model = "Default prior") %>%
dplyr::select(Model, Parameter, Median, CI_low, CI_high, pd)
dplyr::bind_rows(model1, model2) %>%
arrange(desc(Parameter)) %>%
knitr::kable(digits = 2,
col.names = c("Model", "Parameter", "Median Estimate", "Lower 95% HDI", "Upper 95% HDI", "Prob Direction"))
```
### Independent activity (Brandt et al., 2014)
For an independent activity, we will use data from the study by @brandt_does_2014. The aim of Brandt et al. was to replicate a relatively famous social psychology study (Banerjee et al., 2012) on the effect of recalling unethical behaviour on the perception of brightness.
In common language, unethical behaviour is considered as "dark", so the original authors designed a priming experiment where participants were randomly allocated to recall an unethical behaviour or an ethical behaviour from their past. Participants then completed a series of measures including their perception of how bright the testing room was. Brandt et al. were sceptical and wanted to replicate this study to see if they could find similar results.
Participants were randomly allocated (`r hl("ExpCond")`) to recall an unethical behaviour (n = 49) or an ethical behaviour (n = 51). The key outcome was their perception of how bright the room was (`r hl("welllit")`), from 1 (not bright at all) to 7 (very bright). The research question was: Does recalling unethical behaviour lead people to perceive a room as darker than if they recall ethical behaviour?
In the original study, they found that the room was perceived as darker in the unethical condition compared to the ethical condition. The means and standard deviations of Banerjee et al. are reproduced from Table 2 in Brandt et al. below and might be useful for thinking about your priors later.
```{r Brandt descriptive table, echo = FALSE}
knitr::kable(
tribble(~"Condition", ~"Mean (SD)",
"Unethical", "4.71 (0.85)",
"Ethical", "5.30 (0.97)")
)
```
::: {.try data-latex=""}
Using your understanding of the design, apply what you learnt in the guided example to this independent activity to address the research question. Following the Bayesian modelling steps, fit at least two models: one using the default priors and one using informative priors. Explore the model results, think about what you would conclude for the research question, and answer the questions below.
:::
```{r Brandt data, warning=FALSE, message=FALSE}
Brandt_data <- read_csv("data/Brandt_unlit.csv")
# Recode to dummy coding
# Turn to factor after recoding so we're working with groups
# 0 = Ethical
# 1 = Unethical
Brandt_data <- Brandt_data %>%
mutate(ExpCond = as.factor(case_when(ExpCond == 1 ~ 0,
ExpCond == -1 ~ 1)))
```
- Is the coefficient positive or negative? `r mcq(opts = c(answer = "Positive", x = "Negative"))`
```{r Brandt Q2, echo=FALSE, results='asis'}
opts = c(x = "Yes, the 95% HDI excludes 0",
answer = "No, the 95% HDI crosses 0")
cat("- Can we be confident in the direction of the coefficient?", longmcq(opts))
```
```{r Brandt Q3, echo=FALSE, results='asis'}
opts = c(x = "Recalling unethical behaviour lead people to perceive a room as darker.",
answer = "The effect was in the opposite direction but we would not be confident that the manipulation had an effect.")
cat("- What would your conclusion be for the research question?", longmcq(opts))
```
```{r Brandt Q4, echo=FALSE, results='asis'}
opts = c(answer = "No, there is little difference in the parameters and our conclusions do not change.",
x = "Yes, there is a qualitative difference in our conclusions and the parameters change substantially.")
cat("- Are the results sensitive to the choice between default and user priors?", longmcq(opts))
```
```{r Brandt Q5, echo=FALSE, results='asis'}
opts = c(answer = "No, assuming a normal distribution misses key features of the data.",
x = "Yes, assuming a normal distribution captures key features of the data.")
cat("- Does the normal model capture the features of the data?", longmcq(opts))
```
`r hide("Explain these answers")`
1. The experimental condition coefficient is a positive but small value.
2. Although the coefficient is positive, there is substantial overlap across 0.
3. Given the uncertainty around the coefficient, we would not be confident in the effect of experimental condition on perceived brightness.
4. The results should be robust to the choice of prior if you based it on the means and SDs from the original Banerjee et al. study. There was little difference in my user and default priors.
5. In contrast to the Schroeder and Epley data where the ordinal data was approximately normal, there is no getting away from the characteristic ordinal distribution with peaks at each integer. Really, we would need to explore something like ordinal regression to capture the properties of the data. It is not something we covered in the Bayesian lectures or activities, but [see the bonus section](#Brandt-bonus) showing what an ordinal model would look like applied to these data.
`r unhide()`
You can check your attempt against the solutions at [the bottom of the page](#Brandt-solution). Remember this is based on semi-random number generation, so there might be some variation in your precise values, but the qualitative conclusions should be consistent. If you want to double check your process is accurate, you can download our saved models from [the GitHub repository](https://github.com/BartlettJE/statsresdesign/tree/master/book/Models) and reproduce the results that way.
## Multiple Linear Regression {#multipleregression}
### Guided example (Heino et al., 2018)
For the second guided example we covered in the lecture, we will explore the model included in @heino_bayesian_2018 for their Bayesian data analysis tutorial. They explored the feasibility and acceptability of the "Let's Move It" intervention to increase physical activity in 43 older adolescents.
In this section, we will work through their multiple regression model following the Bayesian modelling steps. There will be less explanation than the simple linear regression section as we are following the same processes, but I will highlight if there is anything new or important to consider when we have two or more predictors.
#### Identify data
@heino_bayesian_2018 randomised participants into two groups (`r hl("intervention")`) for control (0) and intervention (1) arms (group sessions on motivation and self-regulation skills, and teacher training). Their outcome was a measure of autonomous motivation (`r hl("value")`) on a 1-5 scale, with higher values meaning greater motivation. They measured the outcome at both baseline (0) and six weeks after (1; `r hl("time")`).
Their research question was: To what extent does the intervention affect autonomous motivation?
```{r Heino data, warning=FALSE, message=FALSE}
# In contrast to the original article, use deviation coding given the interaction
Heino_data <- read_csv("data/Heino-2018.csv") %>%
group_by(ID, intervention, time) %>%
summarise(value = mean(value, na.rm = TRUE)) %>%
mutate(intervention = factor(case_when(intervention == 0 ~ -0.5, .default = 0.5)),
time = factor(case_when(time == 0 ~ -0.5, .default = 0.5))) %>%
ungroup()
```
::: {.info data-latex=""}
Part of their tutorial discusses a bigger multi-level model considering different scenarios, but for this demonstration, we're just averaging over the scenarios to get the mean motivation. We also convert intervention and time to factors so they work nicely in plotting options later.
:::
#### Define a descriptive model
I recommend reading the article as they explain this process in more detail. We essentially have an outcome of autonomous motivation (`r hl("value")`) and we want to look at the interaction between `r hl("intervention")` and `r hl("time")`. They define a fixed intercept in the model with the `1 +` part. It's also technically a multi-level model as they define a random intercept for each participant (`(1 | ID)`) to recognise that time is within-subjects.
::: {.info data-latex=""}
By default, R includes a fixed intercept (the `1 +` part) in the model, so you would get the same results without adding it to the model. However, people often include it so it is explicit in the model formula.
:::
```{r Heino model}
Heino_model <- bf(value ~ 1 + time * intervention + (1 | ID))
```
#### Specify prior probability of parameters
Compared to simple linear regression, as you add predictors, the number of priors you can set also increases. In the output below, you will see how you can enter a prior for all beta coefficients or one specific to each predictor. There are also different options for setting a prior on standard deviations, since we now have the group-level standard deviation for the random effect as well as sigma for the distribution family, as we assume the outcome is normal.
```{r Heino prior options}
get_prior(Heino_model, data = Heino_data)
```
Note, you get a warning about missing data, but since it's a multi-level model, we just have fewer observations in some conditions rather than whole cases being removed.
This is another place where I recommend reading the original article for more information. They discuss their choices and essentially settle on wide, weak priors for the coefficients, saying small effects are more likely while still allowing larger ones. The two standard deviation classes are then assigned relatively wide Cauchy priors.
```{r Heino priors}
Heino_priors <- prior(normal(0, 5), class = "b") +
prior(cauchy(0, 1), class = "sd") +
prior(cauchy(0, 2), class = "sigma")
```
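If you wanted different priors for each predictor rather than one class-wide prior, a sketch of the idea (the `coef` names below are illustrative; take the exact names from the `coef` column of the `r hl(get_prior())` output above):

```{r, eval=FALSE}
# Sketch: per-coefficient priors; coef names here are illustrative
prior(normal(0, 5), class = "b", coef = "time0.5") +
  prior(normal(0, 2), class = "b", coef = "time0.5:intervention0.5")
```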
```{r Heino plot priors, echo=FALSE}
prior <- c(prior(normal(0, 5), class = "b"))
colours <- viridis::plasma(3, end = 0.9)
b_plot <- prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_manual(values = colours[1]) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic()
prior <- c(prior(cauchy(0, 1), class = "sd"))
sd_plot <- prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_manual(values = colours[2])+
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic() +
coord_cartesian(xlim = c(-15, 15))
prior <- c(prior(cauchy(0, 2), class = "sigma"))
sigma_plot <- prior %>%
parse_dist() %>%
ggplot(aes(y = 0, dist = .dist, args = .args, fill = prior)) +
stat_slab(normalize = "panels") +
scale_fill_manual(values = colours[3]) +
guides(fill = "none") +
labs(x = "Value", y = "Density", title = paste0(prior$class, ": ", prior$prior)) +
theme_classic() +
coord_cartesian(xlim = c(-30, 30))
b_plot + sd_plot + sigma_plot + plot_layout(ncol = 2)
```
#### Update prior to posterior
This is going to be the longest section as we are going to fit the `brms` model and then explore the posterior.
As the process relies on sampling using MCMC, it is important to set a seed for reproducibility, so the semi-random numbers have a consistent starting point. This might take a while depending on your computer, and you will then get a bunch of output for fitting the model and sampling from the MCMC chains. Remember, we save all the models using the `file` argument, so it's easier to load them later. If you update the model, you must use the `file_refit` argument or it will not change when you use the same file name.
```{r Heino fit, eval=FALSE, message=FALSE, warning=FALSE}
Heino_fit <- brm(
formula = Heino_model,
data = Heino_data,
prior = Heino_priors,
family = gaussian(),
seed = 2108,
file = "Models/Heino_model"
)
```
```{r load Heino model, echo=FALSE}
Heino_fit <- read_rds("Models/Heino_model.rds")
```
Now we have fitted the model, let's have a look at the summary.
```{r Heino summary}
summary(Heino_fit)
```
The model summary is very similar to the examples in the simple linear regression section, but we also have a new section for group-level effects since we added a random intercept for participants.
Exploring the coefficients, all the effects are pretty small, with the largest effect being 0.10 units. There is quite a bit of uncertainty here, with 95% credible intervals spanning negative and positive effects, but the sample size is quite small for learning anything meaningful about the two groups.
In more complicated models like this, plotting is going to be your best friend for understanding what is going on. First up, we can check the posteriors and trace plots, although we will work through model checking in the next section.
```{r Heino trace plot}
plot(Heino_fit)
```
The posteriors are quite wide and spread over 0 for the coefficients. The trace plots do not suggest any cause for concern around convergence in the model.
The next key plot shows the probability of direction with the priors superimposed.
```{r Heino pd plot}
plot(p_direction(Heino_fit),
priors = TRUE) # plot the priors
```
On this plot, you can see how wide the priors were. They are almost flat to cover coefficients from -10 to 10, with the posterior distributions peaking around 0. These plots also show how there is not much we can conclude from the results.
Finally, we can take a closer look at the 95% HDI of the posterior distributions.
```{r Heino HDI plot}
plot(bayestestR::hdi(Heino_fit)) # Specify to avoid clash with ggdist
```
Now we zoom in a little more without the scale of the wide priors, and there is further indication that the mass of the coefficient posteriors is centred over 0. We would need more data to make firm conclusions about the effectiveness of the intervention. The data come from a feasibility study, so the sample size was pretty small and it's mainly about how receptive participants were to the intervention.
##### Calculating and plotting conditional effects
As a bonus extra, since it's not included in Heino et al., you can also use the `r pkg("emmeans")` package to calculate marginal effects on the posterior distribution. It's not that important here as there is little we can learn from breaking down the interaction further, but it might come in handy in future.
```{r Heino marginal}
# Surround with brackets to both save and output
(Heino_means <- emmeans(Heino_fit, # add the model object
~ time | intervention)) # We want to separate time by levels of intervention
```
This provides the median value of the posterior for the combination of time and intervention. We can see pretty clearly there is not much going on, with very little difference across the estimates and all the 95% credible intervals overlapping.
Depending on how you want to express the marginal means, you can also use the `r pkg("emmeans")` object to calculate contrasts, expressing the effects as differences in the median posterior value for each group/condition. Just keep in mind which comparisons would best address your research question and hypothesis. We entered the difference in time for each intervention, but you might be interested in the difference in intervention for each time.
```{r Heino contrasts}
contrast(Heino_means)
```
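If the other breakdown better matches your research question, a sketch of the reverse using the same functions:

```{r, eval=FALSE}
# Sketch: differences between intervention groups at each time point
Heino_means2 <- emmeans(Heino_fit, ~ intervention | time)
contrast(Heino_means2)
```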
Finally, we can plot the conditional effects, which is normally a good idea to help your reader understand your results. In this object, I have used the `effects` argument to specify which population-level effect I want plotted. If you omit the `effects` argument, you would receive three plots for this example: one for the partial effect of each predictor and one for the interaction.
```{r Heino conditional effects standard}
conditional_effects(Heino_fit,
effects = "time:intervention")
```
Like the simple linear regression example, this is useful for your own understanding, but it might not be quite appropriate for inserting immediately into a report. Once you save the plot as an object, you can add `r pkg("ggplot")` layers to make it easier for your reader to understand. For example, here I have tidied up the axis names and labels, changed the scale to reflect the range of the outcome, and added a colour scheme to differentiate the two intervention groups.
```{r Heino conditional effects modified}
# Save initial plot of the interaction
conditional_plot <- conditional_effects(Heino_fit,
effects = "time:intervention")
# Call the plot and stop legend being included to prevent duplication later
plot(conditional_plot,
plot = FALSE,
cat_args = list(show.legend = F))[[1]] + # plot() returns a list of ggplot objects, so select the first to add layers
theme_classic() +
scale_y_continuous(limits = c(1, 5), breaks = seq(1, 5, 1)) +
scale_x_discrete(labels = c("Baseline", "Six weeks")) +
labs(x = "Time", y = "Autonomous Motivation") +
scale_color_viridis_d(option = "D", begin = 0.1, end = 0.7,
name = "Group", labels = c("Control", "Intervention")) # Add neater legend labels
```
##### Model fit and comparison
Depending on your research question and theoretical understanding of the variables you are working with, you might be interested in comparing different models and assessing their fit. It is not something Heino et al. included, but you could compare their model to one without the interaction (let's pretend that is theoretically justified). Instead of refitting a whole new model, we can update the model with a change in formula. All other settings like the priors remain the same.
```{r Heino no interaction model, message=FALSE, eval = FALSE}
# Update model to a new formula
Heino_fit2 <- update(Heino_fit, # Original brms model object
formula. = ~ . - time:intervention) # tilde dot keeps the original formula, minus the interaction
```
```{r Heino save model 2, eval= FALSE, echo=FALSE}
write_rds(Heino_fit2, "Models/Heino_model2.rds")
```
```{r Heino load model 2, echo=FALSE}
Heino_fit2 <- read_rds("Models/Heino_model2.rds")
```
First, we can calculate the $R^2$ estimate for the proportion of variance in your outcome that your predictors explain. `r pkg("brms")` has a specific function to get the model $R^2$ and its 95% credible interval.
```{r Heino R2}
#R2 for first model object with interaction
bayes_R2(Heino_fit)
```
We can also compare the two models side by side. The second model actually has a slightly higher $R^2$ estimate, but there is very little to choose between the two models.
```{r Heino R2 comparison}
R2_model1 <- as.data.frame(bayes_R2(Heino_fit))
R2_model2 <- as.data.frame(bayes_R2(Heino_fit2))
R2_table <- bind_rows(R2_model1, R2_model2)
rownames(R2_table) <- c("Model with interaction", "Model without interaction")
knitr::kable(R2_table,
digits = 2,
row.names = TRUE,
col.names = c("R2 Estimate", "Estimated Error", "Lower 95% HDI", "Upper 95% HDI"))
```
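$R^2$ is not the only way to compare models. As a sketch of an alternative we do not use here, `r pkg("brms")` also supports approximate leave-one-out cross-validation for comparing expected predictive accuracy:

```{r, eval=FALSE}
# Sketch: add the loo criterion to each model, then compare them
# (the model with the highest elpd appears first)
Heino_fit <- add_criterion(Heino_fit, "loo")
Heino_fit2 <- add_criterion(Heino_fit2, "loo")
loo_compare(Heino_fit, Heino_fit2)
```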
#### Model check
In previous output, there were no immediate causes of concern. Trace plots showed good mixing of the chains, R-hat values were no higher than 1.01, and effective sample size values were close to the thousands or higher.
As the final step, we can look at the posterior predictive check to make sure the model is capturing the features of the data. The model maps onto the data quite well, with the samples largely following the underlying data. We are still using metric models to analyse ultimately ordinal data (despite calculating the mean response), so the expected values go beyond the range of data (1-5), but it is good enough with that caveat in mind.
```{r Heino pp check}
pp_check(Heino_fit,
ndraws = 100) # 100 draws from the model
```
::: {.try data-latex=""}
If you scroll to the end of the Heino et al. article, they demonstrate how you can fit an ordinal model to the data when you do not average over the different situations.
:::
##### Check model sensitivity to different priors
The final thing we will check for this model is how sensitive it is to the choice of prior. For this example, we will compare the model output under the default package priors and the user defined priors from Heino et al.
In the code below, we have omitted the prior argument, so we are fitting the exact same model as before but using the default package priors. This time we can't just update the model, we need to refit it.
```{r Heino model 3, eval=FALSE, message=FALSE, warning=FALSE}
Heino_fit3 <- brm(
formula = Heino_model,
data = Heino_data,
family = gaussian(),
seed = 2108,
file = "Models/Heino_model3"
)
```
```{r load Heino 3, echo=FALSE}
Heino_fit3 <- read_rds("Models/Heino_model3.rds")
```
If we run the `r hl(summary())` function again, you can check the intercept and predictor coefficients to see how they differ to the first model we fitted. Ideally, they should provide us with similar inferences, such as a similar magnitude and in the same direction. It is never going to be exactly the same under different priors, but we want our conclusions robust to the choice of prior we use.
```{r summarise Heino 3}
summary(Heino_fit3)
```
To make it easier to compare, we can isolate the key information from each model and present it side by side. You can see below that there is little difference in the intercept and coefficients between the two models, which suggests our results are robust to these two choices of prior.
```{r Heino summary comparison, echo=FALSE, warning=FALSE, message=FALSE}
model1 <- describe_posterior(Heino_fit) %>%
dplyr::mutate(Model = "User priors") %>%
dplyr::select(Model, Parameter, Median, CI_low, CI_high)
model2 <- describe_posterior(Heino_fit3) %>%
dplyr::mutate(Model = "Default priors") %>%
dplyr::select(Model, Parameter, Median, CI_low, CI_high)
dplyr::bind_rows(model1, model2) %>%
arrange(Parameter) %>%
knitr::kable(digits = 2,
col.names = c("Model", "Parameter", "Median Estimate", "Lower 95% HDI", "Upper 95% HDI"))
```
### Independent activity (Coleman et al., 2019)
For an independent activity, we will use data from the study by @coleman_absorption_2019. The article reports two studies investigating religious mystical experiences: one focused on undergraduates and the other on experienced meditators who were part of a unique religious group.
The data set contains a range of variables used for the full model in the paper. We are going to focus on a small part of it for this exercise, but feel free to explore developing the full model that was used in study 1. The key variables are:
1. `r hl("Age")` - Measured in years
2. `r hl("Gender")` - 0 = male; 1 = female
3. `r hl("Week_med")` - Ordinal measure of how often people meditate per week, with higher values meaning more often
4. `r hl("Time_session")` - Ordinal measure of how long people meditate per session, with higher values meaning longer
5. `r hl("Absorption_SUM")` - Sum score of the Modified Tellegen Absorption scale, with higher values meaning greater trait levels of imaginative engagement
6. `r hl("EQ_SUM")` - Sum score of the Empathizing Quotient short form, with higher values meaning greater theory of mind ability
7. `r hl("Mscale_SUM")` - Sum score of the Hood M-scale, with higher values meaning more self-reported mystical experiences
Previous studies had explored these components separately and mainly in undergraduates, so Coleman et al. took the opportunity to explore a unique sample of a highly committed religious group. The final model included all of the variables above, but for this example, we will just focus on absorption (`r hl("Absorption_SUM")`) and theory of mind (`r hl("EQ_SUM")`) as they were the main contributors, with the other variables as covariates.
If you follow the link to Coleman et al. above, you can see the results of study 2, which focused on undergraduate students. Although it is presented second in the article, you can use its estimates (below) to inform your priors for this example. Keep in mind that these are partial effects, since there were more predictors in the model, but they are the key parameters apart from the interaction. The interaction was not statistically significant, so it was not retained in the model or reported in the final table.
```{r Coleman descriptive table, echo = FALSE}
knitr::kable(
tribble(~"Parameter", ~"Estimate", ~"95% Confidence Interval",
"Intercept", "108.64", "103.81 - 113.46",
"Absorption", "0.42", "0.29 - 0.54",
"Theory of Mind", "0.22", "-0.11 - 0.55")
)
```
Our research question is: how are absorption (`r hl("Absorption_SUM")`) and mentalizing (`r hl("EQ_SUM")`) related to mystical experiences (`r hl("Mscale_SUM")`) as the outcome? The interaction between the two predictors was of theoretical interest here, so focus on the interaction first.
::: {.try data-latex=""}
Using your understanding of the design, apply what you learnt in the guided example to this independent activity to address the research question. Following the Bayesian modelling steps, fit at least three models: one using the default priors, one using informative priors, and one removing the interaction term. Explore the model results, think about what you would conclude for the research question, and answer the questions below.
:::
```{r Coleman data, warning=FALSE, message=FALSE}
Coleman_data <- read_csv("data/Coleman_2019.csv") %>%
mutate(Absorption_SUM = Absorption_SUM - mean(Absorption_SUM), # Mean center the predictors
EQ_SUM = EQ_SUM - mean(EQ_SUM))
```
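If you are not sure where to start, below is a minimal sketch of how the informative-priors model could be set up. The object names (`Coleman_model`, `Coleman_fit`) and the specific prior values are illustrative assumptions loosely based on the study 2 table above, not the only defensible choices:
```{r Coleman model sketch, eval=FALSE}
# One possible starting point: an interaction model with priors
# loosely informed by the study 2 estimates (illustrative values only)
Coleman_model <- bf(Mscale_SUM ~ Absorption_SUM * EQ_SUM)

Coleman_fit <- brm(
  formula = Coleman_model,
  data = Coleman_data,
  family = gaussian(),
  prior = c(
    prior(normal(110, 10), class = "Intercept"), # near the study 2 intercept
    prior(normal(0.4, 0.2), class = "b", coef = "Absorption_SUM"),
    prior(normal(0.2, 0.3), class = "b", coef = "EQ_SUM"),
    prior(normal(0, 0.5), class = "b") # weakly informative for remaining terms
  ),
  seed = 2108,
  file = "Models/Coleman_sketch"
)
```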
- Is the coefficient for absorption positive or negative? `r mcq(opts = c(answer = "Positive", x = "Negative"))`
- Is the coefficient for theory of mind positive or negative? `r mcq(opts = c(answer = "Positive", x = "Negative"))`
```{r Coleman Q3, echo=FALSE, results='asis'}
opts = c(x = "No, the 95% HDI of both coefficients contain 0.",
x = "The 95% HDI of absorption contains 0, but theory of mind is positive and excludes 0.",
x = "The 95% HDI of theory of mind contains 0, but absorption is positive and excludes 0.",
answer = "Yes, both individual predictors are positive and their 95% HDI excludes 0.")
cat("- Can we be confident in the direction of the individual predictors?", longmcq(opts))
```
```{r Coleman Q4, echo=FALSE, results='asis'}
opts = c(x = "There is no clear interaction.",
answer = "For lower values of theory of mind, the slope becomes more positive.",
x = "For lower values of theory of mind, the slope becomes more negative.")
cat("- How can you interpret the interaction?", longmcq(opts))
```
**Hint:** You will need to look at the conditional effects plot and see how one predictor moderates the effect of the other predictor.
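For example (a sketch, assuming your fitted interaction model is stored in an object called `Coleman_fit`), `r pkg("brms")` can plot the absorption slope at different levels of theory of mind:
```{r Coleman conditional effects, eval=FALSE}
# Plot how the absorption slope changes across levels of theory of mind
conditional_effects(Coleman_fit,
                    effects = "Absorption_SUM:EQ_SUM")
```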
```{r Coleman Q5, echo=FALSE, results='asis'}
opts = c(x = "The model with the interaction term clearly has the better fit.",
x = "The model without the interaction term clearly has the better fit.",
answer = "There is little difference between the two models, but we would retain the interaction for theoretical interest.")
cat("- Comparing the models with and without the interaction term, which would you retain?", longmcq(opts))
```
```{r Coleman Q6, echo=FALSE, results='asis'}
opts = c(x = "Yes, there is a qualitative difference in our conclusions and the parameters change substantially.",
answer = "No, there is almost no difference in the parameters and our conclusions do not change.")
cat("- Are the results sensitive to the choice between default and user priors?", longmcq(opts))
```
`r hide("Explain these answers")`
1. Absorption is a positive (partial) predictor of mystical experiences.
2. Theory of mind is a positive (partial) predictor of mystical experiences.
3. Both partial effects are positive and their 95% HDIs clearly exclude zero. Particularly for absorption, we have little uncertainty, and it is a marginally stronger effect than theory of mind.
4. This one is more complicated, and I would accept saying there is no clear interaction. It is difficult to interpret an interaction between two continuous predictors, and you are relying on the conditional effects plot. The slope between mystical experiences and absorption is more positive for lower values of theory of mind, but the highest density intervals overlap, particularly for higher values of absorption.
5. The key concept here is that the interaction is of theoretical interest. There is little difference between the two models - at least by their $R^2$ estimates - but we are interested in the interaction and it had the slightly larger estimate.
6. We have a lot of data here for three predictors, so the choice of prior has very little impact. The posterior is entirely dominated by the data, and we only see variation in the second or third decimal place.
`r unhide()`