---
title: "Analyze multi-subject/trial RT data"
description: Simple tutorial and workflow for analyzing reaction time (RT) and accuracy data from multiple experimental subjects
output:
    radix::radix_article:
        toc: true
        toc_depth: 3
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
# this chunk is for knitting Rmarkdown; you don't need this chunk if you're not knitting
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, comment = NA, message = FALSE, warning = FALSE, R.options = list(width = 80), tidy = TRUE, tidy.opts = list(width.cutoff = 80))
```

This tutorial shows you how to analyze simple reaction time and accuracy data that resemble data collected in standard psychology experiments (note that the three subjects' data were actually simulated with the `rwiener` function from the [RWiener](https://cran.r-project.org/web/packages/RWiener/index.html) package). This tutorial also assumes you know how to set your working directory; if not, check out my intro tutorial [here](https://hausetutorials.netlify.com/0001_rbasics.html#workingcurrent-directory-where-are-you-and-where-should-you-be-getwd).

I use the [data.table](https://github.com/Rdatatable/data.table/wiki) package extensively in my analyses. `tidyverse` and `dplyr` are great, but I prefer the speed and concise syntax of `data.table`. I won't explain how `data.table` works in this tutorial, but you'll see in the code below how flexible and concise it is. If you want to learn more, see [my tutorials](https://hausetutorials.netlify.com/0002_tidyverse_datatable.html) and extra [resources](https://hausetutorials.netlify.com/posts/2019-04-11-datatable-resources/). 

## Consider being a patron and supporting my work?

[Donate and become a patron](https://donorbox.org/support-my-teaching): If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.

## Getting the data and code

* Download Rmarkdown script for this tutorial [here](https://raw.githubusercontent.com/hauselin/rtutorialsite/master/0005_analyze_multi_subject_trial_data.Rmd).
* Download data for all my tutorials [here](https://www.dropbox.com/sh/2ix85ij7chfuqdg/AABpkIMymvjOE5Dh9iV7SULga?dl=1).
* Inside the data folder, you should see a `rt_acc_data_raw` folder that contains three subjects' data, which we will analyze in this tutorial.
* Leave the folders and files in the unzipped folder as they are because we'll use R to read the csv files into our R environment.

## Dataset information

* 3 subjects: 30, 35, 39
* Completed a task where reaction time (rt) and accuracy (acc) were collected
* 2 blocks in the experiment (1, 2)
* 40 trials per block; 2 blocks in total; 80 trials in total per subject
* 2 experimental manipulations: reward available on trial (high vs low); instructions for block (focus on responding carefully [caution] vs focus on responding quickly [speed])

### Variables 

* subject: subject id
* completed: whether subject completed study (1, 0)
* rt: reaction time
* acc: accuracy
* reward: reward available on trial (high or low)
* instructions: block instructions (caution or speed)
* block: block number (1, 2)
* trialinblock: trial in block (1 to 40)
* overalltrial: overall trial number (1 to 80)

## Organizing folders/directory

This is what my directories/folders/files look like. Make sure yours look similar before you continue.

— 0005_analyze_multi_subject_trial_data.Rmd

——— data

————— rt_acc_data_raw

——————— subject_30.csv

——————— subject_35.csv

——————— subject_39.csv

## Load libraries

```{r libraries}
library(tidyverse); library(data.table); library(lme4); library(lmerTest); library(ggbeeswarm); library(hausekeep)
```

<aside>
[hausekeep](https://hauselin.github.io/hausekeep/) is my package. It contains a few helper functions, which you may or may not want to use. If you do, install the package by running `devtools::install_github("hauselin/hausekeep")` (you might need to install the `devtools` package first).
</aside>

## Read all subjects' data into R

```{r read data}
(files <- list.files(path = "./data/rt_acc_data_raw", pattern = "subject_", full.names = TRUE)) # find matching file names
dt1 <- bind_rows(lapply(files, fread)) %>% as.data.table() # read data, combine items in list, convert to data.table
```

Note and explanation: `lapply` loops through each item in `files` and applies the `fread` function to each item. `bind_rows` combines all the dataframes stored as separate items in the list, and `as.data.table()` converts the merged dataframe into `data.table` class.
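
The read-then-combine pattern described above can be tried out on throwaway files. The sketch below is only an illustration: the folder `toy_rt_acc_data`, the two toy subjects, and their values are hypothetical and not part of the tutorial's data. It also swaps `bind_rows` for `data.table`'s own `rbindlist`, which stacks a list of tables and returns a `data.table` directly.

```r
library(data.table)

# Hypothetical toy files written to a temporary folder (not the tutorial's data)
tmp_dir <- file.path(tempdir(), "toy_rt_acc_data")
dir.create(tmp_dir, showWarnings = FALSE)
for (id in c(1, 2)) {
  toy <- data.table(subject = id, rt = c(0.5, 0.7), acc = c(1, 0))
  fwrite(toy, file.path(tmp_dir, paste0("subject_", id, ".csv")))
}

# Same pattern as above: find the files, read each one, then stack the results
toy_files <- list.files(path = tmp_dir, pattern = "subject_", full.names = TRUE)
toy_list <- lapply(toy_files, fread)  # a list with one data.table per file
toy_dt <- rbindlist(toy_list)         # stack the list into a single data.table
toy_dt[, .N, by = subject]            # rows per toy subject
```

Either approach should give the same combined table here; `rbindlist` simply skips the final `as.data.table()` conversion.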

## Summarize experimental design

```{r summarize design}
dt1[, .N, by = subject] # rows (trials) per subject
dt1[, .N, keyby = .(subject, block)] # blocks per subject and rows/trials per block
dt1[, .N, keyby = .(subject, instructions, reward)] # rows (trials) per subject for each instructions/reward condition
```

## Remove outlier trials

```{r remove outliers}
dt1[rt < 0.2, `:=` (rt = NA, acc = NA)] # if rt < 0.2, convert rt and acc to NA (too fast response on that trial)
dt1[rt > 3.0, `:=` (rt = NA, acc = NA)] # if rt > 3.0, convert rt and acc to NA (too slow response on that trial)
summary(dt1$rt) # check range and number of NA
summary(dt1$acc) # check range and number of NA
```
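
To see what the exclusion step does, here is a minimal sketch on a hypothetical four-trial table (the values are made up for illustration). It uses the same 0.2 and 3.0 cutoffs as above, but combines them into one condition, and then counts how many trials were excluded.

```r
library(data.table)

# Hypothetical toy trials: one response is too fast, one is too slow
toy <- data.table(rt = c(0.15, 0.45, 0.80, 3.50), acc = c(1, 1, 0, 1))

# Same cutoffs as above, combined into a single condition
toy[rt < 0.2 | rt > 3.0, `:=`(rt = NA, acc = NA)]

toy[, .(n_trials = .N, n_excluded = sum(is.na(rt)))]  # count excluded trials
mean(toy$rt, na.rm = TRUE)                            # mean RT of the remaining trials
```

Setting `acc` to `NA` alongside `rt` keeps the two measures in sync, so an excluded trial does not contribute to either average.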

## Compute mean RT and ACC for each subject, for each condition

```{r mean rt/acc}
dt1_avg <- dt1[, .(rt = mean(rt, na.rm = TRUE), acc = mean(acc, na.rm = TRUE)), keyby = .(subject, instructions, reward)] # na.rm = TRUE ignores the trials excluded above
dt1_avg
```
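
The grouped-mean step can be checked by hand on a small table. The sketch below uses hypothetical data (two toy subjects, one manipulation, one excluded trial) and adds an `n` column, which is my addition rather than part of the code above, to show how many valid trials went into each mean.

```r
library(data.table)

# Hypothetical toy data: 2 subjects x 2 reward conditions, one trial already set to NA
toy <- data.table(
  subject = rep(c(1, 2), each = 4),
  reward  = rep(c("high", "high", "low", "low"), times = 2),
  rt      = c(0.4, 0.6, 0.8, NA, 0.5, 0.7, 0.9, 1.1),
  acc     = c(1, 1, 0, NA, 1, 0, 1, 1)
)

# One row per subject/condition; keyby also sorts the result by the grouping columns
toy_avg <- toy[, .(rt  = mean(rt, na.rm = TRUE),
                   acc = mean(acc, na.rm = TRUE),
                   n   = sum(!is.na(rt))),   # valid trials behind each mean
               keyby = .(subject, reward)]
toy_avg
```

Tracking `n` is useful after outlier removal, because a condition mean based on very few valid trials may be unreliable.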

## Compute grand average RT and ACC (and within-subject error bars)

```{r grand average rt/acc}
dt1_grdavg <- seWithin(data = dt1_avg, measurevar = c("rt", "acc"), withinvars = c("instructions", "reward"), idvar = "subject")
```

<aside>
`seWithin` computes within-subject error bars and is a function from my `hausekeep` package.
</aside>
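
If you are curious how within-subject error bars can be computed, the sketch below shows one common approach: normalize each subject's condition means to remove between-subject differences, then apply a correction for the number of conditions (often described as the Cousineau-Morey method). This is an illustration on hypothetical data, and I am assuming rather than asserting that it resembles what `seWithin` does internally; the results may differ in detail.

```r
library(data.table)

# Hypothetical condition means: 3 subjects x 2 conditions
toy <- data.table(
  subject   = rep(1:3, each = 2),
  condition = rep(c("a", "b"), times = 3),
  rt        = c(0.50, 0.60, 0.70, 0.85, 0.90, 1.00)
)

n_cond <- uniqueN(toy$condition)  # number of within-subject conditions
grand_mean <- mean(toy$rt)

# Step 1: remove each subject's overall level, then add back the grand mean
toy[, rt_norm := rt - mean(rt) + grand_mean, by = subject]

# Step 2: standard error of the normalized values, with a correction factor
# of sqrt(k / (k - 1)) for k conditions
toy_se <- toy[, .(rt = mean(rt),
                  n  = .N,
                  se = sd(rt_norm) / sqrt(.N) * sqrt(n_cond / (n_cond - 1))),
              keyby = condition]

# Step 3: 95% confidence interval half-width from the t distribution
toy_se[, ci := se * qt(0.975, df = n - 1)]
toy_se
```

Because the toy subjects differ a lot in overall speed but show similar condition effects, the within-subject standard errors come out much smaller than the ordinary between-subject ones.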

## Plot reaction time 

```{r plot rt}
plot_rt <- ggplot(dt1_grdavg$rt, aes(instructions, rt, col = reward)) +
  geom_quasirandom(data = dt1_avg, alpha = 0.8, dodge = 1, size = 1.5) +
  geom_point(position = position_dodge(1), shape = 95, size = 6) +
  geom_errorbar(aes(ymin = rt - ci, ymax = rt + ci), position = position_dodge(1), width = 0, size = 1.1) +
  labs(y = "reaction time (95% CI)")
plot_rt
```

## Plot accuracy

```{r plot acc}
plot_acc <- ggplot(dt1_grdavg$acc, aes(instructions, acc, col = reward)) +
  geom_quasirandom(data = dt1_avg, alpha = 0.8, dodge = 1, size = 1.5) +
  geom_point(position = position_dodge(1), shape = 95, size = 6) +
  geom_errorbar(aes(ymin = acc - ci, ymax = acc + ci), position = position_dodge(1), width = 0, size = 1.1) +
  scale_y_continuous(limits = c(0, 1.2), breaks = seq(0, 1.0, 0.25)) +
  labs(y = "accuracy (95% CI)")
plot_acc
```

## Fit linear mixed models to single-trial RT and ACC

```{r fit models}
# linear mixed model for RT, with a random intercept for each subject
model_rt <- lmer(rt ~ instructions + reward + (1 | subject), data = dt1)
summary(model_rt)
summaryh(model_rt) # summary function from my hausekeep package

# accuracy is binary (0/1), so fit a logistic (binomial) mixed model with glmer
model_acc <- glmer(acc ~ instructions + reward + (1 | subject), data = dt1, family = binomial)
summary(model_acc)
summaryh(model_acc) # summary function from my hausekeep package
```
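
The coefficients of the binomial accuracy model are on the log-odds scale, which can be hard to read. The sketch below shows how such coefficients translate into probabilities and odds ratios using base R. The two numbers are hypothetical, not output from `model_acc`, and the example assumes `caution` is the reference level of `instructions` (R's default alphabetical ordering).

```r
# Hypothetical fixed-effect estimates on the log-odds scale (NOT from model_acc)
intercept <- 1.0   # log-odds of a correct response in the reference condition
b_speed   <- -0.5  # change in log-odds under speed instructions

p_caution <- plogis(intercept)            # inverse logit: log-odds -> probability
p_speed   <- plogis(intercept + b_speed)  # probability under speed instructions
odds_ratio <- exp(b_speed)                # multiplicative change in the odds

c(p_caution = p_caution, p_speed = p_speed, odds_ratio = odds_ratio)
```

With a real model, the fixed-effect estimates can be pulled out with `fixef(model_acc)` and passed through the same functions.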
