---
title: "CARPS Reproducibility Report"
output:
html_document:
toc: true
toc_float: true
---
```{r}
articleID <- "7-3-2015_PS" # insert the article ID code here e.g., "10-3-2015_PS"
reportType <- 'final'
pilotNames <- "Danielle Boles, Michael Ko" # insert the pilot's name here e.g., "Tom Hardwicke". If there are multiple pilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
copilotNames <- "Ben Peloquin" # insert the co-pilot's name here e.g., "Michael Frank". If there are multiple co-pilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
pilotTTC <- 150 # insert the pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
copilotTTC <- 200 # insert the co-pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
pilotStartDate <- as.Date("10/27/17", format = "%m/%d/%y") # insert the pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
copilotStartDate <- as.Date("06/13/18", format = "%m/%d/%y") # insert the co-pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
completionDate <- as.Date("04/20/19", format = "%m/%d/%y") # copilot insert the date of final report completion (after any necessary rounds of author assistance) in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
```
-------
#### Methods summary:
Participants (N = 21) completed a series of trials that required them either to switch between two tasks or to repeat the same task. One task was to choose the numerically larger of two values when they were surrounded by a green box; the other was to choose the value printed in the larger font when they were surrounded by a blue box. A subliminal cue followed by a mask was presented before each trial. Cues included "O" (nonpredictive cue), "M" (switch predictive cue), and "T" (repeat predictive cue). Reaction times and accuracy were measured.
------
#### Target outcomes:
> Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms) and lower accuracy rates (79% vs. 92%). If participants were able to learn the predictive value of the cue that preceded only switch trials and could instantiate relevant anticipatory control in response to it, the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01. However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.
------
```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
## Step 1: Load packages
```{r}
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' files
library(readxl) # import Excel files
library(ReproReports) # custom report functions
library(broom) # tidy model output into data frames
```
```{r}
# Prepare report object. This will be updated automatically by the reproCheck function each time values are compared.
reportObject <- data.frame(dummyRow = TRUE, reportedValue = NA, obtainedValue = NA, valueType = NA, percentageError = NA, comparisonOutcome = NA, eyeballCheck = NA)
```
## Step 2: Load data
```{r warning=FALSE}
# Read in each participant's data (each is in a separate xls file) and combine them into one data frame.
# Each xls file contains 250 rows of trial data; the remaining rows hold the participants' own Excel calculations, which we don't want in the data.
files <- dir('data/Experiment 1')
data <- data.frame()
id <- 1
for (file in files){
  if(file != 'Codebook.xls'){
    temp_data <- read_xls(file.path('data/Experiment 1', file))
    temp_data$id <- id
    id <- id + 1
    temp_data <- temp_data[1:250, ]  # keep only the 250 trial rows
    data <- rbind(data, temp_data)
  }
}
```
```{r bp-load-codebook, echo=FALSE}
code_book_path <- 'data/Experiment 1/Codebook.xls'
d_codebook <- read_xls(code_book_path, col_names=FALSE) # Note that 2, 4, 8 aren't listed in codebook for Primes
```
Do we have data from 21 participants as expected?
```{r bp-check-num-participants, echo=FALSE}
assertthat::are_equal(length(unique(data$id)), 21)
```
## Step 3: Tidy data
Each row is an observation. The data is already in tidy format.
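As a quick structural check (a sketch we added, not part of the original analysis), we can glimpse the combined data frame to confirm that each row corresponds to a single trial:
```{r check-tidy-structure, echo=FALSE, eval=FALSE}
# Quick look at the combined data: one row per trial, one column per variable
glimpse(data)
```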
## Step 4: Run analysis
### Pre-processing
The codebook for Experiment 1 lists `O`, `T`, and `M` as the only primes used. However, we found that some participants had primes `2`, `4`, and `8` instead.
Based on how the other columns were named, we inferred that `2` is the nonpredictive cue, `4` is the repeat predictive cue, and `8` is the switch predictive cue.
We therefore proceed with the analysis under this assumption, recoding the primes accordingly.
```{r bp-check-prime-codes, echo=FALSE, eval=FALSE}
# Note: this chunk is not part of the evaluation, just for sanity checking.
data %>% select(Block_Number) %>% unique

# Prime coding
# ------------
# Participants with numbered primes 2, 4, and 8
number_prime_ids <- data %>%
  filter(Prime %in% c(2, 4, 8)) %>%
  select(id) %>%
  unique

# Participants with lettered primes O, M, and T
letter_prime_ids <- data %>%
  filter(Prime %in% c('O', 'M', 'T')) %>%
  select(id) %>%
  unique

# One person seems to have seen both???
intersect(number_prime_ids$id, letter_prime_ids$id)

# Occurrences of numbered primes...
data %>%
  filter(Prime %in% c(2, 4, 8)) %>%
  select(Event_Number) %>%
  group_by(Event_Number) %>%
  summarise(cnt = n()) %>%
  ggplot(aes(x = Event_Number, y = cnt)) +
  geom_bar(stat = 'identity')
```
```{r recode}
# Recode variables to make referencing easier
data$originalPrime <- data$Prime
data$originalTrialType <- data$TrialType
data$Prime <- recode(data$Prime, '2' = "O", '4' = "T", '8' = "M")
data$Prime <- recode(data$Prime, 'O' = "Nonpredictive Cue", 'M' = "Switch Predictive Cue", 'T' = "Repeat Predictive Cue")
data$TrialType <- recode(data$TrialType, '0' = "Neither", '1' = "Repeat Trials", '2' = "Switch Trials")
```
**Note**: `TrialType=0` is not listed in codebook.
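As a small check on this undocumented code (a sketch we added, not part of the original analysis), the chunk below counts trials per recoded `TrialType`, including the `Neither` category that corresponds to `TrialType=0`:
```{r check-trialtype-counts, echo=FALSE, eval=FALSE}
# Count trials in each recoded TrialType; "Neither" corresponds to the
# undocumented TrialType = 0 code
data %>%
  count(TrialType)
```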
### Descriptive statistics
We will first try to reproduce the median reaction times for switch trials and repeat trials.
> Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms)
We used the median because the authors instructed that statistical tests use median reaction times unless stated otherwise:
> unless stated otherwise, the statistical tests were performed on the more stable median values rather than mean values.
```{r median_RT}
med_RT <- data %>%
  group_by(TrialType) %>%
  summarise(median_RT = median(RT),
            mean_RT = mean(RT))
kable(med_RT[-1, ])  # drop the "Neither" row
```
```{r median_RT diff}
reportObject <- reproCheck(reportedValue = "836", obtained = filter(med_RT, TrialType == "Switch Trials")$median_RT, valueType = 'median')
reportObject <- reproCheck(reportedValue = "689", obtained = filter(med_RT, TrialType == "Repeat Trials")$median_RT, valueType = 'median')
```
Note that means would not have matched either.
Next we check whether including only participants with 'letter' primes reproduces the target values...
```{r bp-check-RTs-filtered-on-OMT, echo=FALSE, eval=FALSE}
data %>%
  filter(originalPrime %in% c("O", "M", "T")) %>%
  group_by(TrialType) %>%
  summarise(median_RT = median(RT),
            n = n())
```
This would not have matched either.
-----------------
Next we will try to reproduce the accuracy of switch trials and repeat trials.
> Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in [...] lower accuracy rates (79% vs. 92%)
```{r mean_Correct_Response}
mean_RespCorr <- data %>%
  group_by(TrialType) %>%
  summarise(accuracy = mean(RespCorr))
kable(mean_RespCorr[-1, ])  # drop the "Neither" row
```
```{r mean_acc diff}
reportObject <- reproCheck(reportedValue = "0.79", obtained = filter(mean_RespCorr, TrialType == "Switch Trials")$accuracy, valueType = 'mean')
reportObject <- reproCheck(reportedValue = "0.92", obtained = filter(mean_RespCorr, TrialType == "Repeat Trials")$accuracy, valueType = 'mean')
```
Minor errors: these values are extremely close, although the rounded mean accuracy for repeat trials is slightly different.
--------------------
Now we will analyze Predictive Switch Cues vs. Nonpredictive Switch Cues. Let's start with reaction time.
> This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; ... )
Later the authors report a t test with 20 degrees of freedom, so we assume these mean values come from individual RT averages.
```{r mean Prime RT}
mean_Prime_RT_Ind <- data %>%
  filter(TrialType == "Switch Trials") %>%
  group_by(id, Prime) %>%
  summarise(meanRT = mean(RT),
            medianRT = median(RT))  # individual means
mean_Prime_RT <- mean_Prime_RT_Ind %>%
  group_by(Prime) %>%
  summarise(grandmeanRT = mean(meanRT),
            grandMedianRT = median(medianRT))  # grand means
kable(mean_Prime_RT)
```
There are minor errors here, reported below:
```{r mean Prime RT diff}
npc_mean <- mean_Prime_RT %>% filter(Prime == "Nonpredictive Cue") %>% pull(grandmeanRT)
reportObject <- reproCheck(reportedValue = "871", obtained = npc_mean, valueType = 'mean')
spc_mean <- mean_Prime_RT %>% filter(Prime == "Switch Predictive Cue") %>% pull(grandmeanRT)
reportObject <- reproCheck(reportedValue = "819", obtained = spc_mean, valueType = 'mean')
```
```{r bp-check-switch-times, echo=FALSE, eval=FALSE}
# Check coding of `Prime` type. No difference when we filter on lettered primes
# (e.g. `Prime = O, M, T`). We do see a difference filtering on `Prime = 2, 4, 8`...
# Filter with original prime (O, M, T)
data %>%
  mutate(prime_code_type = ifelse(originalPrime %in% c("O", "M", "T"), 'letters', 'numbers')) %>%
  filter(TrialType == "Switch Trials") %>%
  group_by(id, prime_code_type, originalPrime) %>%
  summarise(meanRT = mean(RT),
            medianRT = median(RT)) %>%
  group_by(prime_code_type, originalPrime) %>%
  summarise(grandmeanRT = mean(meanRT),
            grandMedianRT = median(medianRT))  # grand means
```
These would not have matched either.
---------------------------
Next we will try to reproduce the error rates for switch predictive cues vs. switch nonpredictive cues.
> However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%)
```{r Prime Accuracy}
mean_Prime_RespCorr_Ind <- data %>%
  filter(TrialType == "Switch Trials") %>%
  group_by(id, Prime) %>%
  summarise(meanCorr = mean(RespCorr))  # individual means
mean_Prime_RespCorr <- mean_Prime_RespCorr_Ind %>%
  group_by(Prime) %>%
  summarise(grandmeanCorr = mean(meanCorr))  # grand means
kable(mean_Prime_RespCorr)
```
```{r bp-check-prime-accuracy, echo=FALSE, eval=FALSE}
df_accuracy <- data %>%
  filter(TrialType == "Switch Trials") %>%
  mutate(prime_coding = ifelse(originalPrime %in% c(2, 4, 8), 'numbers', 'letters')) %>%
  group_by(prime_coding, id, originalPrime) %>%
  summarise(meanCorr = mean(RespCorr))  # individual means
df_accuracy %>%
  group_by(originalPrime) %>%
  summarise(grandmeanCorr = mean(meanCorr))  # grand means
```
These numbers are fairly close to the reported numbers.
```{r Prime Accuracy Diff}
reportObject <- reproCheck(reportedValue = ".789", obtained = .7994, valueType = 'mean') # predictive cue
reportObject <- reproCheck(reportedValue = ".788", obtained = .7803, valueType = 'mean') # nonpredictive cue
```
### Inferential statistics
The first claim is that in switch trials, predictive cues lead to statistically significant faster reaction times than nonpredictive cues.
> ... the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01.
```{r Prime RT test}
mean_Prime_RT_Ind <- mean_Prime_RT_Ind %>%
  select(id, Prime, meanRT) %>%
  spread(Prime, meanRT)  # spread so that the cues are easier to compare
test <- t.test(mean_Prime_RT_Ind[['Nonpredictive Cue']],
               mean_Prime_RT_Ind[['Switch Predictive Cue']], paired = TRUE)
kable(tidy(test))
```
```{r}
reportObject <- reproCheck(reportedValue = "<.01", obtained = test$p.value, valueType = 'p', eyeballCheck = FALSE)
reportObject <- reproCheck(reportedValue = "19.5", obtained = test$conf.int[1], valueType = 'ci')
reportObject <- reproCheck(reportedValue = "84.4", obtained = test$conf.int[2], valueType = 'ci')
reportObject <- reproCheck(reportedValue = "20", obtained = test$parameter, valueType = 'df')
reportObject <- reproCheck(reportedValue = "3.34", obtained = test$statistic, valueType = 't')
```
We do not find the same p value as the original paper. There is ambiguity in how to calculate this statistic, which we detail below.
[INSUFFICIENT INFORMATION ERROR]
**Check** - note that differences in reaction time could be due to differences in `Prime` coding. This just supports the notion that we need to better understand the coding differences before trying to reproduce all the analyses.
```{r bp-prime-rt-ttest}
# Differences in reaction time: t.test by prime coding
df_rt_ttest <- data %>%
  filter(TrialType == "Switch Trials") %>%
  group_by(id, originalPrime) %>%
  summarise(meanRT = mean(RT)) %>%
  spread(originalPrime, meanRT)
test1 <- t.test(df_rt_ttest$M, df_rt_ttest$O, paired = TRUE)
test2 <- t.test(df_rt_ttest$`2`, df_rt_ttest$`8`, paired = TRUE)
kable(tidy(test1))
kable(tidy(test2))
```
However, none of these ways of doing the analysis produces outcomes matching those reported.
-----------------
Next we will test the second claim.
> However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.
```{r mean Prime accuracy test}
mean_Prime_RespCorr_Ind <- mean_Prime_RespCorr_Ind %>%
  spread(Prime, meanCorr)  # spread so that the cues are easier to compare
test <- t.test(mean_Prime_RespCorr_Ind[['Nonpredictive Cue']],
               mean_Prime_RespCorr_Ind[['Switch Predictive Cue']], paired = TRUE)
thisP <- test$p.value
kable(tidy(test))
```
```{r bp-check-prime-accuracy-ttest, echo=FALSE}
mean_Prime_RespCorr_Ind %>%
  ungroup() %>%
  summarise(nonPredMean = mean(`Nonpredictive Cue`),
            predMean = mean(`Switch Predictive Cue`))
```
Although still not significant, the p value is very different from what was reported.
```{r mean Prime accuracy test diff}
reportObject <- reproCheck(reportedValue = ".8", obtained = thisP, valueType = 'p')
```
## Step 5: Conclusion
In our initial attempts we were not able to reproduce several target outcomes. Generally, all the reaction-time statistics (whether we tried means or medians) differed from what was reported, although these were classified as 'minor numerical errors'. Ultimately, though, there were major errors for the inferential test statistics, which could stem from a problem earlier in the analysis pipeline.
There are a number of aspects of the original analysis and data files we are unclear about. We have e-mailed the original authors several times but have not received a response to our questions (we have received replies saying they will get back to us, but this has not happened more than 9 months after the last message). We have thus classified these issues as 4 'insufficient information errors'. The issues we are unclear about are as follows:
**Unclear labels for variables**
In the data file, there is a variable "CorrResp" (1 or 0) and another variable "RespCorr" (TRUE or FALSE). We used "RespCorr" because it was the only one of the two included in the codebook, where TRUE = accurate response and FALSE = error. We still don't know what "CorrResp" is or whether it was used in the original analyses.
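One way to probe this ambiguity (a sketch we did not rely on, assuming both columns survive into the combined data frame) would be to cross-tabulate the two variables and see whether they agree:
```{r check-corrresp-vs-respcorr, eval=FALSE}
# If CorrResp (1/0) simply duplicates RespCorr (TRUE/FALSE), all counts should
# land in agreeing cells; off-diagonal counts would suggest CorrResp encodes
# something else (e.g., the correct response key rather than accuracy).
data %>%
  count(RespCorr, CorrResp) %>%
  spread(CorrResp, n, fill = 0)
```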
**Unclear recoding of variables**
In the data, there is one Excel file per participant with their reaction times for all 250 trials. For some participants, the Prime was coded as the actual prime shown ("O", "T", or "M"). For other participants, the Prime was coded as "2", "4", and "8". We had to infer which number corresponded to each letter by looking at the variable names assigned to trial type and which cue followed ("stay_2", "stay_4", "swt_2", "swt_8").
We coded 2 = O, 4 = T, 8 = M, but we are still unsure whether this is consistent with how the authors coded the prime variable. We should note that some differences seem exaggerated when subsetting by these two coding schemes: data coded with `Prime in c(2, 4, 8)` appeared to show larger `RT` and `Acc` differences than data coded with `Prime in c("O", "T", "M")`.
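A rough consistency check on this inferred mapping (a sketch, using the recoded `Prime` and `TrialType` columns from the recode chunk above): if `4 = T` (repeat predictive) and `8 = M` (switch predictive) are correct, the repeat predictive cue should precede only repeat trials and the switch predictive cue only switch trials.
```{r check-prime-mapping, eval=FALSE}
# Cross-tabulate the recoded cue by the recoded trial type; each predictive cue
# should be confined to its corresponding trial type if the 2/4/8 -> O/T/M
# mapping is correct
data %>%
  count(Prime, TrialType) %>%
  spread(TrialType, n, fill = 0)
```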
**Ambiguity between using means or medians**
The article states that, unless otherwise noted, statistical tests were performed on median values rather than mean values. We followed the paper's protocol. However, we were not able to reproduce the following findings using either means or medians:
* "performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms)"
* "mean RT - predictive cue: 819 ms; 95% confidence interval, or CI = [19.5, 84.4], two-tailed paired t(20) = 3.34, p < .01"
**Unclear whether descriptive statistics are means/medians of individual means/medians, or means/medians across all trials**
When we tried to reproduce the reaction-time medians, we realized that the reported value could have been obtained by calculating a value (mean or median) for each individual and then summarizing those values into a single value (mean or median), OR by calculating a single value (mean or median) across ALL trials. We tried several combinations of means or medians, across individuals or across trials, and still could not reproduce the descriptive reaction times.
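For reference, the four aggregation schemes we tried for the switch/repeat RTs can be sketched as follows (individual-level mean or median summarized across participants, versus a single statistic pooled over all trials); none of them reproduced the reported 836 ms and 689 ms values.
```{r rt-aggregation-schemes, eval=FALSE}
# Per-participant mean and median RT, then summarized across participants
data %>%
  group_by(TrialType, id) %>%
  summarise(mean_RT = mean(RT), median_RT = median(RT)) %>%
  group_by(TrialType) %>%
  summarise(mean_of_means = mean(mean_RT),
            median_of_medians = median(median_RT))
# Mean and median RT pooled across all trials
data %>%
  group_by(TrialType) %>%
  summarise(pooled_mean = mean(RT), pooled_median = median(RT))
```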
```{r}
Author_Assistance = FALSE # was author assistance provided? (if so, enter TRUE)
Insufficient_Information_Errors <- 4 # how many discrete insufficient information issues did you encounter?
# Assess the causal locus (discrete reproducibility issues) of any reproducibility errors. Note that there doesn't necessarily have to be a one-to-one correspondence between discrete reproducibility issues and reproducibility errors. For example, it could be that the original article neglects to mention that a Greenhouse-Geisser correction was applied to ANOVA outcomes. This might result in multiple reproducibility errors, but there is a single causal locus (discrete reproducibility issue).
locus_typo <- 0 # how many discrete issues did you encounter that related to typographical errors?
locus_specification <- 1 # how many discrete issues did you encounter that related to incomplete, incorrect, or unclear specification of the original analyses?
locus_analysis <- 0 # how many discrete issues did you encounter that related to errors in the authors' original analyses?
locus_data <- 0 # how many discrete issues did you encounter that related to errors in the data files shared by the authors?
locus_unidentified <- 3 # how many discrete issues were there for which you could not identify the cause
# How many of the above issues were resolved through author assistance?
locus_typo_resolved <- 0 # how many discrete issues did you encounter that related to typographical errors?
locus_specification_resolved <- 0 # how many discrete issues did you encounter that related to incomplete, incorrect, or unclear specification of the original analyses?
locus_analysis_resolved <- 0 # how many discrete issues did you encounter that related to errors in the authors' original analyses?
locus_data_resolved <- 0 # how many discrete issues did you encounter that related to errors in the data files shared by the authors?
locus_unidentified_resolved <- 0 # how many discrete issues were there for which you could not identify the cause
Affects_Conclusion <- TRUE # Do any reproducibility issues encountered appear to affect the conclusions made in the original article? This is a subjective judgement, but you should take into account multiple factors, such as the presence/absence of decision errors, the number of target outcomes that could not be reproduced, the type of outcomes that could or could not be reproduced, the difference in magnitude of effect sizes, and the predictions of the specific hypothesis under scrutiny.
```
```{r}
reportObject <- reportObject %>%
filter(dummyRow == FALSE) %>% # remove the dummy row
select(-dummyRow) %>% # remove dummy row designation
mutate(articleID = articleID) %>% # add the articleID
select(articleID, everything()) # make articleID first column
# decide on final outcome
if(any(!(reportObject$comparisonOutcome %in% c("MATCH", "MINOR_ERROR"))) | Insufficient_Information_Errors > 0){
finalOutcome <- "Failure without author assistance"
if(Author_Assistance == T){
finalOutcome <- "Failure despite author assistance"
}
}else{
finalOutcome <- "Success without author assistance"
if(Author_Assistance == T){
finalOutcome <- "Success with author assistance"
}
}
# collate report extra details
reportExtras <- data.frame(articleID, pilotNames, copilotNames, pilotTTC, copilotTTC, pilotStartDate, copilotStartDate, completionDate, Author_Assistance, finalOutcome, Insufficient_Information_Errors, locus_typo, locus_specification, locus_analysis, locus_data, locus_unidentified, locus_typo_resolved, locus_specification_resolved, locus_analysis_resolved, locus_data_resolved, locus_unidentified_resolved)
# save report objects
if(reportType == "pilot"){
write_csv(reportObject, "pilotReportDetailed.csv")
write_csv(reportExtras, "pilotReportExtras.csv")
}
if(reportType == "final"){
write_csv(reportObject, "finalReportDetailed.csv")
write_csv(reportExtras, "finalReportExtras.csv")
}
```
## Session information
```{r session_info, include=TRUE, echo=TRUE, results='markup'}
devtools::session_info()
```