---
title: "CARPS Reproducibility Report"
output:
  html_document:
    toc: true
    toc_float: true
---
```{r}
articleID <- "8-7-2014_PS" # insert the article ID code here e.g., "10-3-2015_PS"
reportType <- 'final'
pilotNames <- "Mufan Luo, Sean Raymond Zion" # insert the pilot's name here e.g., "Tom Hardwicke". If there are multiple pilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
copilotNames <- "Tom Hardwicke, Manuel Bohn" # insert the co-pilot's name here e.g., "Michael Frank". If there are multiple co-pilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
pilotTTC <- 240 # insert the pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
copilotTTC <- 200 # insert the co-pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
pilotStartDate <- as.Date("01/11/17", format = "%m/%d/%y") # insert the pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
copilotStartDate <- as.Date("06/06/18", format = "%m/%d/%y") # insert the co-pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
completionDate <- as.Date("06/27/18", format = "%m/%d/%y") # copilot insert the date of final report completion (after any necessary rounds of author assistance) in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
```
-------
#### Methods summary:
This is a within-subjects experiment in which 24 kindergarten children learned in two classroom environments (decorated classroom vs. sparse classroom). Learning was assessed by comparing pre- and post-test performance.
------
#### Target outcomes:
> Effect of classroom type on learning
Pretest accuracy was statistically equivalent in the sparse classroom
condition (M = 22%) and the decorated-classroom
condition (M = 23%), paired-samples t(22) < 1, and
accuracy in both conditions was not different from
chance, both one-sample ts (22) < 1.3, ps > .21. The children’s
posttest scores were significantly higher than their
pretest scores in both experimental conditions, both
paired-samples ts(22) > 4.72, ps ≤ .0001 (Fig. 4). Therefore,
in both experimental conditions, the children successfully
learned from the instruction. However, their learning
scores were higher in the sparse-classroom condition
(M = 55%) than in the decorated-classroom condition
(M = 42%), paired-samples t(22) = 2.95, p = .007; this
effect was of medium size, Cohen’s d = 0.65.
Analysis of gain scores corroborated the results of the
analysis of the posttest scores. Gain scores were calculated
by subtracting each participant’s pretest score from
his or her posttest score. Pairwise comparisons indicated
that the children’s learning gains were higher in the
sparse-classroom condition (M = 33%, SD = 22) than in
the decorated-classroom condition (M = 18%, SD = 19),
paired-sample t(22) = 3.49, p = .002, Cohen’s d = 0.73.
------
```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
# Step 1: Load packages
```{r}
library(tidyverse)
library(knitr) # for kable table formatting
library(haven) # import and export SPSS, Stata, and SAS files
library(readxl) # import Excel files
library(ReproReports) # custom report functions
library(ggplot2) # attached with tidyverse; loaded explicitly for clarity
library(lsr) # for cohensD
library(dplyr) # for data munging; attached with tidyverse
```
```{r}
# Prepare report object. This will be updated automatically by the reproCheck function each time values are compared.
reportObject <- data.frame(dummyRow = TRUE, reportedValue = NA, obtainedValue = NA, valueType = NA, percentageError = NA, comparisonOutcome = NA, eyeballCheck = NA)
```
# Step 2: Load data
```{r}
# sheet 2 holds the condition-level test scores; sheets 3 and 4 hold the
# lesson-level pretest and posttest scores used for the gain-score analysis
d <- read_excel("data/VisualEnv-Attn-Learning-Data.xlsx", 2)
d.gain.pre <- read_excel("data/VisualEnv-Attn-Learning-Data.xlsx", 3, skip = 2)
d.gain.post <- read_excel("data/VisualEnv-Attn-Learning-Data.xlsx", 4, skip = 2)
```
# Step 3: Tidy data
```{r}
d.tidy <- d %>%
  # exclude participant #15 as reported in the paper
  filter(ID != "cs15") %>%
  # keep only the columns needed for the target analyses
  select("ID",
         "Pre-Test-Sparse-Classroom",
         "Pre-Test-Decorated Classroom",
         "Post-Test-Sparse-Classroom",
         "Post-Test-Decorated Classroom") %>%
  gather(condition, score, -ID) %>%
  mutate(classroom = ifelse(grepl("Sparse", condition), "sparse", "decorated"),
         test = ifelse(grepl("Pre-Test", condition), "pre", "post"),
         score = as.numeric(score)) %>%
  arrange(ID) %>%
  select(-condition)

# helper used below: map lesson column names to short lesson labels
lesson_type <- function(x) {
  case_when(
    grepl("Plate", x) ~ "plate",
    grepl("Volcano", x) ~ "volcano",
    grepl("Bugs", x) ~ "bugs",
    grepl("Stone", x) ~ "stone",
    grepl("Solar", x) ~ "solar",
    grepl("Flight", x) ~ "flight",
    TRUE ~ "total"
  )
}

d.tidy.gain.pre <- d.gain.pre %>%
  select(-...11, -`#`) %>%
  gather(pre_test_type, score, -ID) %>%
  arrange(ID) %>%
  mutate(classroom = ifelse(grepl("Sparse", pre_test_type), "sparse", "decorated"),
         test = "pre",
         type = lesson_type(pre_test_type)) %>%
  filter(type != "total") %>%
  select(-pre_test_type)

d.tidy.gain.post <- d.gain.post %>%
  select(-...11, -`#`) %>%
  gather(pre_test_type, score, -ID) %>%
  arrange(ID) %>%
  mutate(classroom = ifelse(grepl("Sparse", pre_test_type), "sparse", "decorated"),
         test = "post",
         type = lesson_type(pre_test_type),
         # some children were absent for individual lessons; treat as missing
         score = ifelse(score == "Absent", NA, score),
         score = as.numeric(score)) %>%
  filter(type != "total") %>%
  select(-pre_test_type)

# combine lesson-level pre and post scores, drop lessons with missing data,
# and average the per-lesson gains into one gain score per child and condition
d.tidy.gain <- bind_rows(d.tidy.gain.pre, d.tidy.gain.post) %>%
  filter(ID != "cs15") %>%
  arrange(ID) %>%
  spread(test, score) %>%
  na.omit() %>%
  mutate(gain_by_type = post - pre) %>%
  group_by(ID, classroom) %>%
  summarise(gain = mean(gain_by_type))
```
There should be 23 participants in the final data file.
```{r}
d.tidy %>%
summarise(n = length(unique(ID))) %>%
kable()
```
# Step 4: Run analysis
## Descriptive statistics
Pretest:
> ... accuracy was statistically equivalent in the sparse classroom condition (M = 22%) and the decorated-classroom condition (M = 23%)...
Posttest:
> However, their learning scores were higher in the sparse-classroom condition (M = 55%) than in the decorated-classroom condition (M = 42%)...
Check values from data:
```{r}
descriptives <- d.tidy %>%
  group_by(classroom, test) %>%
  summarise(mean = mean(score))

descriptives %>%
  kable(digits = 2)
```
```{r}
reportObject <- reproCheck(reportedValue = "0.22", obtainedValue = descriptives %>% filter(classroom == "sparse", test == "pre") %>% pull(mean), valueType = "mean")
reportObject <- reproCheck(reportedValue = "0.55", obtainedValue = descriptives %>% filter(classroom == "sparse", test == "post") %>% pull(mean), valueType = "mean")
reportObject <- reproCheck(reportedValue = "0.23", obtainedValue = descriptives %>% filter(classroom == "decorated", test == "pre") %>% pull(mean), valueType = "mean")
reportObject <- reproCheck(reportedValue = "0.42", obtainedValue = descriptives %>% filter(classroom == "decorated", test == "post") %>% pull(mean), valueType = "mean")
```
Minor deviation for one reported value.
Gain scores:
> ...gains were higher in the sparse-classroom condition (M = 33%, SD = 22) than in the decorated-classroom condition (M = 18%, SD = 19).
Obtained values:
```{r}
gain.descriptives <- d.tidy.gain %>%
  group_by(classroom) %>%
  summarise(mean = mean(gain),
            sd = sd(gain))

gain.descriptives %>%
  kable(digits = 2)
```
```{r}
reportObject <- reproCheck(reportedValue = "0.33", obtainedValue = gain.descriptives %>% filter(classroom == "sparse") %>% pull(mean), valueType = "mean")
reportObject <- reproCheck(reportedValue = "0.22", obtainedValue = gain.descriptives %>% filter(classroom == "sparse") %>% pull(sd), valueType = "sd")
reportObject <- reproCheck(reportedValue = "0.18", obtainedValue = gain.descriptives %>% filter(classroom == "decorated") %>% pull(mean), valueType = "mean")
reportObject <- reproCheck(reportedValue = "0.19", obtainedValue = gain.descriptives %>% filter(classroom == "decorated") %>% pull(sd), valueType = "sd")
```
Minor deviations from reported values.
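The `reproCheck` calls above do this comparison automatically. For transparency, here is a minimal sketch of the percentage-error rule we assume underlies the `percentageError` column of `reportObject` (the `pct_error` helper is hypothetical; the MATCH/MINOR_ERROR cutoffs are internal to `ReproReports`):

```{r}
# hypothetical helper mirroring the percentageError column of reportObject;
# the classification cutoffs themselves live inside reproCheck
pct_error <- function(reported, obtained) {
  abs(obtained - reported) / abs(reported) * 100
}

pct_error(reported = 0.33,
          obtained = gain.descriptives %>%
            filter(classroom == "sparse") %>%
            pull(mean))
```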
## Inferential statistics
> Pretest accuracy was statistically equivalent in the sparse classroom condition (M = 22%) and the decorated-classroom
condition (M = 23%), paired-samples t(22) < 1...
```{r}
pre <- d.tidy %>%
  filter(test == "pre")

pre.comparison <- t.test(pre$score ~ pre$classroom, paired = TRUE, alternative = "two.sided")
```
The paper reports no exact value to compare against; the obtained t-value is smaller than 1, as reported.
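As a quick sanity check, independent of `reproCheck`, the eyeball comparison can be confirmed directly from the `htest` object returned by `t.test`:

```{r}
# the paired t-statistic should be smaller than 1 in absolute value
abs(unname(pre.comparison$statistic)) < 1
```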
```{r}
reportObject <- reproCheck(reportedValue = "<1", obtainedValue = pre.comparison$statistic, valueType = "t", eyeballCheck = TRUE)
```
Check df:
```{r}
reportObject <- reproCheck(reportedValue = "22", obtainedValue = pre.comparison$parameter[['df']], valueType = "df")
```
> ...and accuracy in both conditions was not different from chance, both one-sample ts (22) < 1.3, ps > .21.
```{r}
pre.chance.sparse <- t.test(pre$score[pre$classroom == "sparse"], mu = .25, alternative = "two.sided")
pre.chance.decorated <- t.test(pre$score[pre$classroom == "decorated"], mu = .25, alternative = "two.sided")
```
Both t-values are smaller than the reported 1.3 in absolute value; however, they are negative, since both means are numerically below chance (25%). The obtained p-values are larger than .21, as reported (the smallest is .22).
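To make the sign of those t-statistics concrete, the pretest means can be inspected relative to chance:

```{r}
# both pretest means sit numerically below the 25% chance level,
# which is why the one-sample t-statistics come out negative
pre %>%
  group_by(classroom) %>%
  summarise(mean_accuracy = mean(score),
            vs_chance = mean(score) - .25) %>%
  kable(digits = 3)
```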
```{r}
reportObject <- reproCheck(reportedValue = "22", obtainedValue = pre.chance.sparse$parameter[['df']], valueType = "df")
reportObject <- reproCheck(reportedValue = "<1.3", obtainedValue = pre.chance.sparse$statistic, valueType = "t",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "> .21", obtainedValue = pre.chance.sparse$p.value, valueType = "p",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "22", obtainedValue = pre.chance.decorated$parameter[['df']], valueType = "df")
reportObject <- reproCheck(reportedValue = "<1.3", obtainedValue = pre.chance.decorated$statistic, valueType = "t",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "> .21", obtainedValue = pre.chance.decorated$p.value, valueType = "p",eyeballCheck = TRUE)
```
> The children’s posttest scores were significantly higher than their pretest scores in both experimental conditions, both paired-samples ts(22) > 4.72, ps ≤ .0001 (Fig. 4).
```{r}
pre.pos.comp.sparse <- t.test(d.tidy$score[d.tidy$classroom == "sparse"] ~ d.tidy$test[d.tidy$classroom == "sparse"], paired = TRUE, alternative = "two.sided")
pre.pos.comp.decorated <- t.test(d.tidy$score[d.tidy$classroom == "decorated"] ~ d.tidy$test[d.tidy$classroom == "decorated"], paired = TRUE, alternative = "two.sided")
```
The smaller of the two obtained t-values (decorated classroom) is 4.70 and therefore not larger than 4.72 as reported. The p-values are smaller than or equal to .0001, as reported.
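For reference, the two obtained t-statistics side by side:

```{r}
# the decorated-classroom t-statistic falls just short of the reported bound of 4.72
c(sparse = unname(pre.pos.comp.sparse$statistic),
  decorated = unname(pre.pos.comp.decorated$statistic))
```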
```{r}
reportObject <- reproCheck(reportedValue = "22", obtainedValue = pre.pos.comp.sparse$parameter[['df']], valueType = "df")
reportObject <- reproCheck(reportedValue = "> 4.72", obtainedValue = pre.pos.comp.sparse$statistic, valueType = "t",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "≤ .0001", obtainedValue = pre.pos.comp.sparse$p.value, valueType = "p",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "22", obtainedValue = pre.pos.comp.decorated$parameter[['df']], valueType = "df")
#reportObject <- reproCheck(reportedValue = "4.72", obtainedValue = pre.pos.comp.decorated$statistic, valueType = "t")
# only minor deviation from reported value.
reportObject <- reproCheck(reportedValue = "> 4.72", obtainedValue = pre.pos.comp.decorated$statistic, valueType = "t",eyeballCheck = TRUE)
reportObject <- reproCheck(reportedValue = "≤ .0001", obtainedValue = pre.pos.comp.decorated$p.value, valueType = "p",eyeballCheck = TRUE)
```
> However, their learning scores were higher in the sparse-classroom condition (M = 55%) than in the decorated-classroom condition (M = 42%), paired-samples t(22) = 2.95, p = .007; this effect was of medium size, Cohen’s d = 0.65.
```{r}
post <- d.tidy %>%
  filter(test == "post")

# use the two-vector form (sparse first) so the t-statistic carries the
# same sign as the comparison reported in the paper
post.comparison <- t.test(post$score[post$classroom == "sparse"],
                          post$score[post$classroom == "decorated"],
                          paired = TRUE, alternative = "two.sided")
post.comparison.d <- cohensD(post$score ~ post$classroom)
```
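A paired t-test is algebraically a one-sample t-test on the within-child difference scores. As a cross-check (a minimal sketch; it relies on the rows being aligned by ID, which `arrange(ID)` in Step 3 guarantees):

```{r}
# same test, formulated on the per-child (sparse - decorated) differences
diffs <- post$score[post$classroom == "sparse"] -
  post$score[post$classroom == "decorated"]
t.test(diffs, mu = 0)
```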
The t- and p-values match the reported values. The obtained effect size is slightly smaller than reported (0.63 vs. 0.65).
```{r}
reportObject <- reproCheck(reportedValue = "22", obtainedValue = post.comparison$parameter[['df']], valueType = "df")
reportObject <- reproCheck(reportedValue = "2.95", obtainedValue = post.comparison$statistic, valueType = "t")
reportObject <- reproCheck(reportedValue = ".007", obtainedValue = post.comparison$p.value, valueType = "p")
reportObject <- reproCheck(reportedValue = "0.65", obtainedValue = post.comparison.d, valueType = "d")
```
> Pairwise comparisons indicated that the children’s learning gains were higher in the sparse-classroom condition (M = 33%, SD = 22) than in the decorated-classroom condition (M = 18%, SD = 19), paired-sample t(22) = 3.49, p = .002, Cohen’s d = 0.73.
```{r}
gain.comparison <- t.test(d.tidy.gain$gain[d.tidy.gain$classroom == "sparse"],
                          d.tidy.gain$gain[d.tidy.gain$classroom == "decorated"],
                          paired = TRUE, alternative = "two.sided")
gain.comparison.d <- cohensD(d.tidy.gain$gain ~ d.tidy.gain$classroom)
```
The t-value and effect size are as reported.
```{r}
reportObject <- reproCheck(reportedValue = "22", obtainedValue = gain.comparison$parameter[['df']], valueType = "df")
reportObject <- reproCheck(reportedValue = "3.49", obtainedValue = gain.comparison$statistic, valueType = "t")
reportObject <- reproCheck(reportedValue = ".002", obtainedValue = gain.comparison$p.value, valueType = "p")
reportObject <- reproCheck(reportedValue = "0.73", obtainedValue = gain.comparison.d, valueType = "d")
```
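To make the effect-size computation transparent, here is a manual check of the pooled-SD Cohen's d (a sketch assuming `lsr::cohensD`'s default pooled method; with equal group sizes the pooled SD reduces to the root mean of the two variances, and `cohensD` reports the magnitude):

```{r}
# manual pooled-SD Cohen's d for the gain-score comparison
g_sparse <- d.tidy.gain$gain[d.tidy.gain$classroom == "sparse"]
g_decorated <- d.tidy.gain$gain[d.tidy.gain$classroom == "decorated"]
sd_pooled <- sqrt((var(g_sparse) + var(g_decorated)) / 2)
(mean(g_sparse) - mean(g_decorated)) / sd_pooled
```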
### Replicate Figure 4 in the paper
```{r}
library(ggthemes) # for theme_few()

# condition means and standard errors, expressed in percentage points
p1 <- d.tidy %>%
  mutate(classroom = relevel(as.factor(classroom), ref = "sparse"),
         test = relevel(as.factor(test), ref = "pre")) %>%
  group_by(classroom, test) %>%
  summarise(mean = mean(score) * 100,
            se = sd(score) / sqrt(n()) * 100)

ggplot(p1, aes(x = classroom, y = mean, fill = test)) +
  geom_bar(stat = "identity", position = position_dodge(), color = "black") +
  labs(y = "Percentage Correct (%)") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                position = position_dodge(width = 0.9), width = .1) +
  ylim(0, 100) +
  geom_hline(yintercept = 25, lty = 2) + # dashed line marks chance (25%)
  theme_few()
```
# Step 5: Conclusion
Initially we encountered major errors in the analysis of gain scores: the obtained t-value and effect size were considerably smaller than the ones reported in the paper. We had originally calculated the gain scores by averaging each subject's scores in each test phase (pre and post) within each condition and then subtracting the pretest average from the posttest average. This reading was based on the following passage of the paper:
> Gain scores were calculated by subtracting each participant’s pretest score from his or her posttest score.
When we queried the authors about these discrepancies, they informed us that the gain scores had been calculated differently: pretest scores were subtracted from posttest scores for each specific lesson (e.g., plate tectonics), and these lesson-level difference scores were then averaged to yield a mean gain score per subject and condition. This also involved excluding some individual lessons for some participants due to absence. This way of calculating the gain scores was not obvious from the information in the paper, and the supplementary material did not include any additional information about gain scores. After correcting the way the gain scores were calculated, all major issues were resolved.
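A minimal sketch contrasting the two readings, both computable from the objects built in Step 3 (the remaining small differences stem from the lesson-level absences dropped by `na.omit()`):

```{r}
# (1) our original reading: condition-level posttest minus pretest
d.tidy %>%
  spread(test, score) %>%
  mutate(gain = post - pre) %>%
  group_by(classroom) %>%
  summarise(mean_gain = mean(gain)) %>%
  kable(digits = 2)

# (2) the authors' method: lesson-level gains averaged per child, then per condition
d.tidy.gain %>%
  group_by(classroom) %>%
  summarise(mean_gain = mean(gain)) %>%
  kable(digits = 2)
```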
```{r}
Author_Assistance <- TRUE # was author assistance provided? (if so, enter TRUE)
Insufficient_Information_Errors <- 0 # how many discrete insufficient information issues did you encounter?
# Assess the causal locus (discrete reproducibility issues) of any reproducibility errors. Note that there doesn't necessarily have to be a one-to-one correspondence between discrete reproducibility issues and reproducibility errors. For example, it could be that the original article neglects to mention that a Greenhouse-Geisser correction was applied to ANOVA outcomes. This might result in multiple reproducibility errors, but there is a single causal locus (discrete reproducibility issue).
locus_typo <- 0 # how many discrete issues did you encounter that related to typographical errors?
locus_specification <- 4 # how many discrete issues did you encounter that related to incomplete, incorrect, or unclear specification of the original analyses?
locus_analysis <- 0 # how many discrete issues did you encounter that related to errors in the authors' original analyses?
locus_data <- 0 # how many discrete issues did you encounter that related to errors in the data files shared by the authors?
locus_unidentified <- 0 # how many discrete issues were there for which you could not identify the cause
# How many of the above issues were resolved through author assistance?
locus_typo_resolved <- 0 # how many of the typographical issues were resolved through author assistance?
locus_specification_resolved <- 4 # how many of the specification issues were resolved through author assistance?
locus_analysis_resolved <- 0 # how many of the original-analysis issues were resolved through author assistance?
locus_data_resolved <- 0 # how many of the data-file issues were resolved through author assistance?
locus_unidentified_resolved <- 0 # how many of the unidentified issues were resolved through author assistance?
Affects_Conclusion <- FALSE # do any reproducibility issues encountered appear to affect the conclusions of the original article? This is a subjective judgement, but you should take into account multiple factors, such as the presence/absence of decision errors, the number of target outcomes that could not be reproduced, the type of outcomes that could or could not be reproduced, the difference in magnitude of effect sizes, and the predictions of the specific hypothesis under scrutiny.
```
```{r}
reportObject <- reportObject %>%
  filter(dummyRow == FALSE) %>% # remove the dummy row
  select(-dummyRow) %>% # remove the dummy-row designation column
  mutate(articleID = articleID) %>% # add the articleID
  select(articleID, everything()) # make articleID the first column

# decide on the final outcome
if (any(!(reportObject$comparisonOutcome %in% c("MATCH", "MINOR_ERROR"))) | Insufficient_Information_Errors > 0) {
  finalOutcome <- "Failure without author assistance"
  if (Author_Assistance == TRUE) {
    finalOutcome <- "Failure despite author assistance"
  }
} else {
  finalOutcome <- "Success without author assistance"
  if (Author_Assistance == TRUE) {
    finalOutcome <- "Success with author assistance"
  }
}
# collate report extra details
reportExtras <- data.frame(articleID, pilotNames, copilotNames, pilotTTC, copilotTTC, pilotStartDate, copilotStartDate, completionDate, Author_Assistance, finalOutcome, Insufficient_Information_Errors, locus_typo, locus_specification, locus_analysis, locus_data, locus_unidentified, locus_typo_resolved, locus_specification_resolved, locus_analysis_resolved, locus_data_resolved, locus_unidentified_resolved)
# save report objects
if(reportType == "pilot"){
write_csv(reportObject, "pilotReportDetailed.csv")
write_csv(reportExtras, "pilotReportExtras.csv")
}
if(reportType == "final"){
write_csv(reportObject, "finalReportDetailed.csv")
write_csv(reportExtras, "finalReportExtras.csv")
}
```
# Session information
```{r session_info, include=TRUE, echo=TRUE, results='markup'}
devtools::session_info()
```