forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
#### Loading and preprocessing the data
```{r}
data <- read.csv("activity/activity.csv", colClasses=c(NA,"Date",NA))
```
#### What is mean total number of steps taken per day?
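Histogram of total steps per day:
```{r}
dailySteps <- aggregate(steps ~ date, data = data, FUN = sum)  # rows with NA steps are dropped
hist(dailySteps$steps, xlab = "Total steps per day", main = "Total steps taken per day")
```

Mean number of steps per day: `r format(mean(dailySteps$steps))`

Median number of steps per day: `r format(median(dailySteps$steps))`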
#### What is the average daily activity pattern?
Plot of average number of steps per interval:
```{r}
library(plyr)
intervalMeans <- ddply(data, .(interval), numcolwise(mean, na.rm=TRUE))
plot(intervalMeans$interval, intervalMeans$steps, type="l", xlab="5-minute interval", ylab="Number of steps")
```
```{r}
maxInterval <- intervalMeans[which.max(intervalMeans$steps),]
```
The interval with the highest average number of steps is `r maxInterval$interval` with `r maxInterval$steps` steps.
#### Imputing missing values
```{r}
completeRows <- complete.cases(data)
```
Total number of rows with NAs: `r sum(!completeRows)`
We fill each missing value with the mean number of steps for its 5-minute interval, averaged across all days.
```{r}
imputeData <- transform(data, steps = ifelse(is.na(steps), ave(steps, interval, FUN = function(x) mean(x, na.rm = TRUE)), steps))
```
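As a quick sanity check, the same `ave()` and `ifelse()` pattern can be tried on a tiny data frame. This is only a sketch: `toy`, `intervalMean`, and `toyImputed` are made-up names and values for illustration and are not part of the activity data.

```{r}
# Toy data (made up for illustration): two intervals, three observations each.
toy <- data.frame(
  steps    = c(10, NA, 30, 4, 6, NA),
  interval = c(0, 0, 0, 5, 5, 5)
)

# ave() returns a vector as long as its input, holding each row's group mean.
intervalMean <- ave(toy$steps, toy$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values; substitute the group mean only where steps is NA.
toyImputed <- transform(toy, steps = ifelse(is.na(steps), intervalMean, steps))
toyImputed$steps
```

The two `NA` entries should come back as 20 (the mean of 10 and 30) and 5 (the mean of 4 and 6), while the observed values stay as they were.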
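Histogram of total steps per day after imputation:
```{r}
imputedDailySteps <- aggregate(steps ~ date, data = imputeData, FUN = sum)
hist(imputedDailySteps$steps, xlab = "Total steps per day", main = "Total steps taken per day (imputed)")
```

Mean number of steps per day (with imputed values): `r format(mean(imputedDailySteps$steps))`

Median number of steps per day (with imputed values): `r format(median(imputedDailySteps$steps))`

If the missing values cover whole days, filling them with interval means leaves the mean of the daily totals unchanged. The median can shift slightly toward the mean, because each fully imputed day has a total equal to the sum of the interval means.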
#### Are there differences in activity patterns between weekdays and weekends?
Add a logical `weekend` column (TRUE for Saturday and Sunday, FALSE for weekdays):
```{r}
# wday is 0 for Sunday and 6 for Saturday; unlike weekdays(), it is locale-independent.
imputeData$weekend <- as.POSIXlt(imputeData$date)$wday %in% c(0, 6)
head(imputeData)
```
Plot weekend and weekday average steps:
```{r}
weekendSubset <- subset(imputeData, weekend==TRUE)
weekdaySubset <- subset(imputeData, weekend==FALSE)
weekendMeans <- ddply(weekendSubset, .(interval), numcolwise(mean, na.rm=TRUE))
weekdayMeans <- ddply(weekdaySubset, .(interval), numcolwise(mean, na.rm=TRUE))
plot(weekendMeans$interval, weekendMeans$steps, type="l", xlab="5-minute interval", ylab="Number of steps", main="Weekend")
plot(weekdayMeans$interval, weekdayMeans$steps, type="l", xlab="5-minute interval", ylab="Number of steps", main="Weekday")
```
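The two plots above use separate figures and separate y-axis scales, which makes the patterns harder to compare by eye. One possible alternative, sketched here with the `lattice` package, stacks both panels in a single figure. `plotByDayType` and `dayType` are hypothetical names introduced for this sketch; the helper assumes a data frame with `steps`, `interval`, and a `date` column of class `Date`.

```{r}
library(lattice)  # recommended package that ships with standard R distributions

# Hypothetical helper: expects columns steps, interval, and date (class Date).
plotByDayType <- function(df) {
  # wday is 0 for Sunday and 6 for Saturday, independent of the locale.
  isWeekend <- as.POSIXlt(df$date)$wday %in% c(0, 6)
  df$dayType <- factor(ifelse(isWeekend, "weekend", "weekday"))

  # Mean steps for every combination of interval and day type.
  means <- aggregate(steps ~ interval + dayType, data = df, FUN = mean)

  # One column, two rows: panels share the x-axis and, by default, the y scale.
  xyplot(steps ~ interval | dayType, data = means, type = "l",
         layout = c(1, 2),
         xlab = "5-minute interval", ylab = "Number of steps")
}
```

Calling `plotByDayType(imputeData)` at the top level of a chunk would draw the figure. Inside a loop or function the returned object needs an explicit `print()`.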