
#run_analysis.R

##Purpose

Merge, label and summarize the "UCI HAR Dataset".
The output contains the mean, by subject and activity, of each mean and standard deviation measurement in the raw dataset.
See CodeBook.md for details on the input and output data.

##Pre-requisites

The R code was developed and tested on "R version 3.1.2 (2014-10-31)" on Mac OS X 10.9. It has not been tested on any other configuration.
The working directory needs to be set to the root of the "UCI HAR Dataset".
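
As a quick sanity check before running the script, the sketch below tests whether a directory looks like the dataset root. The helper name `is_har_root` is hypothetical and not part of run_analysis.R, and the file list assumes the standard layout of the UCI HAR Dataset archive (`test/` and `train/` subdirectories).

```r
# Hypothetical helper (not part of run_analysis.R): returns TRUE when `dir`
# contains the reference, test and training files of the UCI HAR Dataset.
is_har_root <- function(dir = getwd()) {
  expected <- c(
    "activity_labels.txt",
    "features.txt",
    file.path("test", c("subject_test.txt", "X_test.txt", "y_test.txt")),
    file.path("train", c("subject_train.txt", "X_train.txt", "y_train.txt"))
  )
  all(file.exists(file.path(dir, expected)))
}

# Example: report a clear message if the working directory is wrong.
if (!is_har_root()) {
  message("Working directory is not the root of the UCI HAR Dataset: ", getwd())
}
```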

##How to run

##Input

##Output

A file named "tidy_df.txt" containing the tidy dataset will be created in the working directory.
The file format is space-separated with quoted character vectors.
The file has a header row describing the columns.
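
A file in this format can be read back with `read.table` once `header = TRUE` is set. The sketch below is a minimal illustration that uses a small made-up data frame in a temporary file rather than the real output. The column names and values are illustrative only, and writing without row names is an assumption about how the script saves the file.

```r
# Illustrative stand-in for the tidy dataset (values are made up).
example_df <- data.frame(
  subject = c(1L, 1L, 2L),
  activity = c("WALKING", "SITTING", "WALKING"),
  mean_tBodyAccMag_mean = c(0.12, -0.95, 0.08),
  stringsAsFactors = FALSE
)

# write.table defaults give a space-separated file with quoted strings
# and a header row. row.names = FALSE is an assumption, not confirmed
# by this README.
path <- file.path(tempdir(), "tidy_df_example.txt")
write.table(example_df, path, row.names = FALSE)

# Read it back; for the real output, replace `path` with "tidy_df.txt".
tidy_df <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(tidy_df)
```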

##Logic

Load reference data (activity_labels.txt and features.txt)
Load and label test data set (subject_test.txt, X_test.txt and y_test.txt)
- Feature labels based on features.txt
Merge test data set into one dataframe
Load and label training data set (subject_train.txt, X_train.txt and y_train.txt)
- Feature labels based on features.txt
Merge training data set into one dataframe
Merge test and training data sets
Label activities based on activity_labels.txt
Filter features to keep only mean and SD measurements
- Keep feature names ending in "mean()" or "std()"
Create tidy dataframe by calculating mean of all features by subject and activity
Prepend feature names with "mean_" to indicate that they are mean values
Remove "()" and replace "-" with "_" in feature names to make them valid for R use
Store tidy dataframe in "tidy_df.txt" file in working directory
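
The steps above can be sketched end to end on a tiny synthetic data set. This is an illustration under stated assumptions, not the code in run_analysis.R. The data values are made up, and the helper `make_set` and all object names are hypothetical. Base R `rbind`, `match` and `aggregate` stand in for whatever functions the script actually uses. The filter pattern assumes the dataset's feature names use the suffixes "mean()" and "std()".

```r
# Synthetic reference data, shaped like activity_labels.txt and features.txt.
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
features <- c("tBodyAccMag-mean()", "tBodyAccMag-std()", "tBodyAccMag-max()")

# Hypothetical helper: build one labelled set from subject, y and X parts.
make_set <- function(subject, y, x) {
  x <- as.data.frame(x)
  names(x) <- features  # feature labels based on features.txt
  data.frame(subject = subject, activity_id = y, x, check.names = FALSE)
}
test  <- make_set(c(1, 1), c(1, 2), matrix(c(1, 3, 2, 4, 9, 9), nrow = 2))
train <- make_set(c(1, 2), c(1, 1), matrix(c(3, 5, 4, 6, 9, 9), nrow = 2))

# Merge test and training sets by stacking rows.
all_data <- rbind(test, train)

# Label activities; match() keeps the original row order.
all_data$activity <- activity_labels$activity[
  match(all_data$activity_id, activity_labels$id)]
all_data$activity_id <- NULL

# Keep only features whose names end in "mean()" or "std()".
keep <- grep("(mean|std)\\(\\)$", names(all_data), value = TRUE)
filtered <- all_data[, c("subject", "activity", keep)]

# Mean of every kept feature by subject and activity.
tidy_df <- aggregate(filtered[keep],
                     by = list(subject = filtered$subject,
                               activity = filtered$activity),
                     FUN = mean)

# Prepend "mean_", then remove "()" and replace "-" with "_".
is_feature <- !(names(tidy_df) %in% c("subject", "activity"))
names(tidy_df)[is_feature] <- paste0("mean_", names(tidy_df)[is_feature])
names(tidy_df) <- gsub("-", "_",
                       gsub("()", "", names(tidy_df), fixed = TRUE),
                       fixed = TRUE)

# Store the result; a temporary directory is used here to avoid
# overwriting a real tidy_df.txt.
write.table(tidy_df, file.path(tempdir(), "tidy_df.txt"), row.names = FALSE)
print(tidy_df)
```

Cleaning the names after aggregation follows the order of the steps listed above. Doing it earlier would work equally well, as long as the filter runs while the original "()" suffixes are still present.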
