#run_analysis.R
##Purpose
- Merge, label and summarize the "UCI HAR Dataset"
- The output contains the average, per subject and activity, of each mean and standard deviation measurement in the raw dataset.
- See CodeBook.md for details on input and output data
##Pre-requisites
- The R code was developed and tested on "R version 3.1.2 (2014-10-31)" on Mac OS X 10.9; it has not been tested on any other configuration
- The working directory needs to be set to the root of the "UCI HAR Dataset"
##How to run
- Start R
- Set working directory to the root of the "UCI HAR Dataset"
- Source the run_analysis.R script
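
The steps above could be wrapped in a small helper, sketched here under some assumptions: `run_from_root` is a hypothetical name, and both paths in the commented call are placeholders for wherever the dataset and the script live on your machine.

```r
# Hypothetical helper: source a script with the dataset root as working directory.
run_from_root <- function(dataset_root, script_path) {
  # file.info() is used instead of dir.exists() so this also works on R 3.1.x.
  if (!isTRUE(file.info(dataset_root)$isdir)) {
    stop("Dataset root not found: ", dataset_root)
  }
  # Resolve the script path before changing directory.
  script_path <- normalizePath(script_path, mustWork = TRUE)
  old_wd <- setwd(dataset_root)
  on.exit(setwd(old_wd), add = TRUE)  # restore the previous working directory
  source(script_path)
  invisible(TRUE)
}

# Example call (placeholder paths, adjust before running):
# run_from_root("~/data/UCI HAR Dataset", "~/code/run_analysis.R")
```

Restoring the previous working directory is a design choice of this sketch; the output file is still written to the dataset root because that is the working directory while the script runs.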
##Input
- The "UCI HAR Dataset" with the following directory structure
    - activity_labels.txt
    - features.txt
    - train/
        - subject_train.txt
        - X_train.txt
        - y_train.txt
    - test/
        - subject_test.txt
        - X_test.txt
        - y_test.txt
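
Before sourcing the script, it may help to confirm that the expected files are present. The following is a minimal sketch; `required_files` and `missing_har_files` are hypothetical names, and the file names follow the published dataset (train files under `train/`, test files under `test/`).

```r
# Relative paths of the input files the script is expected to read.
required_files <- c(
  "activity_labels.txt",
  "features.txt",
  file.path("train", c("subject_train.txt", "X_train.txt", "y_train.txt")),
  file.path("test",  c("subject_test.txt",  "X_test.txt",  "y_test.txt"))
)

# Return the required files that do not exist under `root`
# (an empty character vector means the layout looks complete).
missing_har_files <- function(root = ".") {
  required_files[!file.exists(file.path(root, required_files))]
}
```

Calling `missing_har_files()` with no arguments checks the current working directory, which matches the setup described in "How to run".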
##Output
- A file named "tidy_df.txt" containing the tidy dataset will be created in the working directory
- The file format is space-separated with quoted character vectors
- The file has a header row describing the columns
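
A file in this format can be read back with `read.table`. The sketch below round-trips a small made-up data frame through a temporary file; the values and the column name `mean_tBodyAcc_mean_X` are illustrative only, and `row.names = FALSE` is an assumption about how the script writes the file.

```r
# Made-up example rows in the shape of the tidy output.
example <- data.frame(
  subject = c(1L, 1L),
  activity = c("WALKING", "SITTING"),
  mean_tBodyAcc_mean_X = c(0.277, 0.261),
  stringsAsFactors = FALSE
)

# write.table() defaults to space-separated fields with quoted strings.
out_file <- file.path(tempdir(), "tidy_df_example.txt")
write.table(example, out_file, row.names = FALSE)

# header = TRUE tells read.table() to take column names from the first row.
round_trip <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```

For the real output, `read.table("tidy_df.txt", header = TRUE)` from the working directory should be the equivalent call.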
##Logic
- Load reference data (activity_labels.txt and features.txt)
- Load and label test data set (subject_test.txt, X_test.txt and y_test.txt)
    - Feature labels based on features.txt
- Merge test data set into one dataframe
- Load and label training data set (subject_train.txt, X_train.txt and y_train.txt)
    - Feature labels based on features.txt
- Merge training data set into one dataframe
- Merge test and training data sets
- Label activities based on activity_labels.txt
- Filter features to keep only mean and SD measurements
    - Keep feature names ending in "mean()" or "std()"
- Create tidy dataframe by calculating mean of all features by subject and activity
    - Prepend feature names with "mean_" to indicate that they are mean values
    - Remove "()" and replace "-" with "_" in feature names to make them valid for R use
- Store tidy dataframe in "tidy_df.txt" file in working directory
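
The logic above can be illustrated on a tiny synthetic data set. This is a sketch under stated assumptions, not the code in run_analysis.R: all values are made up, `make_part` is a hypothetical helper, and the filter matches the dataset's `std()` suffix for standard deviations.

```r
# Made-up stand-ins for activity_labels.txt and features.txt.
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
feature_names <- c("tBodyAccMag-mean()", "tBodyAccMag-std()", "tBodyAccMag-max()")

# Combine subject, activity and measurement columns for one partition,
# labelling the measurements with the feature names.
make_part <- function(subjects, activity_ids, values) {
  x <- as.data.frame(values)
  names(x) <- feature_names
  cbind(subject = subjects, activity_id = activity_ids, x)
}

test_part  <- make_part(c(1L, 1L), c(1L, 1L), matrix(c(1, 3, 2, 4, 9, 9), nrow = 2))
train_part <- make_part(c(2L, 2L), c(2L, 2L), matrix(c(10, 20, 30, 50, 9, 9), nrow = 2))

# Merge test and training partitions, then attach activity labels.
merged <- rbind(test_part, train_part)
merged$activity <- activity_labels$activity[match(merged$activity_id, activity_labels$id)]

# Keep only features whose names end in "mean()" or "std()".
feature_cols <- names(merged)[grepl("(mean|std)\\(\\)$", names(merged))]

# Average every kept feature by subject and activity.
tidy <- aggregate(merged[feature_cols],
                  by = list(subject = merged$subject, activity = merged$activity),
                  FUN = mean)

# Drop "()", replace "-" with "_", and prepend "mean_".
clean <- gsub("-", "_", gsub("()", "", feature_cols, fixed = TRUE), fixed = TRUE)
names(tidy) <- c("subject", "activity", paste0("mean_", clean))

# Written to a temporary directory here to avoid touching the working directory.
write.table(tidy, file.path(tempdir(), "tidy_df_sketch.txt"), row.names = FALSE)
```

The resulting `tidy` data frame has one row per subject and activity pair and one `mean_`-prefixed column per kept feature.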