This repository contains the files required for the course project of the Coursera Getting and Cleaning Data course.
The script in the file run_analysis.R does the following:
- Checks if the dataset exists in the current directory and downloads it otherwise
- Imports the plyr package
- Loads the files needed for the analysis into R, using the function unz() to create the connection: activity_labels.txt, features.txt, X_test.txt, y_test.txt, subject_test.txt, X_train.txt, y_train.txt, subject_train.txt
- Merges the data from the training and test files, using the function rbind()
- Creates a logical vector marking the column names that correspond to the mean() and std() values, using the function grep to extract this information from the features vector
- Creates a dataframe corresponding to the subset of the specified columns
- Replaces the activity codes in the data with the corresponding activity names
- Corrects the column names by making them lower case and adding the “subject” and “activity” columns
- Merges all data with the function cbind()
- Creates a new tidy dataset using the ddply() function and stores it in the variable AveragesData, which contains the average of each variable for each activity and each subject
- Writes the text file tidy_dataset.txt, which contains the resulting tidy dataset
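
The steps above can be sketched on a tiny in-memory example. This is a minimal illustration, not the code in run_analysis.R: the toy values, the variable names, and the use of grepl() (the logical-returning variant of grep()) are assumptions, and base R aggregate() stands in for ddply() so the sketch runs without installing plyr.

```r
# Toy stand-ins for features.txt and activity_labels.txt (hypothetical values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

# Toy stand-ins for the X_*, y_* and subject_* files; read.table() would
# produce the same default column names (V1, V2, ...).
x_train <- data.frame(V1 = c(1, 3), V2 = c(0.1, 0.3), V3 = c(9, 9))
x_test  <- data.frame(V1 = c(5, 7), V2 = c(0.5, 0.7), V3 = c(9, 9))
y_train <- data.frame(V1 = c(1, 1))
y_test  <- data.frame(V1 = c(2, 2))
subject_train <- data.frame(V1 = c(1, 1))
subject_test  <- data.frame(V1 = c(2, 2))

# Merge training and test rows.
x_all <- rbind(x_train, x_test)
y_all <- rbind(y_train, y_test)
subject_all <- rbind(subject_train, subject_test)

# Logical vector marking the mean() and std() features, then subset.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
x_sel <- x_all[, keep, drop = FALSE]

# Replace activity codes with their names.
activity <- activity_labels$name[match(y_all$V1, activity_labels$id)]

# Lower-case feature names, then bind subject, activity and measurements.
names(x_sel) <- tolower(features[keep])
merged <- cbind(subject = subject_all$V1, activity = activity, x_sel)

# Average every kept variable per subject and activity
# (aggregate() used here in place of ddply()).
averages <- aggregate(
  x_sel,
  by = list(subject = merged$subject, activity = merged$activity),
  FUN = mean
)

# Write the result to a temporary file rather than the working directory.
out_file <- tempfile(fileext = ".txt")
write.table(averages, out_file, row.names = FALSE)
```

With the real dataset, the same shape of pipeline would yield one row per subject and activity pair, with one column per retained mean() or std() feature.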