kimchitsigai/GettingAndCleaningData

GettingAndCleaningData

The working directory must contain the global data files (for example features.txt) and the train and test directories.

The function run_analysis() first loads the plyr package, because it uses plyr's mapvalues() function for Question 3.

For Question 1, the X_train.txt and X_test.txt files are read and then combined with rbind().
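The read-and-stack step can be sketched roughly as follows. This is a minimal illustration rather than the code in run_analysis(): the two temporary files are hypothetical miniature stand-ins for X_train.txt and X_test.txt, and the variable names are assumptions.

```r
# Hypothetical miniature stand-ins for X_train.txt and X_test.txt
train_file <- tempfile(fileext = ".txt")
test_file  <- tempfile(fileext = ".txt")
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), train_file)
writeLines(c("0.7 0.8 0.9"), test_file)

# read.table() splits on whitespace by default
x_train <- read.table(train_file)
x_test  <- read.table(test_file)

# Stack the training rows on top of the test rows
x_all <- rbind(x_train, x_test)
dim(x_all)  # 3 rows, 3 columns
```

rbind() requires both data frames to have the same columns, which holds here because both files share the same feature layout.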

For Question 2, the mean and standard deviation fields are those whose names, as listed in features.txt, contain "-mean()" or "-std()". The data frame from Question 1 is filtered to keep only the columns whose names contain one of these two strings.
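One way to express this filter is a literal string match on the feature names. The sketch below is an assumption about how it could be done, not necessarily how run_analysis() does it; the feature names are a small hypothetical sample in the style of features.txt.

```r
# Hypothetical sample of feature names in the style of features.txt
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# fixed = TRUE matches the text literally, so "(" and ")" are not
# interpreted as regular-expression grouping syntax
keep <- grepl("-mean()", feature_names, fixed = TRUE) |
        grepl("-std()",  feature_names, fixed = TRUE)

# Toy data frame with one column per feature
x_all <- as.data.frame(matrix(1:8, nrow = 2))
x_sel <- x_all[, keep]

feature_names[keep]  # the two names containing "-mean()" or "-std()"
```

With a literal match, names such as "fBodyAcc-meanFreq()-X" are excluded, because "-mean" is followed by "Freq" rather than "()".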

For Question 3, the y_train.txt and y_test.txt files give the activity to which each record corresponds. The activity number (between 1 and 6) is replaced by its label, and the label is appended as a column to each row of the data frame computed in Question 2.
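The number-to-label replacement can be illustrated with a base-R lookup. This is a substitute technique chosen so the example runs without plyr; the README states that run_analysis() uses plyr's mapvalues() instead. The activity codes and variable names below are hypothetical.

```r
# Activity labels, in the order used by the dataset's activity_labels.txt
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

# Hypothetical activity codes, as they would be read from y_train.txt and y_test.txt
y_all <- c(5, 1, 6, 1)

# Positional lookup: code 1 picks the first label, code 2 the second, and so on
activity <- activity_labels[y_all]

# Toy stand-in for the data frame from Question 2
x_sel <- data.frame(v1 = c(0.1, 0.2, 0.3, 0.4))
x_labelled <- cbind(x_sel, Activity = activity)
```

With plyr loaded, the equivalent call would be along the lines of mapvalues(y_all, from = 1:6, to = activity_labels).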

For Question 4, the column labels are taken from the features data frame read from features.txt in Question 2.
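Assigning the labels could look like the sketch below. It assumes the features data frame has the default read.table() column names V1 (index) and V2 (feature name); both the layout and the sample values are illustrative assumptions.

```r
# Hypothetical two-column layout of features.txt: index, then feature name
features <- data.frame(
  V1 = 1:3,
  V2 = c("tBodyAcc-mean()-X", "tBodyAcc-mean()-Y", "tBodyAcc-std()-X"),
  stringsAsFactors = FALSE
)

# Toy measurement data with default column names V1..V3
x_all <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2))

# names<- keeps the labels as written, including "-" and "()"
names(x_all) <- features$V2
```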

For Question 5, the subject_train.txt and subject_test.txt files give the subject to which each observation corresponds. The Subject column is appended to the data frame computed in Question 3. Then, for each subject s, activity a and variable v, the mean is calculated. A record containing the Subject, Activity, Variable Name and Average is appended to the final data frame (the tidy dataset). The tidy dataset contains 30 (subjects) * 6 (activities) * 66 (mean and std variables) = 11880 records.
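The per-subject, per-activity, per-variable averaging can be sketched on a toy data frame. This uses aggregate() from base R as one possible approach, which may differ from the loop in run_analysis(); the data and the column names Variable and Average are assumptions for illustration.

```r
# Toy data: 2 subjects x 2 activities, 2 observations each, 2 variables
df <- data.frame(
  Subject  = rep(c(1, 2), each = 4),
  Activity = rep(rep(c("WALKING", "SITTING"), each = 2), times = 2),
  stringsAsFactors = FALSE
)
df[["tBodyAcc-mean()-X"]] <- c(1, 3, 5, 7, 2, 4, 6, 8)
df[["tBodyAcc-std()-X"]]  <- c(10, 30, 50, 70, 20, 40, 60, 80)

vars <- setdiff(names(df), c("Subject", "Activity"))

# One aggregate() call per variable, each yielding Subject, Activity, Average
per_var <- lapply(vars, function(v) {
  out <- aggregate(df[[v]],
                   by = list(Subject = df$Subject, Activity = df$Activity),
                   FUN = mean)
  names(out)[3] <- "Average"
  out$Variable <- v
  out[, c("Subject", "Activity", "Variable", "Average")]
})

# Long-format result: one row per (subject, activity, variable)
tidy <- do.call(rbind, per_var)

nrow(tidy)   # 2 subjects * 2 activities * 2 variables = 8
30 * 6 * 66  # same count for the full dataset: 11880
```

The row count is the product of the number of subjects, activities and variables, which is how the figure of 11880 records arises for the full dataset.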

codebook.txt describes the features in the tidy dataset. The original data description and datasets can be found here: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
