- Javad Rahimipour Anaraki
- Faramarz Dorani
- Henry Luan
- Machel Rayner
This document walks you through all the required steps to extract, clean, and process the data collected from Ethica, GENEActiv (Original), Apple Watch (S2), Fitbit (Charge 2), and SenseDoc.
There are five (four + one) sub-folders, named as follows, under a folder called data:
- data
  - Ethica
  - GENEActive
  - HealthData (for both Apple Watch and Fitbit)
  - SenseDoc
  - Merged
The Merged folder holds each participant's merged file, a file containing all participants' data with NAs, a file containing all participants' data with no NAs in the class labels, and a file containing all participants' data with no NAs at all. The data folder also contains a file called intervals.csv, which stores the following information about each participant:
- herox stores the participant's alias name
- kit is the package number
- phone is the phone id
- watch is the Apple Watch id
- fitbit is the Fitbit id
- geneactiv is the GENEActiv id
- snesedoc is the SenseDoc id
- start is the start date and time (in ####-##-## ##:##:## format)
- end is the end date and time (in ####-##-## ##:##:## format)
- userid is a unique id for each participant
- wrist is 1 if the Apple Watch and GENEActiv are on the same wrist, otherwise 0
- age, gender, weight, and height are demographic data
- street, city, and postal are address information for each participant
Note: This file should be kept updated throughout the experiment.
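As a quick sanity check on intervals.csv, the following minimal Python sketch (not one of the R programs described in this document) reads the file and verifies that start and end parse in the ####-##-## ##:##:## format and that end comes after start. The function name load_intervals and the sample row are hypothetical; only the column names are taken from the list above.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches ####-##-## ##:##:##


def load_intervals(handle):
    """Read intervals.csv rows and parse the start/end columns.

    Returns a dict mapping userid to a (start, end) tuple of datetimes.
    Raises ValueError if a timestamp is malformed or end is not after start.
    """
    intervals = {}
    for row in csv.DictReader(handle):
        start = datetime.strptime(row["start"], TIME_FORMAT)
        end = datetime.strptime(row["end"], TIME_FORMAT)
        if end <= start:
            raise ValueError(f"userid {row['userid']}: end is not after start")
        intervals[row["userid"]] = (start, end)
    return intervals


# Hypothetical sample content; a real file would carry all the columns listed above.
sample = io.StringIO(
    "herox,userid,start,end\n"
    "alias1,301,2018-01-15 09:00:00,2018-01-15 17:30:00\n"
)
intervals = load_intervals(sample)
```

With a real file, the same function could be called as `load_intervals(open("data/intervals.csv", newline=""))`, assuming the file sits directly under the data folder as described.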
Each device's folder should contain one folder per participant, named with the participant's userid (e.g. 301), and all the data extracted from that device should be stored under the matching userid folder (see the example below).
- data
  - Ethica
    - 301
      - abc.csv
    - 302
      - def.csv
    - ...
  - GENEActive
    - 301
      - 123.csv
    - 302
      - 456.csv
    - ...
  - ...
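Before running the preparation programs, it can help to confirm that the folder layout above is in place. The sketch below is a hypothetical Python helper (missing_participant_folders is not part of the project) that reports every device and userid pair whose folder is missing or contains no CSV file. It assumes the extracted data are stored as .csv files, as in the example layout, and uses the device folder names listed earlier.

```python
import tempfile
from pathlib import Path

# Device folder names as listed under the data folder (Merged is excluded).
DEVICE_FOLDERS = ["Ethica", "GENEActive", "HealthData", "SenseDoc"]


def missing_participant_folders(data_dir, userids):
    """Return (device, userid) pairs whose folder is absent or has no CSV files."""
    missing = []
    for device in DEVICE_FOLDERS:
        for userid in userids:
            folder = Path(data_dir) / device / str(userid)
            if not folder.is_dir() or not any(folder.glob("*.csv")):
                missing.append((device, str(userid)))
    return missing


# Demonstration on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Ethica" / "301").mkdir(parents=True)
    (Path(tmp) / "Ethica" / "301" / "abc.csv").write_text("x\n")
    report = missing_participant_folders(tmp, ["301"])
```

In this demonstration only the Ethica folder for participant 301 exists, so the report lists the other three device folders for that participant.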
Each device's folder contains an R program that cleans and prepares the collected data for the main processing step. These programs are listed below:
- EthicaDataPrep.R
- GENEActivDataPrep.R
- HealthDataPrep.R
- SenseDocDataPrep.R
The whole data processing is divided into four steps as follows:
- Generating labels
- Preparing and labeling data
- Merging all data files
- Classifying the results
Each step is explained in detail below.
To generate the labels, first store all the extracted data files of each device for a participant. Then open the GENEActivDataPrep.R program in R, update the timezone, path, and intrPath variables, and run the program. The results are stored as a CSV file ending with _labeled.csv under each participant id's folder, containing the cleaned and labeled GENEActiv data.
Info: More information on setting the timezone can be found here.
Once the labels are ready, run the following programs after updating the timezone and path variables in each one.
- EthicaDataPrep.R
- HealthDataPrep.R
- SenseDocDataPrep.R
Under the Merged folder, open merger.R, update timezone, and run it to create a file called finalData.csv under each participant's folder. The generated file contains the processed data from every device at second-level resolution, from start to end as indicated in intervals.csv. The mergeAllMerged.R program creates three files under the Merged folder, called mergedData.csv, mergedDataNoNAClass.csv, and mergedDataNoNA.csv, which contain all participants' data with NAs, all participants' data with no NAs in the class labels, and all participants' data with no NAs at all, respectively.
Weka is a ready-to-use machine learning package which has been employed to apply a set of classification methods on the resulting dataset (i.e. mergedDataNoNA.csv). Based on the desired setup of data processing, a subset of features/attributes/columns of mergedDataNoNA.csv should be selected and stored in a new CSV file (e.g. subMergedDataNoNA.csv). For this experiment, we used Weka version 3.6.15.
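The column-selection step described above can be done in any tool. As one possible approach, the following Python sketch copies a chosen subset of columns from one CSV stream to another. The function name select_columns and the column names in the sample data are hypothetical; the real mergedDataNoNA.csv will have its own column set.

```python
import csv
import io


def select_columns(source, destination, columns):
    """Copy only the named columns from one CSV stream to another."""
    reader = csv.DictReader(source)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    writer = csv.DictWriter(
        destination,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip columns that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Made-up sample standing in for mergedDataNoNA.csv.
source = io.StringIO(
    "time,heart_rate,steps,class\n"
    "1,72,0,sitting\n"
    "2,75,2,walking\n"
)
destination = io.StringIO()
select_columns(source, destination, ["heart_rate", "class"])
subset_csv = destination.getvalue()
```

With real files, source and destination would be file handles opened on mergedDataNoNA.csv and subMergedDataNoNA.csv (both with `newline=""`), and the column list would reflect the desired experimental setup.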
We used Rotation Forest, which applies principal component analysis (PCA) to subsets of the features and trains a decision tree (J48) classifier on the transformed data, through the Weka Explorer.
To do so, open Weka, click Explorer, then click Open file... and browse for subMergedDataNoNA.csv.
Then click the Classify tab and choose Rotation Forest under the classifier>meta category.
In Test options, choose Percentage split and set the percentage to 70%. This way, 70% of the data is used for training and the remaining 30% for testing and validation. Click the Start button, and the results will be shown in the Classifier output window as follows.