
If you run into issues working with the repository, please go through the following documents:

  1. Setting up the environment - This page walks you step by step through setting up the environment needed to run this project/repository.

  2. Running the scripts - This page gives you the configuration and commands needed to run the required scripts on a Spark cluster. The current scripts have been run on Spark 1.6 with Hadoop 2.3. A minimal sketch of such a script and its submit command is shown after this list.

  3. Generating the dataset - This page walks you through the scripts in order, explaining what each one does, so that you can generate the dataset.

  4. Extracting features of the dataset - This page walks you through the scripts in order, explaining what each one does, so that you can extract the features.
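
Since the scripts target Spark 1.6, the sketch below shows one way such a script might be structured and submitted to the cluster. The script name, paths, and transformation are hypothetical placeholders, not the repository's actual files; see the "Running the scripts" page for the real commands and configuration.

```python
# Minimal PySpark 1.6-style entry point (hypothetical script name: generate_dataset.py).
# It could be submitted to the cluster with, for example:
#   spark-submit --master yarn --num-executors 4 generate_dataset.py <input_path> <output_path>
import sys
from pyspark import SparkConf, SparkContext

def main(input_path, output_path):
    conf = SparkConf().setAppName("generate-dataset")
    sc = SparkContext(conf=conf)
    try:
        # Read raw records, apply a placeholder transformation, and save the result.
        raw = sc.textFile(input_path)
        processed = raw.filter(lambda line: line.strip() != "")
        processed.saveAsTextFile(output_path)
    finally:
        sc.stop()

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```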

Once the dataset is generated and the features are extracted, you can run the models, such as Random Forest and Neural Networks, found under the notebooks/ path.
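
As a rough illustration of this modelling step, here is a minimal Random Forest sketch using Spark 1.6's MLlib API. The input path, feature layout, and model parameters are assumptions for illustration only and may differ from what the actual notebooks use.

```python
# Hypothetical sketch of the modelling step with Spark 1.6's MLlib RandomForest;
# the notebooks in notebooks/ may use a different library and feature layout.
from pyspark import SparkConf, SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import RandomForest

conf = SparkConf().setAppName("random-forest-example")
sc = SparkContext(conf=conf)

# Assumption: each line is "label,f1,f2,...,fn" (comma-separated, label first).
def parse(line):
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("features/extracted_features.csv").map(parse)  # hypothetical path
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = RandomForest.trainClassifier(
    train, numClasses=2, categoricalFeaturesInfo={},
    numTrees=50, featureSubsetStrategy="auto", impurity="gini", maxDepth=8
)

# Evaluate on the held-out split.
predictions = model.predict(test.map(lambda p: p.features))
labels_and_preds = test.map(lambda p: p.label).zip(predictions)
accuracy = labels_and_preds.filter(lambda lp: lp[0] == lp[1]).count() / float(test.count())
print("Test accuracy: %.3f" % accuracy)

sc.stop()
```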