
A regression model based on LightGBM to predict the rating by parsing the review using NLTK


sanyuzhang/TripAdvisorReviewRegressor


TripAdvisorReviewRegressor

A LightGBM-based regressor that predicts the overall rating of a hotel by parsing its reviews with NLTK.
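To illustrate what "parsing reviews" into regression features can look like, here is a minimal stdlib-only sketch. The lexicons, feature names and the `extract_features` helper are hypothetical. This README does not document which NLTK features the project actually extracts, and a regex tokenizer stands in for an NLTK tokenizer so the example runs without downloading NLTK data.

```python
from __future__ import annotations

import re

# Hypothetical, tiny sentiment lexicons for illustration only; the project's
# real NLTK-based features are not documented in this README.
POSITIVE = {"great", "clean", "friendly", "excellent", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "terrible", "broken"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters (stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(review: str) -> dict[str, float]:
    """Turn one review into a small dict of numeric features."""
    tokens = tokenize(review)
    n = len(tokens) or 1  # avoid division by zero on empty reviews
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        "num_tokens": float(len(tokens)),
        "pos_ratio": pos / n,
        "neg_ratio": neg / n,
        "exclamations": float(review.count("!")),
    }


if __name__ == "__main__":
    print(extract_features("Great staff, clean room! But noisy street."))
```

Each review's dict would become one row of a numeric feature table, which is the kind of input a gradient-boosted regressor such as LightGBM is trained on.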

Env Setup

  1. This project uses Python 3; all commands below assume python3.

  2. If both Python 2 and Python 3 are installed, adjust the commands:

    • Replace python with python3
    • Replace pip with pip3
  3. Install all the packages listed in requirements.txt:

    • pip install -r requirements.txt
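If you are unsure which interpreter `python` points to, a guard like the one below fails fast under Python 2. It is a hypothetical helper (`require_python3`), not part of the repository.

```python
import sys


def require_python3(version_info=sys.version_info) -> None:
    """Raise a clear error if the running interpreter is not Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This project needs Python 3; run it with python3 "
            "and install packages with pip3."
        )


if __name__ == "__main__":
    require_python3()
    print("Python", sys.version.split()[0], "OK")
```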

How to Run

  1. Run one of the commands below.
    • python reviews_classifier.py
      • Trains the model on the processed data (features already extracted); takes 10+ seconds.
    • python reviews_classifier.py full
      • Runs the full pipeline, including feature extraction with NLTK; takes 10+ minutes.
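The two commands differ only in the optional `full` argument. The sketch below shows one way a script can branch on it using plain `sys.argv`. The README does not say how reviews_classifier.py actually parses its arguments, and `select_mode` is a hypothetical name.

```python
import sys


def select_mode(argv: list) -> str:
    """Return 'full' when the first CLI argument is 'full', else 'train'."""
    # argv[0] is the script name, e.g. reviews_classifier.py
    if len(argv) > 1 and argv[1] == "full":
        return "full"  # extract features from raw data with NLTK, then train
    return "train"  # default: train on already-extracted features


if __name__ == "__main__":
    print(f"Running in {select_mode(sys.argv)!r} mode")
```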

Folders

  1. All the raw data is under data/.
  2. processed_data.csv contains the features extracted from the raw data using NLTK.
  3. prediction.csv is a table of the true ratings alongside the model's predicted ratings.
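Because prediction.csv pairs true and predicted ratings, standard regression errors can be computed from it directly. The following is a minimal sketch using only the standard library. The column names `true_rating` and `predicted_rating` are assumptions, because the README does not give the file's header, so adjust them to match the actual file.

```python
import csv
import io
import math


def rating_errors(csv_text: str, true_col: str = "true_rating",
                  pred_col: str = "predicted_rating") -> dict:
    """Compute MAE and RMSE from CSV text with true and predicted ratings.

    The default column names are assumptions; adjust them to prediction.csv.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    diffs = [float(r[pred_col]) - float(r[true_col]) for r in rows]
    n = len(diffs)
    if n == 0:
        raise ValueError("no rows to evaluate")
    mae = sum(abs(d) for d in diffs) / n  # mean absolute error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean squared error
    return {"n": n, "mae": mae, "rmse": rmse}
```

Usage, assuming the file is in the working directory: `with open("prediction.csv", newline="") as f: print(rating_errors(f.read()))`.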
