A LightGBM-based regressor that predicts the overall rating of a hotel by parsing its reviews with NLTK.
Env Setup
- This project uses Python 3; all commands below assume Python 3.
- If you have both Python 2 and Python 3 installed:
  - Replace python with python3
  - Replace pip with pip3
- Install all the packages listed in requirements.txt:
pip install -r requirements.txt
How to Run
- Run one of the commands below.
python reviews_classifier.py
- Trains the model on the already processed data (features extracted), which takes 10+ seconds.
python reviews_classifier.py full
- Runs the full code, including feature extraction using NLTK, which takes 10+ minutes.
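The two run modes differ only in whether the word full is passed as the first argument. The snippet below is a minimal sketch of how such a switch could be read from the command line. It is not the actual contents of reviews_classifier.py, and the names `choose_mode` and `describe` are hypothetical.

```python
import sys


def choose_mode(argv):
    """Return 'full' if the first CLI argument is 'full', otherwise 'train'.

    Hypothetical helper; the real script may parse its arguments differently.
    """
    # argv[0] is the script name, so the mode (if any) is argv[1].
    if len(argv) > 1 and argv[1] == "full":
        return "full"
    return "train"


def describe(mode):
    """Return a short description of what each mode does, per the README."""
    if mode == "full":
        return "extract features with NLTK, then train (10+ minutes)"
    return "train on already processed features (10+ seconds)"


if __name__ == "__main__":
    mode = choose_mode(sys.argv)
    print(mode + ": " + describe(mode))
```

Any argument other than full falls back to the fast mode in this sketch, which is an assumption rather than documented behavior.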
Folders
- All the raw data are under data/
- processed_data.csv contains the features extracted from the raw data using NLTK.
- prediction.csv is a table of true ratings and our predicted ratings.
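Since prediction.csv pairs true ratings with predicted ratings, one simple way to summarize it is the root-mean-square error (RMSE). The snippet below is a minimal sketch using only the standard library. The column names `true_rating` and `predicted_rating` are assumptions, as is the helper name `rmse_from_csv`; check the header of the real file and adjust them. The sample rows are made up for illustration and are not results from this project.

```python
import csv
import io
import math


def rmse_from_csv(handle, true_col="true_rating", pred_col="predicted_rating"):
    """Compute RMSE between two numeric columns of a CSV file-like object.

    The default column names are assumptions, not the project's actual header.
    """
    reader = csv.DictReader(handle)
    squared_errors = [
        (float(row[true_col]) - float(row[pred_col])) ** 2 for row in reader
    ]
    if not squared_errors:
        raise ValueError("no rows found")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up rows for illustration only; not taken from the real prediction.csv.
sample = io.StringIO(
    "true_rating,predicted_rating\n"
    "4.0,3.5\n"
    "3.0,3.5\n"
    "5.0,4.5\n"
)
print(round(rmse_from_csv(sample), 3))  # prints 0.5
```

To run it on the real file, open prediction.csv with `open(path, newline="")` and pass the handle along with the actual column names.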