Welcome to the Hotel Review Sentiment Analysis repository! This project involves training Machine Learning models using a dataset of over 300 authentic hotel reviews to predict overall ratings and generate insightful visualizations.
This repository is structured to support ease of understanding, scalability, and modularity. It includes datasets, trained models, and a Jinja2-based visualization application.
The trained models reach an astonishing accuracy of around 99% 🤯
See these models in real-world use at: Rate My Hotel
Now you can get insight into the performance of your trained model through Automatic Data Visualization, with zero effort.
Other than just training hotel review models, you also get automatic detailed review generation and user sentiment analysis to further support the model's rating.
Forget boring training screens with a colorful terminal output 😆
- data: Contains datasets used for training and testing the ML models. The data is clean, structured, and ready for model consumption.
  - Purpose: To store and organize the data in CSV or other relevant formats.
- models: Includes trained models along with their corresponding feature files (.pkl) for consistency and reuse.
  - Purpose: To save pre-trained ML models for reuse and evaluation without retraining.
- output: Houses HTML files generated via the Jinja2 application for visualizing the model's predictions.
  - Purpose: To provide a user-friendly, well-formatted summary of the results.
- template: Contains Jinja2 templates used to generate HTML files in the output folder.
  - Purpose: To ensure separation of content (data) and presentation (HTML formatting).
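
The split between content and presentation described for the template and output folders can be sketched as follows. This is a minimal illustration, not the project's code: it uses the standard library's string.Template as a stand-in for Jinja2 so that it runs without extra dependencies, and the template text, the render_report and save_report functions, and the hotel name are all hypothetical.

```python
from html import escape
from pathlib import Path
from string import Template

# Hypothetical template text; in the project, Jinja2 templates live in the template folder.
REPORT_TEMPLATE = Template(
    "<html><body>\n"
    "<h1>$hotel</h1>\n"
    "<p>Predicted rating: $rating / 5</p>\n"
    "</body></html>\n"
)


def render_report(hotel: str, rating: float) -> str:
    """Fill the template with prediction data and return the HTML as a string."""
    # Escape the hotel name so characters such as & or < cannot break the markup.
    return REPORT_TEMPLATE.substitute(hotel=escape(hotel), rating=f"{rating:.1f}")


def save_report(report_html: str, out_dir: str, name: str) -> Path:
    """Write rendered HTML into the output directory, creating it if needed."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / name
    path.write_text(report_html, encoding="utf-8")
    return path


# Presentation (the template) stays fixed; only the data passed in changes.
report_html = render_report("Example Hotel", 4.2)
```

Calling save_report(report_html, "output", "example.html") would then place the file alongside the other generated reports.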
- filtering.py: Prepares and filters raw data for model input.
- processing.py: Handles data transformation, normalization, and scaling.
- analytics.py: Generates analytical insights, including metrics like accuracy and precision.
- reviews.py: Generates detailed reviews (long or short) based on predictions.
- output.py: Saves or displays the output of the model predictions.
- testing.py: Facilitates model testing and evaluation.
- train_model.py: Contains code for training the ML models and saving them for later use.
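
As a rough illustration of the kind of metrics analytics.py is described as producing, the sketch below computes accuracy and per-class precision in plain Python. The function names and the sample ratings are assumptions made for the example; the project itself may well rely on Scikit-Learn's metric helpers instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Of all items predicted as `positive`, the fraction that truly are."""
    truths_for_positive_preds = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_positive_preds:
        return 0.0  # nothing was predicted as `positive`
    hits = sum(t == positive for t in truths_for_positive_preds)
    return hits / len(truths_for_positive_preds)


# Made-up ratings on a 1-5 scale, purely for illustration.
actual = [5, 4, 5, 3, 5, 2]
predicted = [5, 4, 4, 3, 5, 5]

acc = accuracy(actual, predicted)                 # 4 of 6 predictions match
prec5 = precision(actual, predicted, positive=5)  # 2 of 3 predicted 5s are correct
```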
- Trained Models: Predict hotel ratings using highly accurate ML models.
- Data Visualization: Leverages Jinja2 templates to create visually appealing HTML reports.
- Modular Codebase: Well-organized scripts for data preprocessing, model training, and output visualization.
- Reusable Models: Pickle files for models and features to save retraining time.
- User-Friendly Output: Beautifully formatted HTML files to showcase model predictions.
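
Saving and reloading a pickle file can be sketched with the standard library alone. The helper names, the file path, and the feature list below are hypothetical, and a plain list stands in for a trained model; since the project lists Joblib, its joblib.dump and joblib.load functions would play the same role.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Serialize any picklable object (model, feature list, ...) to disk."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_artifact(path):
    """Load a previously saved object back into memory."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# A temporary directory keeps the demo from touching the real models folder.
with tempfile.TemporaryDirectory() as tmp:
    features = ["cleanliness", "service", "location"]  # hypothetical feature names
    artifact_path = Path(tmp) / "models" / "features.pkl"
    save_artifact(features, artifact_path)
    restored = load_artifact(artifact_path)
```

Only load pickle files you trust, since unpickling can execute arbitrary code.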
- Python: Core programming language.
- Scikit-Learn: For machine learning model development.
- Jinja2: For templating HTML visualizations.
- Pandas & NumPy: For data handling and processing.
- Joblib: For saving and loading trained models.
- TextBlob: For sentiment analysis.
- Python 3.11+
- Libraries: Install dependencies from req.txt: pip install -r req.txt
- Train the Model: Use train_model.py to train the model with data from the data folder.
- Test the Model: Run testing.py to evaluate the model and generate reviews.
- Visualize Results: Check the output folder for beautifully formatted HTML files showing predictions.
- Place your dataset in the data folder.
- Train your model using: python train_model.py
- Test the model with: python testing.py
- Open the generated HTML files in the output folder for a detailed visualization of the predictions.
- Accuracy: The model achieves a current accuracy of 74.48%, with room for improvement.
- Scalability: Easily extendable for larger datasets or additional features.
- Visualization: Simplifies decision-making with formatted HTML reports.
- Improve model accuracy using advanced algorithms.
- Add real-time predictions via a web or desktop app.
- Support multilingual reviews.
- Integrate more visualizations for better insights.
Feel free to submit issues or pull requests! Contributions are welcome to improve the repository further.