This is a book recommendation system based on the book rating data from the GoodReads_100k dataset. The dataset contains 100k books.
`recommendation_data_cleaning.ipynb` cleans the data by removing books with fewer than 50 ratings and users with fewer than 50 ratings. The notebook then uses a TF-IDF vectorizer and cosine similarity to compute the similarity between books. The resulting model is saved as `cosine_sim_desc.pkl` in the `model` folder, and the cleaned data (25151 books) is saved as `final_data.csv` in the same folder.
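The similarity step can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `title` and `desc` are assumptions (the file name `cosine_sim_desc.pkl` suggests that descriptions are compared), and the three books are made up.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the cleaned data; "title" and "desc" are assumed column names.
books = pd.DataFrame({
    "title": ["Space Wars", "Galaxy Battle", "Baking Bread"],
    "desc": [
        "A fleet of starships fights a war in deep space.",
        "Starships clash in a space war across the galaxy.",
        "A practical guide to baking bread at home.",
    ],
})

# Turn each description into a TF-IDF vector, dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(books["desc"])

# Pairwise cosine similarity between all descriptions (n_books x n_books).
cosine_sim = cosine_similarity(tfidf_matrix)

# The two space-themed books score higher with each other than with the baking book.
print(cosine_sim.round(2))
```

A matrix like this could then be written to disk with `pickle.dump`, which would match the `.pkl` file the notebook produces. For the full 25151 books the dense matrix is large, so memory use is worth keeping in mind.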
`app.py` runs the web app, which is built with Streamlit.
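One way an app like this might turn the saved similarity data into recommendations is a simple top-N lookup. The sketch below is an assumption about the approach, not the code in `app.py`: the `recommend` helper, the `title` column, and the small hand-written matrix are all hypothetical. In the real app, the data and matrix would presumably be loaded from `final_data.csv` and `cosine_sim_desc.pkl`.

```python
import numpy as np
import pandas as pd


def recommend(title, books, cosine_sim, top_n=5):
    """Return the top_n titles most similar to `title` (hypothetical helper)."""
    # Find the row position of the requested title; fail clearly if it is absent.
    positions = np.flatnonzero((books["title"] == title).to_numpy())
    if len(positions) == 0:
        raise KeyError(f"Unknown title: {title}")
    idx = positions[0]
    # Rank all books by similarity to the query, highest first, skipping the query itself.
    order = [i for i in np.argsort(-cosine_sim[idx]) if i != idx][:top_n]
    return books["title"].iloc[order].tolist()


# Tiny made-up example: three books and a hand-written similarity matrix.
books = pd.DataFrame({"title": ["A", "B", "C"]})
cosine_sim = np.array([
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.1],
    [0.9, 0.1, 1.0],
])

print(recommend("A", books, cosine_sim, top_n=2))  # prints ['C', 'B']
```

In a Streamlit app, a function like this could be called with the value chosen in an `st.selectbox` and the result shown with `st.write`.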
- Python
- Pandas
- Numpy
- Scikit-learn
- Streamlit
- Clone the repository
- Install the requirements using `pip install -r requirements.txt`
- Download the dataset from GoodReads_100k and place it in the dataset folder.
- Run `recommendation_data_cleaning.ipynb` to clean the data and train the model.
- Run `app.py` using `streamlit run app.py`
It will open the web app in the browser.