This project is a web-based tool to scrape and filter books from Goodreads. It allows users to:
- Scrape book data from different genres on Goodreads.
- Apply filters like minimum number of ratings and genre-based selection.
- View scraped books with details like title, author, number of ratings, and genre, sorted by average rating.
The backend scrapes Goodreads pages and stores book information in a JSON Lines file. The frontend allows users to filter and view the books using a simple web interface.
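As a rough illustration of the JSON Lines storage described above, here is a minimal sketch of appending and reading one JSON object per line. The function names and the book fields (`title`, `author`, `num_ratings`, `avg_rating`, `genre`) are assumptions made for this example; the actual schema of books_raw.jl may differ.

```python
import json


def append_books(path, books):
    """Append each book dict to the file as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for book in books:
            f.write(json.dumps(book, ensure_ascii=False) + "\n")


def load_books(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    books = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                books.append(json.loads(line))
    return books
```

Because every record sits on its own line, new scrapes can be appended without rewriting or re-parsing the existing file.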
- Scraping: Scrape books from a specified genre on Goodreads (up to 25 pages).
- Filtering: Filter books based on genres and minimum ratings.
- Progress Bar: Real-time progress bar during scraping.
The filter feature is also available online at https://kardeepakkumar.github.io/goodreads-advanced-search
Click the image below to watch the demo video:
- Docker or Python (either one is enough to run the app locally)
git clone https://github.com/kardeepakkumar/goodreads-advanced-search.git
cd goodreads-advanced-search
Open a Goodreads page in your browser, press F12 to open the developer tools, and copy the cookie data. Save the copied cookie data in goodreads-advanced-search/cookie.txt.
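For context, a scraper could pick up the saved cookie roughly like this. This is a hedged sketch, not the project's actual code: the helper name `load_cookie_header` is hypothetical, and it assumes cookie.txt holds the raw `Cookie` header string as copied from the browser.

```python
def load_cookie_header(path="cookie.txt"):
    """Return request headers carrying the cookie string saved in `path`.

    Assumes the file contains the raw Cookie header value, e.g. "a=1; b=2".
    """
    with open(path, encoding="utf-8") as f:
        cookie = f.read().strip()  # drop the trailing newline editors often add
    if not cookie:
        raise ValueError(f"{path} is empty; paste your Goodreads cookie data into it")
    return {"Cookie": cookie}
```

The returned dict can be passed as the `headers` argument of whichever HTTP client the scraper uses.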
Use one of these two methods to run the app locally:
Make sure Docker is installed on your machine.
docker build -t goodreads-scraper .
docker run -p 5000:5000 -v $(pwd)/books_raw.jl:/app/books_raw.jl -v $(pwd)/cookie.txt:/app/cookie.txt goodreads-scraper
This will mount the books_raw.jl file and the cookie.txt file, allowing the scraper to store data and use the cookies for authentication.
Optionally, create a virtual environment (venv) first.
pip install -r requirements.txt
python app.py
Once the app is running, open your browser and go to http://localhost:5000.
The books_raw.jl file in the repo already contains metadata for ~20k books. Scraping more genres automatically appends to your local copy of this file.
Use the filter options to narrow down the displayed books based on genres and minimum ratings.
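The filtering logic can be pictured with a short sketch like the one below. It is an illustration under assumptions, not the app's actual implementation: the function name `filter_books` and the field names `genre`, `num_ratings`, and `avg_rating` are hypothetical.

```python
def filter_books(books, genres=None, min_ratings=0):
    """Keep books in the chosen genres with enough ratings, best-rated first.

    `genres=None` means no genre restriction. Field names are assumed.
    """
    wanted = {g.lower() for g in genres} if genres else None
    selected = [
        book
        for book in books
        if book.get("num_ratings", 0) >= min_ratings
        and (wanted is None or book.get("genre", "").lower() in wanted)
    ]
    # Sort by average rating, highest first.
    return sorted(selected, key=lambda b: b.get("avg_rating", 0.0), reverse=True)
```

For example, `filter_books(books, genres=["Biography"], min_ratings=1000)` would return only biographies with at least 1000 ratings, ordered by average rating.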
- Choose a genre (e.g., Biography) and press "Scrape".
- The app will start scraping books from that genre.
- A progress bar will be shown and updated during the scraping process.
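One way to drive a progress bar from a page-by-page scrape is sketched below. How the app actually reports progress is not shown here, so treat this as an assumption-laden illustration: `scrape_with_progress` and the `fetch_page` callable are hypothetical names, and the 25-page cap comes from the feature list.

```python
MAX_PAGES = 25  # the scraper covers up to 25 pages per genre


def scrape_with_progress(fetch_page, max_pages=MAX_PAGES):
    """Yield (percent_done, books_on_page) after each page is fetched.

    `fetch_page(page_number)` is a hypothetical callable that returns a
    list of book dicts, or an empty list once the genre has no more pages.
    """
    for page in range(1, max_pages + 1):
        books = fetch_page(page)
        if not books:
            # No more results: report completion early and stop.
            yield 100, []
            return
        yield round(100 * page / max_pages), books
```

A web backend could forward each yielded percentage to the frontend (for example through polling or a streaming response) to update the bar.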
This project is licensed under the MIT License. See the LICENSE file for more information.
Important Notice: This project scrapes data from Goodreads. While the data is used solely for personal or non-commercial purposes, it is important to acknowledge Goodreads' Terms of Service. Please do not use the scraped data for commercial purposes without prior consent from Goodreads.