This is a Scrapy project that scrapes information about books from http://books.toscrape.com/.
It is meant for educational purposes only.
For each book it extracts the title, UPC, product type, price, tax, stock availability, number of reviews and rating.
A sample item:
{
'title': 'A Light in the Attic',
'upc': 'a897fe39b1053632',
'product_type': 'Books',
'price': '£51.77',
'tax': '£0.00',
'stock': 'In stock (22 available)',
'reviews': '0',
'rating': '3'
}
This project contains two spiders: bookscraper-css and bookscraper-xpath. Both work the same way; the first is implemented with CSS selectors and the second with XPath.
You can learn more about web scraping with Scrapy by going through the original Scrapy Tutorial or Scrapy Tutorial Series on ScrapingAuthority.com.
This project contains four pipelines. The first processes the "rating" field. The second filters out books that have a stock number of more than five. The other two show how to create JSON and CSV files from the scraped data. You can disable pipelines in settings.py.
You can run a spider using the scrapy crawl command:
$ scrapy crawl bookscraper-css
$ scrapy crawl bookscraper-xpath
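As a rough idea of what the first two pipelines described above might look like, here is a minimal sketch. The class name RatingPipeline, the helper functions, the word-to-digit mapping and the exact stock rule are all assumptions made for illustration; the pipelines shipped with this project may differ.

```python
import re


class RatingPipeline:
    """Hypothetical pipeline: turn a rating word such as 'Three' into '3'."""

    # Assumed mapping from rating words to digit strings.
    WORDS = {"One": "1", "Two": "2", "Three": "3", "Four": "4", "Five": "5"}

    def process_item(self, item, spider):
        # Leave the value unchanged if it is not a known rating word.
        item["rating"] = self.WORDS.get(item["rating"], item["rating"])
        return item


def stock_count(stock_text):
    """Return the number in 'In stock (22 available)', or 0 if none is found."""
    match = re.search(r"\((\d+) available\)", stock_text)
    return int(match.group(1)) if match else 0


def exceeds_stock_limit(item, limit=5):
    """Assumed rule: True when the item has more than `limit` copies in stock."""
    return stock_count(item["stock"]) > limit


book = {"rating": "Three", "stock": "In stock (22 available)"}
book = RatingPipeline().process_item(book, spider=None)
print(book["rating"], stock_count(book["stock"]), exceeds_stock_limit(book))
```

In a real Scrapy pipeline, the filtering step would typically raise scrapy.exceptions.DropItem inside process_item for items that should be discarded, and each pipeline would be enabled or disabled through the ITEM_PIPELINES setting in settings.py.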