
A sample Scrapy project with pagination, item loader, pipelines...


luccafonte/bookscraper


bookscraper

This is a Scrapy project to scrape information about books at http://books.toscrape.com/

This project is only meant for educational purposes.

Extracted data

This project extracts each book's details, including title, UPC, product type, price, tax, stock availability, number of reviews and rating.
A sample item:


{
    'title': 'A Light in the Attic',
    'upc': 'a897fe39b1053632',
    'product_type': 'Books',
    'price': '£51.77',
    'tax': '£0.00',
    'stock': 'In stock (22 available)',
    'reviews': '0',
    'rating': '3'
}
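The fields above map directly onto an item definition. Here is a minimal sketch that assumes a dataclass-based item; Scrapy has accepted dataclass objects as items since version 2.2. The class name `BookItem` is hypothetical. The project itself may define a `scrapy.Item` subclass for use with its item loader instead.

```python
from dataclasses import asdict, dataclass


@dataclass
class BookItem:
    """Hypothetical item holding the fields shown in the sample item."""

    title: str
    upc: str
    product_type: str
    price: str  # kept as scraped text, e.g. '£51.77'
    tax: str
    stock: str  # availability text, e.g. 'In stock (22 available)'
    reviews: str
    rating: str  # assumed to be a digit string once the rating field is processed


book = BookItem(
    title="A Light in the Attic",
    upc="a897fe39b1053632",
    product_type="Books",
    price="£51.77",
    tax="£0.00",
    stock="In stock (22 available)",
    reviews="0",
    rating="3",
)

# asdict() turns the item back into a plain dict, the shape shown in the sample.
print(list(asdict(book)))
```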

Spiders

This project contains two spiders: bookscraper-css and bookscraper-xpath. Both work the same way: the first is implemented with CSS selectors, the second with XPath expressions.

You can learn more about web scraping with Scrapy by working through the official Scrapy Tutorial or the Scrapy Tutorial Series on ScrapingAuthority.com.

Pipelines

This project contains four pipelines. The first processes the "rating" field. The second drops books that have more than five copies in stock. The remaining two show how to write the scraped data to JSON and CSV files. You can enable or disable individual pipelines in the ITEM_PIPELINES setting in settings.py.
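The first two pipelines could look roughly like the sketch below. It assumes dict-like items with the fields from the sample item. The class names, the word-to-digit rating mapping and the module paths in ITEM_PIPELINES are hypothetical, not the project's actual code. `DropItem` and `process_item(self, item, spider)` are Scrapy's real pipeline conventions.

```python
import re

try:
    # Scrapy's real exception for discarding an item inside a pipeline.
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so this sketch also runs where Scrapy is not installed.
    class DropItem(Exception):
        pass

# Hypothetical mapping from the word in the site's "star-rating Three"
# CSS class to the digit string shown in the sample item.
RATING_WORDS = {"One": "1", "Two": "2", "Three": "3", "Four": "4", "Five": "5"}


class RatingPipeline:
    """Hypothetical: convert a rating word such as 'Three' into '3'."""

    def process_item(self, item, spider):
        # Leave the value unchanged if it is not a known rating word.
        item["rating"] = RATING_WORDS.get(item["rating"], item["rating"])
        return item


class StockFilterPipeline:
    """Hypothetical: drop books with more than `max_stock` copies in stock."""

    max_stock = 5

    def process_item(self, item, spider):
        # Pull the count out of text like "In stock (22 available)".
        match = re.search(r"\((\d+) available\)", item["stock"])
        if match and int(match.group(1)) > self.max_stock:
            raise DropItem(f"More than {self.max_stock} in stock: {item['stock']}")
        return item


# Hypothetical settings.py entry: pipelines with lower numbers run first, and
# removing or commenting out a line disables that pipeline.
ITEM_PIPELINES = {
    "bookscraper.pipelines.RatingPipeline": 100,
    "bookscraper.pipelines.StockFilterPipeline": 200,
}
```

For example, an item with `'rating': 'Three'` and `'stock': 'In stock (3 available)'` leaves RatingPipeline with `'rating': '3'` and passes through StockFilterPipeline unchanged. The same item with "(22 available)" raises `DropItem`, so Scrapy discards it.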

Running the spiders

You can run a spider using the scrapy crawl command:


$ scrapy crawl bookscraper-css
$ scrapy crawl bookscraper-xpath
