bookscraper

This is a Scrapy project to scrape information about books at http://books.toscrape.com/

This project is only meant for educational purposes.

Extracted data

This project extracts the details listed for each book: title, UPC, product type, price, tax, stock availability, number of reviews, and rating.
A sample item:


{
    'title': 'A Light in the Attic',
    'upc': 'a897fe39b1053632',
    'product_type': 'Books',
    'price': '£51.77',
    'tax': '£0.00',
    'stock': 'In stock (22 available)',
    'reviews': '0',
    'rating': '3'
}

Spiders

This project contains two spiders: bookscraper-css and bookscraper-xpath. Both work the same way; the first is implemented with CSS selectors and the second with XPath expressions.

You can learn more about web scraping with Scrapy by going through the original Scrapy Tutorial or the Scrapy Tutorial Series on ScrapingAuthority.com.

Pipelines

This project contains four pipelines. The first processes the "rating" field. The second filters out books that have a stock number of more than five. The other two show how to create JSON and CSV files from the scraped data. You can disable any of the pipelines in settings.py.

Running the spiders

You can run a spider using the scrapy crawl command:


$ scrapy crawl bookscraper-css
$ scrapy crawl bookscraper-xpath
