LeBonScrap is a spider that collects data from Leboncoin.fr, a French portal for selling new and second-hand goods across the whole country.
The spider crawls all the pagination links of a single search result in the real-estate category and scrapes every ad in the list.
To extract the data, LeBonScrap uses the open source and collaborative framework Scrapy.
To download the script, run the command below in a shell:
git clone git@github.com:wbwlkr/lebonscrap.git
Run the lebonscrap.py spider using the runspider command:
scrapy runspider lebonscrap.py -o data.json
For each ad, the data for the following columns is written to a JSON or CSV file, depending on the extension passed to -o (for example data.json or data.csv):
'Url'
'Titre'
'Prix'
'Surface'
'GES'
'Classe énergie'
'Auteur'
'Téléphone'
'Remarques'
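Once the spider has run, the JSON export can be read back with the standard library. The sketch below assumes that the file contains a JSON array of objects and that each object's keys match the column names listed above exactly; `load_ads` and `to_row` are hypothetical helpers, not part of LeBonScrap.

```python
import json

# Column names as listed above (assumed to be the exact keys in the export).
COLUMNS = [
    "Url", "Titre", "Prix", "Surface", "GES",
    "Classe énergie", "Auteur", "Téléphone", "Remarques",
]


def load_ads(path):
    """Load the JSON array written by `-o data.json` as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def to_row(ad):
    """Return the ad's values in column order; missing keys become None."""
    return [ad.get(column) for column in COLUMNS]


if __name__ == "__main__":
    # Assumes data.json was produced by the runspider command above.
    for ad in load_ads("data.json"):
        print(to_row(ad))
```

Using `ad.get(column)` rather than `ad[column]` keeps the loop from failing on ads where a field such as 'GES' could not be scraped.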
Requirements:
- Python 3
- Scrapy==1.4.0
This project is licensed under the MIT License - see the LICENSE.md file for details.