nhat2008/vietnam-ecommerce-crawler: crawls data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

A project for crawling data from lazada, websosanh, compare.vn, cdiscount and cungmua, with several convenient wrappers:

1. A clean Scrapy project structure with items and pipelines
2. Automatic proxy rotation
3. Simple to run: no need to remember the Scrapy command-line invocations
4. Flexible configuration: the crawler extracts data using patterns defined in template/product.yml
5. Saves data to a database: MongoDB or Elasticsearch
6. Uses pybloom (Bloom filters) to detect duplicate data during crawling
7. Stops crawling after a time limit
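The project's own proxy-changing code is not reproduced here. As a rough sketch of how automatic proxy rotation is commonly done in Scrapy, a downloader middleware can fill in request.meta["proxy"], which Scrapy's built-in HttpProxyMiddleware then uses to route the request. The class name RotatingProxyMiddleware and the PROXY_LIST setting are hypothetical, and round-robin is just one possible rotation policy.

```python
from itertools import cycle


class RotatingProxyMiddleware:
    """Hypothetical Scrapy downloader middleware that rotates proxies.

    Scrapy's HttpProxyMiddleware sends a request through the proxy URL
    stored in request.meta["proxy"], so this middleware only needs to
    set that key before the request is downloaded.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        # Round-robin over the configured proxies, forever.
        self._proxies = cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed setting name, e.g. in settings.py:
        # PROXY_LIST = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Keep any proxy that was set explicitly on the request.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To enable it, settings.py would need an entry such as DOWNLOADER_MIDDLEWARES = {"scrapy_service.middlewares.RotatingProxyMiddleware": 350} (the module path is an assumption). A priority below 750, the default slot of HttpProxyMiddleware, makes this middleware's process_request run first.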
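The schema of template/product.yml is not documented here, so the following is only a sketch of pattern-driven extraction under an assumed layout: each site maps item field names to a regular expression whose first capture group holds the value. In a real setup the file would be loaded with PyYAML's yaml.safe_load; the TEMPLATES dictionary below stands in for that result, and its site key and patterns are made up. Regexes keep the example dependency-free; a Scrapy crawler would more likely store XPath or CSS selectors.

```python
import re

# Hypothetical shape of template/product.yml after yaml.safe_load():
# one entry per site, mapping each item field to a regex whose first
# capture group is the field value. The real file's schema may differ.
TEMPLATES = {
    "example-shop": {
        "name": r'<h1 class="product-name">\s*(.*?)\s*</h1>',
        "price": r'<span class="price">\s*([\d.,]+)',
    },
}


def extract_product(html, site, templates=TEMPLATES):
    """Apply a site's field patterns to a page; missing fields become None."""
    item = {}
    for field, pattern in templates[site].items():
        match = re.search(pattern, html, re.DOTALL)
        item[field] = match.group(1) if match else None
    return item
```

For example, extract_product('<h1 class="product-name"> Phone X </h1><span class="price">1.990.000</span>', "example-shop") returns {"name": "Phone X", "price": "1.990.000"}. Supporting a new site then means adding a template entry rather than writing a new spider.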
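A Bloom filter answers "has this been seen before?" in fixed memory, at the cost of occasional false positives (an unseen key reported as seen) but never false negatives. The sketch below implements a minimal one with hashlib as a stand-in for pybloom, so it runs without extra packages, and wraps it in a Scrapy-style item pipeline. The DuplicateFilterPipeline name and the assumption that every item has a url field are hypothetical, and DropItem is a local stand-in for scrapy.exceptions.DropItem.

```python
import hashlib
import math


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class BloomFilter:
    """Minimal Bloom filter (a stdlib-only stand-in for pybloom's BloomFilter)."""

    def __init__(self, capacity, error_rate=0.001):
        # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hash functions.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)


class DuplicateFilterPipeline:
    """Hypothetical item pipeline that drops items whose URL was already seen."""

    def __init__(self, capacity=1_000_000, error_rate=0.001):
        self.seen = BloomFilter(capacity, error_rate)

    def process_item(self, item, spider):
        key = item["url"]  # assumes every item carries a "url" field
        if key in self.seen:
            raise DropItem(f"duplicate item: {key}")
        self.seen.add(key)
        return item
```

Because a false positive means a genuinely new product is occasionally dropped, error_rate trades memory for completeness; with the defaults shown the bit array takes about 1.8 MB. This filter lives only in memory, so it starts empty on every run.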

Usage: install the dependencies listed in requirements.txt, then start the crawler:

$ pip install -r requirements.txt
$ python app.py
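The contents of app.py are not shown here. A minimal sketch of such a launcher, assuming it shells out to the standard `scrapy crawl <spider> -s NAME=VALUE` command, is below. It also shows one way to implement stopping after a time limit, via Scrapy's built-in CLOSESPIDER_TIMEOUT setting, which may or may not be how this project does it. The function names and the spider name are hypothetical.

```python
import shlex
import subprocess


def build_crawl_command(spider, timeout_seconds=None, extra_settings=None):
    """Build the `scrapy crawl` argv that a launcher like app.py could run.

    CLOSESPIDER_TIMEOUT is a built-in Scrapy setting (CloseSpider extension)
    that closes the spider after the given number of seconds.
    """
    cmd = ["scrapy", "crawl", spider]
    settings = dict(extra_settings or {})
    if timeout_seconds is not None:
        settings["CLOSESPIDER_TIMEOUT"] = timeout_seconds
    # Each setting override becomes a separate "-s NAME=VALUE" pair.
    for key, value in settings.items():
        cmd += ["-s", f"{key}={value}"]
    return cmd


def run_crawl(spider, timeout_seconds=None):
    """Run the crawl and return Scrapy's exit code."""
    command = build_crawl_command(spider, timeout_seconds)
    print("Running:", shlex.join(command))
    return subprocess.call(command)
```

For instance, run_crawl("lazada", timeout_seconds=3600) would execute `scrapy crawl lazada -s CLOSESPIDER_TIMEOUT=3600`; it must be run from the directory containing scrapy.cfg, and "lazada" is only a guess at a spider name.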

Repository contents (3 commits):

scrapy_service
README.md
app.py
requirements.txt
scrapy.cfg
