A dynamic configurable proxy IP crawler based on Scrapy. It makes it easy to crawl hundreds of thousands of proxy IPs in a short time. By maintaining a spider code and a few groups of website data extraction rules you can easily grab lots of proxy IPs of these sites. See the blogs for more detail.
For more details see requirements.txt
- Scrapy 1.2.1
- MySQL-python 1.2.5
- Redis 2.10.5
- SQLAlchemy 1.1.4
CentOS
$ sudo yum install python-devel
$ sudo yum install gcc libffi-devel openssl-devel
$ pip install scrapy
$ pip install SQLAlchemy
$ pip install redis