This is a crawler for company and job information on Lagou.
- Node.js v4.5.0
- Redis server v3.2.3
- MongoDB v3.2.6
- There are two tasks to run: (1) the crawler, which fetches the information and saves it; (2) the proxy fetcher, which collects thousands of free proxies from the internet so that you can build a proxy pool in Redis for the crawler.
- You can run the crawler on multiple servers and save everything to a single storage server, which lets you fetch all the data you want more quickly.
- About 12k jobs and 8k companies have been fetched with this crawler.
- First, run the proxy fetcher: `node proxyfetcher.js -tl`
- Second, run the crawler, appending its output to a log file: `node tasks.js -jc -r 5000,6000 >> lagou.log`
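This README does not say how the crawler reads proxies from the Redis pool. One common approach is round-robin rotation over a Redis list, using `RPOPLPUSH` with the same key as both source and destination. The sketch below imitates that rotation with a plain JavaScript array so that it runs without a Redis server. `ProxyPool` and the sample addresses are hypothetical and are not part of this project.

```javascript
'use strict';

// Hypothetical in-memory stand-in for a Redis list of proxies.
// In Redis, RPOPLPUSH pool pool moves the tail element to the head and
// returns it, which rotates the list; next() mimics that behaviour here.
class ProxyPool {
  constructor(proxies) {
    // Index 0 plays the role of the Redis list head, the last index the tail.
    this.list = proxies.slice();
  }

  // Rotate like RPOPLPUSH: take the tail, put it back at the head, return it.
  next() {
    if (this.list.length === 0) {
      return null; // empty pool: run the proxy fetcher again
    }
    const proxy = this.list.pop();
    this.list.unshift(proxy);
    return proxy;
  }

  // Drop a proxy that stopped working, similar to LREM pool 0 proxy in Redis.
  remove(proxy) {
    this.list = this.list.filter((p) => p !== proxy);
  }

  size() {
    return this.list.length;
  }
}

// Usage with made-up documentation addresses:
const pool = new ProxyPool(['192.0.2.1:8080', '192.0.2.2:3128', '192.0.2.3:80']);
console.log(pool.next()); // 192.0.2.3:80
```

Rotating keeps every proxy in the pool, unlike popping proxies off for good. A single shared list can then serve crawler processes on several servers. A real Redis `RPOPLPUSH` does this atomically on the server, which an in-process array cannot do across machines.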