Crawler problem with AJAX-loaded pages #1
Comments
Approaches I have thought of
There is a dedicated Scrapy integration for this: scrapy-splash
How to install Docker on a Mac: under the hood it is still a virtual machine running boot2docker
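Even with images disabled during loading, the page has so much content that, without pruning, a single crawl takes about 10 seconds to complete, which is too slow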
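The image-free rendering described above can be sketched against Splash's HTTP API, which scrapy-splash talks to. This is a minimal sketch, not the setup used in this thread: it assumes a Splash instance on its default port 8050 on localhost, and the helper name `build_render_url` and the example target URL are hypothetical. It only builds the `render.html` request URL, with `images=0` to skip image downloads and `wait` to give the asynchronous requests time to finish.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance listens on its default port on this machine.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def build_render_url(page_url, wait=2.0, load_images=False, timeout=30):
    """Return a Splash render.html URL that renders page_url with JavaScript."""
    params = {
        "url": page_url,
        "wait": wait,                        # seconds to wait after the page loads
        "images": 1 if load_images else 0,   # 0 tells Splash to skip images
        "timeout": timeout,                  # overall render timeout in seconds
    }
    return SPLASH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical target URL, used only for illustration.
    print(build_render_url("http://example.com/video/1"))
```

Fetching the resulting URL with any HTTP client returns the HTML after JavaScript has run. A shorter `wait` speeds up each page but risks capturing it before the AJAX content arrives, so the value has to be tuned per site.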
After the video page has loaded, the following should be done
Bilibili HTML5 player URL
The video traffic is served through the ChinaNetCenter (网宿) CDN; there are several ways to get at the video
This afternoon I looked through the JS source: the official site now hosts the player with a third party
I got lazy and just put it on a server to crawl the whole site
The proxy IPs I grabbed at random from the web are basically all dead...
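One way to weed out dead proxies before a crawl is to probe each one first. The sketch below is an illustration using only the standard library, not the code from this project: the function names, the test URL and the timeout are all assumptions.

```python
import http.client
import urllib.request


def proxy_is_alive(proxy, test_url="http://example.com/", timeout=5):
    """Return True if test_url can be fetched through the given HTTP proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed replies from a broken proxy.
        return False


def filter_live_proxies(proxies, check=proxy_is_alive):
    """Keep only the proxies for which check(proxy) is true."""
    return [p for p in proxies if check(p)]
```

Calling `filter_live_proxies(["http://10.0.0.1:8080", ...])` returns the subset that answered in time. The `check` parameter is injectable so the filter can be tested without network access.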
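There are roughly 5 to 6 million records. The most efficient and sensible option would be to hand proxying off to a third-party service, but this is just a casual project. If a similar situation comes up at work I will simply pay for one; for now I will scrape as much as I can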
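The site loads its pages via AJAX: when the page returns 200 it is in fact still loading asynchronously, so the crawler cannot get the complete page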