SZU Spider

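A spider for SZU news.
Source code of the crawler module for Shenzhen University's on-campus search engine. It crawls 公文通 (Gongwentong, the university's official notice system) and saves the results locally as JSON-formatted text.
For the source code of the indexing and query modules, see ins.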
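
The "save as local JSON text" step can be sketched as a Scrapy-style item pipeline. This is a minimal illustration and not the repository's actual code: the class name `JsonLinesPipeline`, the default file name `szu_news.jl`, and the one-object-per-line layout are all assumptions. It relies only on the standard library, since Scrapy calls pipeline hooks (`open_spider`, `close_spider`, `process_item`) by name.

```python
import json
from pathlib import Path


class JsonLinesPipeline:
    """Hypothetical pipeline: appends each crawled item as one JSON line."""

    def __init__(self, path="szu_news.jl"):  # assumed output file name
        self.path = Path(path)
        self.file = None

    def open_spider(self, spider):
        # Create the output directory if needed and open the file for appending.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.file = self.path.open("a", encoding="utf-8")

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the saved file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

If a pipeline like this were used, it would be enabled through the `ITEM_PIPELINES` setting of the Scrapy project; the module path to register depends on where the class is placed.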

Scheduled scripts

# Crawl SZU 公文通 notices every hour (last 7 days)
30 * * * * source ~/.bash_profile; cd /home/wwwroot/default/szuspider && /usr/local/bin/scrapy crawl szu -s LOG_FILE=/tmp/szunews/logs/logs-crawl-szu-news/crawl.log.`/bin/date "+\%Y-\%m-\%d~\%H:\%M:\%S"`

# Update the index every hour
00 * * * * source ~/.bash_profile; curl http://localhost:8080/ins/createIndex > /tmp/szunews/logs/logs-create-index/`/bin/date "+\%Y-\%m-\%d~\%H:\%M:\%S"`.log
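
For readers who prefer to trigger the index update from Python rather than `curl`, the second cron entry can be approximated as below. This is a sketch under assumptions: the function names `log_path` and `refresh_index` and the 60-second timeout are hypothetical, while the URL, log directory, and timestamp format are copied from the cron line above.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen

INDEX_URL = "http://localhost:8080/ins/createIndex"    # taken from the cron entry
LOG_DIR = Path("/tmp/szunews/logs/logs-create-index")  # taken from the cron entry


def log_path(now, log_dir=LOG_DIR):
    # Mirrors the cron entry's `date "+%Y-%m-%d~%H:%M:%S"` plus the ".log" suffix.
    return Path(log_dir) / (now.strftime("%Y-%m-%d~%H:%M:%S") + ".log")


def refresh_index(url=INDEX_URL, log_dir=LOG_DIR, timeout=60):
    # Request the index-creation endpoint and store the response body as the log.
    target = log_path(datetime.now(), log_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=timeout) as response:  # timeout is an assumed value
        target.write_bytes(response.read())
    return target
```

Calling `refresh_index()` once performs the same request as the cron job and returns the path of the log file it wrote. Unlike the shell redirect, it raises an exception when the endpoint is unreachable, so a wrapper script would need to handle that case.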

Repository: StrickYan/szuspider