Routers is a collection of web crawlers for various popular technology news sources.
It exposes a command-line interface to these crawlers, so the discerning tech-news enthusiast never has to leave the comfort of their terminal.
It Currently Supports:
Technology News Sources
- Ars Technica
- Wired.com
Major Technology Blogs
- TechCrunch
- Mashable
- Gizmodo
Personal Technology Blogs
- Codes From The Underground, my blog
Mainstream News Sources
- New York Times
- USA Today
- L.A. Times
Other Random Stuff
- GitHub
- The Oatmeal
- xkcd
(This categorization is loose; please feel free to shuffle things around.)
It is my hope that, by open-sourcing a collection of news scrapers, a community can form around building a powerful set of real-time news aggregation tools.
npm install routers-news -g
Listing News Sources
routers-news --sources
Outputs:
Routers News Sources:
news:
major:
NewYorkTimes: The New York Times Bits blog.
LATimes: The business and culture of our digital lives, from the L.A. Times.
USAToday: Power up with breaking news on personal technology, electronics, gaming and computers.
tech:
Wired.com: Wired magazine is a monthly US technology publication.
ArsTechnica: Ars Technica is a technology news site catering to PC enthusiasts.
TechCrunch: A network of technology-oriented blogs and other web properties.
other:
Github: Trending and featured repos on Github.com
Displaying Headlines
routers-news --source=github
Outputs:
[1] MacLemon / CongressChecklist
https://github.com/MacLemon/CongressChecklist
[2] dejan / rails_panel
https://github.com/dejan/rails_panel
[3] feross / md5-password-cracker.js
https://github.com/feross/md5-password-cracker.js
[4] shadowsocks / shadowsocks-go
https://github.com/shadowsocks/shadowsocks-go
[5] bcoe / routers-news
https://github.com/bcoe/routers-news
[6] andrew / 24pullrequests
https://github.com/andrew/24pullrequests
[7] nkohari / jwalk
https://github.com/nkohari/jwalk
[8] lockitron / selfstarter
https://github.com/lockitron/selfstarter
[9] twitter / bower
https://github.com/twitter/bower
[10] Spaceman-Labs / SMPageControl
https://github.com/Spaceman-Labs/SMPageControl
Loading Articles
routers-news --source=github --article=5
Outputs:
bcoe / routers-news:
A crawler for various popular tech news sources. Read technology news from the comfort of your CLI.
— Read more
---------
https://github.com/bcoe/routers-news
The news crawlers used by Routers come in two varieties:
- Page scrapers which use CSS selectors to extract content from news sources.
- RSS/Atom feed parsers, which crawl articles using an RSS or Atom news feed.
Examples of both can be found in the lib/sources directory.
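To give a feel for the feed-parser variety, here is a minimal sketch that pulls titles and links out of an RSS 2.0 string and prints them in the numbered headline style shown earlier. It is not the code in lib/sources: the function names (parseRssHeadlines, formatHeadlines), the sample feed, and the regex-based parsing are all assumptions made for illustration, and a real crawler would more likely lean on a proper XML or feed-parsing library.

```javascript
// Minimal sketch: extract headline titles and links from an RSS 2.0 string.
// Assumption: regex matching is good enough for a demo; it is not a real
// XML parser and will not cope with every feed in the wild.

// Decode the five predefined XML entities.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Return the text content of the first <tag> element in xml, or null.
function firstTag(xml, tag) {
  const pattern = new RegExp('<' + tag + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + tag + '>');
  const match = pattern.exec(xml);
  if (!match) return null;
  const raw = match[1].trim();
  // CDATA sections carry literal text, so they skip entity decoding.
  const cdata = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(raw);
  return cdata ? cdata[1].trim() : decodeEntities(raw);
}

// Hypothetical helper: turn an RSS string into [{ title, url }, ...].
function parseRssHeadlines(xml) {
  const headlines = [];
  const itemPattern = /<item(?:\s[^>]*)?>([\s\S]*?)<\/item>/g;
  let item;
  while ((item = itemPattern.exec(xml)) !== null) {
    headlines.push({
      title: firstTag(item[1], 'title'),
      url: firstTag(item[1], 'link'),
    });
  }
  return headlines;
}

// Format headlines as "[n] title" followed by the URL on the next line.
function formatHeadlines(headlines) {
  return headlines
    .map((h, i) => '[' + (i + 1) + '] ' + h.title + '\n' + h.url)
    .join('\n');
}

// Made-up sample feed, used only to exercise the functions above.
const sampleFeed = `<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First &amp; foremost</title><link>https://example.com/1</link></item>
  <item><title><![CDATA[Second <story>]]></title><link>https://example.com/2</link></item>
</channel></rss>`;

console.log(formatHeadlines(parseRssHeadlines(sampleFeed)));
```

A page scraper would follow the same shape, with the regex matching swapped for CSS selector queries against the fetched HTML.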
It's easy to add a new news source:
- fork the routers-news repo.
- clone it locally.
- run npm install to install the libraries locally.
- create a new crawler in the lib/sources directory (everything in this hierarchy is automatically loaded).
- to test your crawler run: node ./bin/routers-news.js.
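The "automatically loaded" part of the steps above could work roughly like the sketch below, which walks a directory tree and requires every .js file it finds. This is an illustration of the general technique, not the project's actual loader: the loadSources function, the key format, and the demo module's description field are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical loader: recursively require every .js file under rootDir and
// return a map from "relative/path/without/extension" to the exported module.
function loadSources(rootDir) {
  const sources = {};
  (function walk(dir) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath); // descend into sub-categories such as tech/ or news/
      } else if (entry.isFile() && path.extname(entry.name) === '.js') {
        const key = path
          .relative(rootDir, fullPath)
          .slice(0, -'.js'.length)
          .split(path.sep)
          .join('/');
        sources[key] = require(path.resolve(fullPath));
      }
    }
  })(rootDir);
  return sources;
}

// Demo against a throwaway directory, so the sketch runs without the repo.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'sources-'));
fs.mkdirSync(path.join(demoRoot, 'tech'));
fs.writeFileSync(
  path.join(demoRoot, 'tech', 'example.js'),
  "module.exports = { description: 'An example source.' };\n"
);

const loaded = loadSources(demoRoot);
console.log(Object.keys(loaded));
```

With a loader of this kind, dropping a new file into the directory is enough to register a source; no central list needs editing.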
You can also help a ton by:
- reporting when crawlers are broken.
- extending the crawlers; I'd love to have:
- Dates.
- Authors.
- Better image extraction.
- improving on the CLI client.
Help make our dreams of a collaborative web-crawler a reality :)
Copyright (c) 2012 Benjamin Coe and Joshua Hull and Gabriel Silk. See LICENSE.txt for further details.