Crawling GitHub Trending Pages every day.
Deploying the program on a Linux server is highly recommended. Every day it crawls information about popular GitHub repositories in the languages you are interested in. It then creates a markdown file to record that information and generates a word cloud image from the repositories' descriptions.
This crawler is designed to help me keep track of the latest trends in technology and discover new and interesting repositories. In fact, reading the newest markdown file has become part of my daily routine. More importantly, it increases my GitHub contributions :P
The idea was inspired by LJ147.
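To give a rough idea of what the daily markdown file could contain, here is a minimal sketch that renders a few repository records as markdown. The function name `render_markdown`, the dictionary keys, and the sample repository are all assumptions made for illustration; they are not taken from the crawler's actual code.

```python
from datetime import date


def render_markdown(repos_by_language, day):
    """Render trending repositories as a markdown document.

    `repos_by_language` maps a language name to a list of dicts with the
    hypothetical keys "name", "url", "description" and "stars".
    """
    lines = [f"# GitHub Trending {day.isoformat()}", ""]
    for language, repos in repos_by_language.items():
        lines.append(f"## {language}")
        lines.append("")
        for repo in repos:
            # Fall back to a fixed text when a repository has no description.
            description = repo.get("description") or "No description"
            lines.append(
                f"- [{repo['name']}]({repo['url']}) "
                f"({repo['stars']} stars): {description}"
            )
        lines.append("")
    return "\n".join(lines)


sample = {
    "Python": [
        {
            "name": "example/project",  # made-up repository for illustration
            "url": "https://github.com/example/project",
            "description": "A sample description",
            "stars": 123,
        }
    ]
}
report = render_markdown(sample, date(2019, 1, 1))
print(report)
```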
- python 3.6+
- git
- screen
- unzip
- Fork my repo or create your own repo for uploading markdown files.
- If you don't have SSH keys, generate a new SSH key and add it to the ssh-agent.
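Before installing, it may help to confirm that the requirements listed above are actually present on the server. The following is only a sketch: the helper names `missing_tools` and `python_is_supported` are hypothetical, and the check relies on nothing more than the standard library's `shutil.which`.

```python
import shutil
import sys

# Command-line tools named in the requirements list.
REQUIRED_TOOLS = ("git", "screen", "unzip")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


print("Python 3.6+:", python_is_supported())
print("Missing tools:", missing_tools() or "none")
```

If anything is reported as missing, install it with your package manager before continuing.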
$ sudo apt install -y unzip screen python3-pip
$ sudo apt-get install -y python-tk python3-tk
# the `release` branch is stable and contains only code
$ wget https://github.com/fgksgf/GitHub-Trending-Crawler/archive/release.zip
$ unzip release.zip
$ cd GitHub-Trending-Crawler-release/
$ mkdir img
$ git init
$ git remote add origin <YourGitHubRepoURL>
# using a virtual environment is highly recommended
$ pip3 install -r requirements.txt
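With the remote configured as above, publishing the generated files comes down to staging, committing and pushing them. The sketch below shows one way this could be scripted with `subprocess`; it is not the crawler's actual implementation. The function names, the file names in the usage lines, and the `master` branch are assumptions.

```python
import subprocess


def build_push_commands(paths, message, remote="origin", branch="master"):
    """Return the git commands (as argument lists) needed to publish `paths`.

    The default branch name "master" is an assumption; adjust it to match
    the branch your repository uses.
    """
    return [
        ["git", "add", *paths],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def push(paths, message, run=subprocess.run):
    """Run each git command in order; check=True stops at the first failure."""
    for command in build_push_commands(paths, message):
        run(command, check=True)


# Only print the commands here; call push(...) to actually run them.
for command in build_push_commands(["2019-01-01.md", "img/2019-01-01.png"], "daily update"):
    print(" ".join(command))
```

Passing argument lists instead of a shell string avoids quoting problems when a commit message or path contains spaces.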
- Switch to the repository directory and type `screen` at the command prompt. A screen session opens with an interface that looks exactly like the normal command prompt.
- Inside the screen session, you can work just as you would in the normal CLI environment. Since screen is itself an application, it also has its own commands and parameters.
- Now run the program: `python3 main.py -p -l`
- While the program is running, press `Ctrl + A` and then `d` to detach the screen. You can then disconnect your SSH session.
- When you want to check the status of the crawler, reconnect to your server via SSH and run `screen -r` to restore the screen. For more information about the `screen` command, you can visit here.
python3 main.py (-h | --help)
python3 main.py (-v | --version)
python3 main.py [-l | --loop] [-p | --push] [--frequency=<f>]
Options:
  -h --help          Show this screen.
  -v --version       Show version.
  -l --loop          Run this program cyclically.
  -p --push          Use git to push the markdown and the image.
  --frequency=<f>    The frequency of crawling [default: daily].
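To illustrate how these options are interpreted, here is a small sketch that mirrors the interface with the standard library's `argparse`. The program itself uses `docopt`, so this is a stand-in rather than the real parser, and the version string is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options documented above."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-v", "--version", action="version",
                        version="version placeholder")  # real version unknown
    parser.add_argument("-l", "--loop", action="store_true",
                        help="Run this program cyclically.")
    parser.add_argument("-p", "--push", action="store_true",
                        help="Use git to push the markdown and the image.")
    parser.add_argument("--frequency", default="daily", metavar="<f>",
                        help="The frequency of crawling [default: daily].")
    return parser


# Equivalent to running: python3 main.py -p -l
args = build_parser().parse_args(["-p", "-l"])
print(args.loop, args.push, args.frequency)  # prints: True True daily
```

Without `-l` the program runs a single crawl, and without `-p` nothing is pushed, so `python3 main.py` alone is a safe way to try it out.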
- Refactor code with object-oriented methods
- Split single python file into several files
- Improve exception handling
- Add logging feature
- Use `docopt` to enhance command-line usage
- Update requirements