Data mining and analysis for the ManySStuBs4J dataset used in the MSR 2021 Data Mining Challenge.
The program is split into two modules: the data miner and the data analyser.
The data miner is run using the Miner.py script and is responsible for:
- Extracting data from the SStuBs dataset.
- Retrieving data from GitHub using their REST API.
- Writing the data to a CSV file.
Any code relating to this module can be found in the miners folder, as well as the (shared) util folder.
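The overall flow is sketched below. This is a minimal illustration under assumptions (one token per line in the tokens file, an output file named data/mined.csv, and a projectName field of the form owner.repo), not the project's exact implementation:

```python
import csv
import json
import requests

# Load the SStuBs dataset (a JSON array of bug entries).
with open("data/sstubs.json") as f:
    sstubs = json.load(f)

# Read a personal access token (assumed format: one token per line).
with open("data/tokens") as f:
    token = f.readline().strip()

headers = {"Authorization": f"token {token}"}

with open("data/mined.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["projectName", "fixCommitSHA1", "commitMessage"])

    for entry in sstubs:
        # projectName is assumed to be "owner.repo"; repo names may themselves
        # contain dots, so only the first "." is treated as the separator.
        owner_repo = entry["projectName"].replace(".", "/", 1)
        sha = entry["fixCommitSHA1"]

        # Fetch commit metadata from the GitHub REST API.
        resp = requests.get(
            f"https://api.github.com/repos/{owner_repo}/commits/{sha}",
            headers=headers,
        )
        if resp.status_code != 200:
            continue  # skip entries that can no longer be resolved

        commit = resp.json()
        writer.writerow([entry["projectName"], sha, commit["commit"]["message"]])
```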
The data analyser is run using the Analyser.py script and is responsible for:
- Extracting data from the CSV file generated by the data miner.
- Analysing the data and calculating results.
- Generating the results as output.
Any code relating to this module can be found in the analysers folder, as well as the (shared) util folder.
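A correspondingly minimal sketch of the analyser side, assuming the CSV columns from the miner sketch above and using a simple per-project count as a stand-in for the actual analyses:

```python
import csv
from collections import Counter

# Load the CSV produced by the data miner (column names are assumptions).
with open("data/mined.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Example analysis: count mined entries per project and report the top ten.
per_project = Counter(row["projectName"] for row in rows)

print(f"Total entries mined: {len(rows)}")
for project, count in per_project.most_common(10):
    print(f"{project}: {count}")
```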
Running the program requires two things:
- A tokens file, containing one or more personal access tokens for authenticating with GitHub. This must be located in the data directory.
- The ManySStuBs4J dataset, named sstubs.json. This must be located in the data directory.
The code can then be executed by running python Miner.py or python Analyser.py from the command line.
- This program was built on a custom setup of Arch Linux, and has not been tested on other operating systems.
- The code is only designed to retrieve data related to our needs, but is extensible and can be adapted for different data.
- A single personal access token can only send 5000 requests to GitHub per hour, limiting the rate at which data is mined.
- Multiple tokens can be placed in the tokens file, and the program will automatically choose the best one throughout execution, based on remaining requests and reset time (see the sketch after this list).
- Data mining takes an extremely long time (roughly 1 hour per 800 entries).
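A minimal sketch of how such token selection can work, using GitHub's documented /rate_limit endpoint. The tokens-file format (one token per line) is an assumption, and the program's actual selection logic may differ:

```python
import requests

def best_token(tokens):
    """Pick the token with the most remaining core-API requests.

    Falls back to the token whose rate limit resets soonest when all
    tokens are exhausted. (Illustrative only.)
    """
    best, best_remaining, soonest_reset = None, -1, None
    for token in tokens:
        resp = requests.get(
            "https://api.github.com/rate_limit",
            headers={"Authorization": f"token {token}"},
        )
        core = resp.json()["resources"]["core"]
        if core["remaining"] > best_remaining:
            best, best_remaining = token, core["remaining"]
        if soonest_reset is None or core["reset"] < soonest_reset[1]:
            soonest_reset = (token, core["reset"])
    return best if best_remaining > 0 else soonest_reset[0]

# Usage (assuming one token per line in the tokens file):
with open("data/tokens") as f:
    tokens = [line.strip() for line in f if line.strip()]
token = best_token(tokens)
```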