This repository contains scripts for mining and analyzing GitHub repositories. The scripts support a mining software repositories (MSR) study that examines how developers' working schedules have changed over the decades. They use the GitHub API to retrieve repository information and perform analyses.
This script allows users to check whether a given GitHub repository is private or public by providing the owner name, repository name, and a GitHub personal access token. It utilizes the GitHub API to retrieve repository information and provides clear output indicating the privacy status of the repository.
python repo_privacy_checker.py <owner> <repo_name> <gh_token>
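For illustration only, the check described above could be sketched as follows. This is not the code of repo_privacy_checker.py: the function names are hypothetical, and the sketch assumes the standard GitHub REST endpoint `GET /repos/{owner}/{repo}`, whose JSON response includes a boolean `private` field.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(owner, repo, token):
    """Build an authenticated GET request for the repository endpoint."""
    url = f"{API_ROOT}/repos/{owner}/{repo}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)


def privacy_status(repo_info):
    """Map the API's boolean `private` field to a readable label."""
    return "private" if repo_info["private"] else "public"


def check_repo(owner, repo, token):
    """Fetch repository metadata and return 'private' or 'public'."""
    with urllib.request.urlopen(build_request(owner, repo, token)) as resp:
        return privacy_status(json.load(resp))
```

Note that GitHub answers with 404 Not Found for a private repository that the token cannot access, so a caller of `check_repo` may want to catch `urllib.error.HTTPError` and report that case separately.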
This script processes a CSV file containing repository information from the GitHub Search mining tool. It filters repositories based on the timestamp of the first commit, and writes the repositories that meet the criteria to a text file. The script aims to identify repositories that were created before 2005.
python historic_repo_picker.py <file_input> <file_output> <gh_token>
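The filtering step can be sketched roughly as below. This is a minimal illustration under assumptions, not the script's actual implementation: the `name` CSV column and the `first_commit_of` lookup are hypothetical, and the cutoff of 1 January 2005 (UTC) is one reading of "before 2005".

```python
import csv
import io
from datetime import datetime, timezone

# Assumed cutoff: anything strictly before 2005-01-01 UTC counts as historic.
CUTOFF = datetime(2005, 1, 1, tzinfo=timezone.utc)


def is_historic(first_commit_iso, cutoff=CUTOFF):
    """Return True when an ISO 8601 timestamp falls before the cutoff."""
    # Older Python versions do not parse a trailing "Z", so normalize it.
    ts = datetime.fromisoformat(first_commit_iso.replace("Z", "+00:00"))
    return ts < cutoff


def pick_historic(csv_text, first_commit_of):
    """Yield repository names whose first commit predates the cutoff.

    `first_commit_of` is a hypothetical callable that maps a repository
    name to its first-commit timestamp (for example via the GitHub API).
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"]  # assumed column name
        if is_historic(first_commit_of(name)):
            yield name
```

Passing the timestamp lookup in as a callable keeps the date logic independent of the network, so it can be tried out with a plain dictionary before wiring in real API calls.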
This is the main script of the analysis. It samples repositories based on activity criteria and on how long they have existed. It reads repositories from a CSV file generated by the GitHub Search mining tool and identifies those that fit the study's requirements. The output lists the repositories to be sampled for further analysis.
python repo_sampler.py <file_input> <gh_token>
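As a rough sketch of what selecting by existence duration and activity might look like, consider the example below. The `Repo` fields and both thresholds (`min_years`, `min_commits`) are placeholders chosen for illustration; the study's actual criteria are defined in repo_sampler.py and are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repo:
    # Hypothetical record; the real CSV columns may differ.
    name: str
    created: date
    last_commit: date
    commits: int


def qualifies(repo, min_years=10, min_commits=1000):
    """Check existence duration and activity against placeholder thresholds."""
    lifespan_years = (repo.last_commit - repo.created).days / 365.25
    return lifespan_years >= min_years and repo.commits >= min_commits


def select(repos, **thresholds):
    """Return the names of repositories that pass both checks."""
    return [r.name for r in repos if qualifies(r, **thresholds)]
```

For example, `select(repos, min_years=15)` would tighten only the duration requirement while keeping the default commit threshold.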
The data directory contains CSV and TXT files used by the scripts for input and output. The ghs_results.csv file is the input CSV file from the GitHub Search mining tool, and the script_results.txt file is the output text file generated by the ghs_repo_filter.py script.
To get started with using the scripts in this repository, follow these steps:
- Clone this repository to your local machine:
git clone https://github.com/vtalos/historical-repos-mining.git
- Execute the desired script with the appropriate arguments as described in the script's documentation.
Contributions to this project are welcome! If you have any suggestions, feature requests, or bug reports, please open an issue or submit a pull request.
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.