


Historical Repos Mining

This repository contains scripts for mining and analyzing GitHub repositories. They are used in a Mining Software Repositories (MSR) study that examines how developers' working schedules have changed over the decades. The scripts use the GitHub API to retrieve repository information and perform the analyses.

Scripts

1. repo_privacy_checker.py

This script checks whether a given GitHub repository is private or public. It takes the owner name, the repository name, and a GitHub personal access token, retrieves the repository's information through the GitHub API, and prints the repository's privacy status.

Usage:

python repo_privacy_checker.py <owner> <repo_name> <gh_token>
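
The check described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the lookup goes through the GitHub REST endpoint `GET /repos/{owner}/{repo}`, whose JSON response carries a boolean `private` field. The function names `build_request`, `privacy_status`, and `check_repo` are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Prepare an authenticated GET request for the repository endpoint."""
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def privacy_status(payload: dict) -> str:
    """Map the `private` flag of a repository payload to a label."""
    return "private" if payload.get("private") else "public"


def check_repo(owner: str, repo: str, token: str) -> str:
    """Query the API and return 'private' or 'public' (needs network access)."""
    with urllib.request.urlopen(build_request(owner, repo, token)) as resp:
        return privacy_status(json.load(resp))
```

With a valid token, `check_repo("vtalos", "historical-repos-mining", token)` would return one of the two labels. If the token cannot access a private repository, the API is expected to answer with a 404 error rather than a payload, so a caller would likely need to handle `urllib.error.HTTPError` as well.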

2. historic_repo_picker.py

This script processes a CSV file of repository information exported from the GitHub Search mining tool. It filters the repositories by the timestamp of their first commit and writes those that meet the criterion to a text file. The goal is to identify repositories that were created before 2005.

Usage:

python historic_repo_picker.py <file_input> <file_output> <gh_token>
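
The filtering step can be illustrated with a short sketch. It rests on assumptions: the column names `name` and `first_commit` are hypothetical, and the timestamps are read directly from the CSV, whereas the actual script takes a token and presumably obtains them through the GitHub API. The function name `pick_historic` is also hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Repositories whose first commit is earlier than this instant are kept.
CUTOFF = datetime(2005, 1, 1, tzinfo=timezone.utc)


def pick_historic(csv_text: str, cutoff: datetime = CUTOFF) -> list[str]:
    """Return names of repositories whose first commit precedes the cutoff."""
    picked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical column holding an ISO 8601 timestamp with a UTC offset.
        first_commit = datetime.fromisoformat(row["first_commit"])
        if first_commit < cutoff:
            picked.append(row["name"])
    return picked


sample = (
    "name,first_commit\n"
    "old/project,2003-06-14T09:30:00+00:00\n"
    "new/project,2012-01-05T18:00:00+00:00\n"
)
print(pick_historic(sample))  # ['old/project']
```

Writing the result to the output file would then be one line per repository name.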

3. repo_sampler.py

This is the main script of the analysis. It reads repositories from a CSV file generated by the GitHub Search mining tool and selects those that meet the study's requirements for activity and for how long the repository has existed. The output lists the repositories to be sampled for further analysis.

Usage:

python repo_sampler.py <file_input> <gh_token>
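
As a rough illustration of selecting by activity and lifetime, consider the sketch below. Everything specific in it is an assumption: the record fields (`name`, `commits`, `created`, `last_commit`), the function names, and the default thresholds are placeholders, not the study's actual criteria.

```python
import random
from datetime import date


def eligible(repo: dict, min_commits: int, min_years: float) -> bool:
    """Check the hypothetical activity and lifetime thresholds for one repo."""
    lifetime_years = (repo["last_commit"] - repo["created"]).days / 365.25
    return repo["commits"] >= min_commits and lifetime_years >= min_years


def sample_repos(repos, k, min_commits=1000, min_years=10, seed=0):
    """Draw up to k eligible repositories reproducibly."""
    pool = [r["name"] for r in repos if eligible(r, min_commits, min_years)]
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    return sorted(rng.sample(pool, min(k, len(pool))))


repos = [
    {"name": "a/long-lived", "commits": 5000,
     "created": date(2001, 3, 1), "last_commit": date(2023, 3, 1)},
    {"name": "b/short-lived", "commits": 5000,
     "created": date(2020, 1, 1), "last_commit": date(2023, 1, 1)},
    {"name": "c/inactive", "commits": 12,
     "created": date(2000, 1, 1), "last_commit": date(2023, 1, 1)},
]
print(sample_repos(repos, k=5))  # ['a/long-lived']
```

Seeding the random generator is a design choice for reproducibility; whether the actual script samples randomly at all is not stated in this README.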

Data

The data directory contains CSV and TXT files used by the scripts for input and output. The ghs_results.csv file is the input CSV file from the GitHub Search mining tool, and the script_results.txt file is the output text file generated by the ghs_repo_filter.py script.

Getting Started

To get started with using the scripts in this repository, follow these steps:

  1. Clone this repository to your local machine:
git clone https://github.com/vtalos/historical-repos-mining.git
  2. Execute the desired script with the appropriate arguments as described in the script's documentation.

Contributing

Contributions to this project are welcome! If you have any suggestions, feature requests, or bug reports, please open an issue or submit a pull request.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
