This project is a program that uses the GitHub API with OAuth authentication to fetch data about repositories and their owners, then stores the normalized, deduplicated data in a Postgres database. It handles network failures and other errors with retries, and logs relevant information to aid debugging. Data is normalized before it is stored, and records that already exist are updated instead of being inserted again. The program can fetch both the public and the private repositories of a user.
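As a rough illustration of the retry, normalization, and deduplication behavior described above, the following sketch uses only the standard library. The helper names (`with_retries`, `normalize_repo`, `dedupe_by_id`) and the selection of fields are assumptions made for this example and are not taken from the project's code; the keys `id`, `full_name`, `private`, and `owner.login` follow the shape of GitHub's repository JSON.

```python
import logging
import time
from typing import Any, Callable, Iterable, TypeVar

logger = logging.getLogger("repo_mirror_sketch")  # hypothetical logger name
T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on OSError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fetch()
        except OSError as exc:
            # urllib.error.URLError, ConnectionError and TimeoutError
            # are all subclasses of OSError.
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


def normalize_repo(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep a small, consistently typed subset of a repository payload."""
    owner = raw.get("owner") or {}
    return {
        "id": int(raw["id"]),
        # Lowercasing is an assumption of this sketch, not a project rule.
        "full_name": str(raw["full_name"]).strip().lower(),
        "private": bool(raw.get("private", False)),
        "owner_login": str(owner.get("login", "")).strip().lower(),
    }


def dedupe_by_id(repos: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse records sharing an id; a later record replaces an earlier one."""
    latest: dict[int, dict[str, Any]] = {}
    for repo in repos:
        latest[repo["id"]] = repo
    return list(latest.values())
```

When the records are written through Django models, the "update instead of insert" step is commonly expressed with `QuerySet.update_or_create`, keyed on the GitHub `id`; whether this project does so is not stated here.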
- Python 3.11 installed on the system.
- Access to a Postgres database instance.
- Docker & Docker Compose (Optional).
- Clone the repository using the git clone command.
git clone https://github.com/Tanmay000009/GitHubRepoMirror
- Navigate to the project directory.
cd GitHubRepoMirror
- Create a new virtual environment using the venv module. Run the command:
python3 -m venv <virtual-env name>
- Activate the virtual environment using the source command. Run the command:
source <virtual-env name>/bin/activate
Note: On Windows, the command to activate the virtual environment is slightly different. Run the command:
.\<virtual-env name>\Scripts\activate
- Install the required packages using the pip install command and the requirements.txt file. Run the command:
pip install -r requirements.txt
- Create a GitHub account (if you don't already have one).
- Register a new OAuth application on GitHub by going to Settings -> Developer settings -> OAuth Apps -> New OAuth App.
- Enter the following information for the application:
  - Application name: Choose a name for your application.
  - Homepage URL: Set this to http://127.0.0.1:8000/ or whatever your local server address is.
  - Application description: Optionally, enter a description for your application.
  - Authorization callback URL: Set this to http://127.0.0.1:8000/social-auth/complete/github
- After creating the application, you will be provided with a Client ID and a Client Secret. Keep these values safe, as you will need them later to configure the Django application.
- Copy the values from .env.sample to a new .env file in your project's root directory.
- Replace the placeholder values with your actual database credentials and GitHub OAuth application credentials.
  - SECRET_KEY: Generate a new Django secret key.
  - DB_ENGINE: The database engine you are using (e.g. django.db.backends.postgresql).
  - DB_NAME: The name of the database.
  - DB_USERNAME: The username for accessing the database.
  - DB_PASSWORD: The password for accessing the database.
  - DB_HOST: The hostname for the database.
  - DB_PORT: The port number for the database.
  - GITHUB_CLIENT_ID: The Client ID provided by GitHub for your OAuth application.
  - GITHUB_CLIENT_SECRET: The Client Secret provided by GitHub for your OAuth application.
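The variables above would typically be read in the Django settings module. The sketch below shows one plausible arrangement and is not the project's actual settings file. The helper name `build_settings` is hypothetical, the fallback values for host and port are assumptions, and the `SOCIAL_AUTH_GITHUB_*` names assume the project relies on the social-auth-app-django package, which the `/social-auth/complete/github` callback path suggests but this document does not confirm.

```python
import os
from typing import Any, Mapping


def build_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Translate the .env variables into Django-style settings values.

    Required variables raise KeyError when missing, so a misconfigured
    environment fails at startup rather than at the first query.
    """
    return {
        "SECRET_KEY": env["SECRET_KEY"],
        "DATABASES": {
            "default": {
                "ENGINE": env.get("DB_ENGINE", "django.db.backends.postgresql"),
                "NAME": env["DB_NAME"],
                "USER": env["DB_USERNAME"],
                "PASSWORD": env["DB_PASSWORD"],
                # Assumed fallbacks for a local Postgres instance.
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        },
        # Assumed: python-social-auth setting names for its GitHub backend.
        "SOCIAL_AUTH_GITHUB_KEY": env["GITHUB_CLIENT_ID"],
        "SOCIAL_AUTH_GITHUB_SECRET": env["GITHUB_CLIENT_SECRET"],
        # The "repo" OAuth scope is what allows private repositories to be read.
        "SOCIAL_AUTH_GITHUB_SCOPE": ["repo"],
    }
```

In a settings module, the returned mapping could be unpacked into module-level names, for example `globals().update(build_settings())`, once the `.env` file has been loaded into the process environment.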
- Run the development server using the command:
python manage.py runserver
This will start the server on the default port 8000.
- Open your web browser and navigate to http://127.0.0.1:8000/ to see your Django project up and running.
Note: Make sure the database instance is reachable from inside the Docker container.
- Build a Docker image for the container.
docker build -t <app-name> .
- Run the Docker image.
docker run -it -p 8000:8000 <app-name>
- Or, if you use Docker Compose, start the container with:
docker-compose up
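To check the note above about database access from the Docker container, a small TCP probe can be run inside the container before starting the server. This is a minimal sketch using only the standard library; the function name `can_connect` is hypothetical, and the probe only confirms that the port accepts connections, not that the credentials are valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect(os.environ["DB_HOST"], int(os.environ["DB_PORT"]))` (after `import os`) would use the values from the .env file. Inside a container, `localhost` refers to the container itself, so a database running on the host machine usually needs a different DB_HOST value.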
Tanmay Vyas