
🌟 AI-powered tool to analyze GitHub stargazers, identify companies, and evaluate them as potential customers for AI scraping infrastructure


ScrapeGraphAI/ScrapeHubAI


ScrapeHubAI

🚀 Powered by ScrapeGraphAI, an AI-powered web scraping API

An open-source LangGraph-based agent that analyzes GitHub stargazers, traces them to their companies, and evaluates these companies as potential sales targets for AI scraping infrastructure.

🌟 Features

  • GitHub Stargazer Analysis: Fetches and analyzes users who starred a repository
  • Company Identification: Traces GitHub users to their affiliated companies
  • Intelligent Evaluation: Scores companies based on size, industry, and technology fit
  • Web Scraping: Uses ScrapeGraphAI API to gather additional company information
  • Beautiful UI: Streamlit-based interface for easy interaction
  • Export Results: Download analysis results as CSV for further processing
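As a rough illustration of the CSV export, the sketch below serializes a list of scored companies with Python's standard csv module. The function name companies_to_csv and the columns company, score, and stargazers are assumptions for illustration, not the project's actual export schema.

```python
import csv
import io


def companies_to_csv(companies: list[dict]) -> str:
    """Serialize scored companies to CSV text (hypothetical column schema)."""
    fieldnames = ["company", "score", "stargazers"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops any extra keys instead of raising ValueError.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in companies:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"company": "Acme Analytics", "score": 82, "stargazers": 3},
        {"company": "Example Corp", "score": 47, "stargazers": 1},
    ]
    print(companies_to_csv(sample))
```

In a Streamlit app, a string like this could be handed to st.download_button as its data argument so the user can save the file.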

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • API keys for GitHub, OpenRouter, and ScrapeGraphAI

Installation

  1. Clone the repository:
     git clone https://github.com/ScrapeGraphAI/ScrapeHubAI.git
     cd ScrapeHubAI
  2. Install dependencies:
     pip install -r requirements.txt
  3. Configure API keys:

Getting a GitHub Personal Access Token

  1. Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click "Generate new token" → "Generate new token (classic)"
  3. Give your token a descriptive name (e.g., "ScrapeHub")
  4. Select scopes. A token with no scopes can already read public information such as stargazers, user profiles, and public organization memberships, so the following are optional:
    • public_repo (access to public repositories; note that it also grants write access)
    • read:user (read access to user profile data)
    • read:org (read access to organization and team membership)
  5. Click "Generate token" at the bottom
  6. Important: Copy your token immediately. GitHub shows it only once.
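To show what the token is used for, here is a minimal sketch that builds (but does not send) an authenticated request for one page of a repository's stargazers using only the standard library. It targets GitHub's documented REST endpoint GET /repos/{owner}/{repo}/stargazers; the helper name build_stargazers_request is hypothetical and not part of the project's tools.py.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_stargazers_request(owner: str, repo: str, token: str,
                             page: int = 1, per_page: int = 100) -> urllib.request.Request:
    """Build an authenticated request for one page of a repo's stargazers.

    GitHub caps per_page at 100, so larger repos need several pages.
    """
    url = (f"{GITHUB_API}/repos/{owner}/{repo}/stargazers"
           f"?per_page={per_page}&page={page}")
    headers = {
        "Accept": "application/vnd.github+json",
        # Classic personal access tokens are accepted with the Bearer scheme.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```

Sending the request with urllib.request.urlopen and parsing the body with json.load yields a list of user objects whose login fields identify the stargazers; incrementing page until an empty list comes back walks the full list.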

Setting up your .env file

Create a .env file in the project root:

GITHUB_TOKEN=your_github_personal_access_token
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
SGAI_API_KEY=your_scrapegraphai_api_key

Or copy from the example:

cp .env.example .env
# Then edit .env with your actual API keys
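Python projects usually load a file like this with the python-dotenv package (load_dotenv), which by default does not override variables that are already set. For illustration only, the stdlib sketch below approximates that behavior for simple KEY=VALUE files; load_env_file is a hypothetical helper, not the project's loader, and it ignores advanced syntax such as multi-line values or variable expansion.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Blank lines and '#' comments are skipped; surrounding quotes are removed;
    variables already present in the environment are left untouched.
    """
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs with '=' in them survive.
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```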

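Getting Other API Keys

  • OPENROUTER_API_KEY: create an API key in your OpenRouter account
  • SGAI_API_KEY: create an API key in your ScrapeGraphAI account

  4. Run the application:
     streamlit run src/app.py
     Streamlit prints the local URL to open in your browser (by default http://localhost:8501).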

📊 How It Works

  1. Fetch Stargazers: The agent retrieves users who starred the specified GitHub repository
  2. Trace to Companies: Identifies companies through user profiles and organization memberships
  3. Gather Intelligence: Uses ScrapeGraphAI to scrape additional company information
  4. Evaluate & Rank: Scores companies based on multiple criteria:
    • Technology relevance (AI, ML, data analytics, scraping)
    • Industry fit (e-commerce, SaaS, fintech)
    • Company size and growth indicators
    • Explicit data processing needs
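The README describes LLM-based evaluation, so the following is only a rule-based stand-in that makes the four criteria concrete. The keyword lists, point values, size tiers, and the names CompanyProfile and score_company are all hypothetical; growth indicators are not modeled. Each criterion contributes points, and the maximum total is 100.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists and point values, one per README criterion.
TECH_KEYWORDS = ("ai", "machine learning", "ml", "data analytics", "scraping")
INDUSTRY_KEYWORDS = ("e-commerce", "saas", "fintech")
DATA_NEED_KEYWORDS = ("data pipeline", "web data", "crawling", "etl")
POINTS = {"tech": 35, "industry": 25, "data_need": 20}
MAX_SIZE_POINTS = 20


@dataclass
class CompanyProfile:
    name: str
    description: str
    employee_count: int = 0  # 0 means unknown


def mentions(text: str, keywords: tuple[str, ...]) -> bool:
    # Whole-word match so "ai" does not fire on words like "maintain".
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def size_points(employees: int) -> int:
    # Coarse size tiers; an unknown size earns nothing.
    if employees >= 1000:
        return MAX_SIZE_POINTS
    if employees >= 100:
        return 14
    if employees >= 10:
        return 8
    return 2 if employees > 0 else 0


def score_company(profile: CompanyProfile) -> int:
    """Return a 0-100 fit score from keyword matches and company size."""
    text = profile.description.lower()
    score = size_points(profile.employee_count)
    if mentions(text, TECH_KEYWORDS):
        score += POINTS["tech"]
    if mentions(text, INDUSTRY_KEYWORDS):
        score += POINTS["industry"]
    if mentions(text, DATA_NEED_KEYWORDS):
        score += POINTS["data_need"]
    return score
```

Fixed weights keep the result explainable; an LLM evaluator trades that transparency for the ability to read unstructured company pages.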

🎯 Use Cases

  • Sales Intelligence: Identify potential customers for AI/data infrastructure products
  • Market Research: Understand which companies are interested in specific technologies
  • Partnership Discovery: Find companies with complementary technology needs
  • Competitive Analysis: See which companies are following competitor repositories

🛠️ Architecture

  • LangGraph: Orchestrates the multi-step analysis workflow
  • OpenRouter: Provides LLM capabilities for intelligent evaluation
  • ScrapeGraphAI: Powers web scraping for company information
  • Streamlit: Creates an intuitive user interface
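The four stages can be pictured as nodes that each read and extend a shared state, which is the pattern LangGraph's StateGraph formalizes. The sketch below deliberately uses plain Python functions instead of LangGraph, with stubbed data in place of the GitHub and ScrapeGraphAI calls; the node names, state keys, and sample users are hypothetical and not taken from src/agent.py.

```python
from typing import Any, Callable

# Shared state passed from node to node (a plain dict in this sketch).
State = dict[str, Any]


def fetch_stargazers(state: State) -> State:
    # Stub: a real node would page through GitHub's stargazers endpoint
    # for state["repo"] and then fetch each user's profile.
    state["users"] = [
        {"login": "alice", "company": "@acme"},
        {"login": "bob", "company": " Acme"},
        {"login": "carol", "company": None},
        {"login": "dave", "company": "Globex"},
    ]
    return state


def trace_companies(state: State) -> State:
    # Group stargazers by a normalized form of their profile's company field.
    companies: dict[str, list[str]] = {}
    for user in state["users"]:
        name = (user.get("company") or "").strip().lstrip("@").strip().lower()
        if name:
            companies.setdefault(name, []).append(user["login"])
    state["companies"] = companies
    return state


def gather_intelligence(state: State) -> State:
    # Stub: a real node would call a scraping service for each company.
    state["details"] = {name: {"description": ""} for name in state["companies"]}
    return state


def evaluate(state: State) -> State:
    # Stand-in ranking by stargazer count; the README describes LLM scoring.
    ranked = sorted(state["companies"].items(), key=lambda kv: len(kv[1]), reverse=True)
    state["ranking"] = [(name, len(logins)) for name, logins in ranked]
    return state


PIPELINE: list[Callable[[State], State]] = [
    fetch_stargazers, trace_companies, gather_intelligence, evaluate,
]


def run_pipeline(state: State) -> State:
    # Run each node in order, threading the state through.
    for node in PIPELINE:
        state = node(state)
    return state
```

Calling run_pipeline({"repo": "ScrapeGraphAI/ScrapeHubAI"}) returns a state whose ranking is [("acme", 2), ("globex", 1)]. In LangGraph, the same functions would roughly be registered with add_node and chained with add_edge on a StateGraph before compile().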

📁 Project Structure

ScrapeHubAI/
├── src/
│   ├── agent.py          # LangGraph workflow definition
│   ├── tools.py          # GitHub and scraping tools
│   ├── evaluator.py      # Company scoring logic
│   └── app.py            # Streamlit UI
├── tests/
│   └── test_agent.py     # Unit tests
├── docs/
│   └── usage.md          # Detailed usage guide
├── .env                  # API keys (create this)
├── .env.example          # Template for .env
├── requirements.txt      # Python dependencies
└── README.md             # This file

🧪 Testing

Run the test suite:

python -m pytest tests/

🔧 Configuration

Environment Variables

  • GITHUB_TOKEN: Personal access token for GitHub API
  • OPENROUTER_API_KEY: API key for OpenRouter LLM service
  • OPENROUTER_API_BASE: OpenRouter API endpoint (default: https://openrouter.ai/api/v1)
  • SGAI_API_KEY: API key for ScrapeGraphAI service

Advanced Settings

Customize analysis parameters through the Streamlit UI:

  • Maximum number of companies to display
  • Minimum score threshold for filtering
  • Analysis depth and timeout settings
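The first two settings amount to a threshold filter followed by a top-N cut. A minimal sketch, assuming results are (company, score) pairs; select_companies is a hypothetical name, and in a Streamlit UI its arguments could come from widgets such as st.number_input or st.slider.

```python
def select_companies(scored: list[tuple[str, float]], max_companies: int = 10,
                     min_score: float = 0.0) -> list[tuple[str, float]]:
    """Keep companies at or above min_score, best first, at most max_companies."""
    if max_companies < 0:
        raise ValueError("max_companies must be non-negative")
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_companies]
```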

⚖️ Ethical Considerations

  • Respect Rate Limits: The agent implements automatic rate limiting
  • Privacy: Only analyzes publicly available information
  • Terms of Service: Ensure compliance with all platform ToS
  • Responsible Use: Designed for legitimate business intelligence only

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

For issues, questions, or suggestions:

  1. Check the usage guide (docs/usage.md)
  2. Open an issue on GitHub
  3. Review existing issues for solutions

Made with ❤️ by the ScrapeGraphAI team
