Powered by ScrapeGraphAI - The most advanced AI-powered web scraping API
An open-source LangGraph-based agent that analyzes GitHub stargazers, traces them to their companies, and evaluates these companies as potential sales targets for AI scraping infrastructure.
- GitHub Stargazer Analysis: Fetches and analyzes users who starred a repository
- Company Identification: Traces GitHub users to their affiliated companies
- Intelligent Evaluation: Scores companies based on size, industry, and technology fit
- Web Scraping: Uses ScrapeGraphAI API to gather additional company information
- Beautiful UI: Streamlit-based interface for easy interaction
- Export Results: Download analysis results as CSV for further processing
- Python 3.10 or higher
- API keys for GitHub, OpenRouter, and ScrapeGraphAI
- Clone the repository:
git clone https://github.com/yourusername/ScrapeHubAI.git
cd ScrapeHubAI
- Install dependencies:
pip install -r requirements.txt
- Configure API keys:
- Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click "Generate new token" → "Generate new token (classic)"
- Give your token a descriptive name (e.g., "ScrapeHub")
- Select the following scopes:
- public_repo (required for reading public repositories)
- read:user (optional, for better user information)
- read:org (optional, for organization information)
- Click "Generate token" at the bottom
- Important: Copy your token immediately - you won't be able to see it again!
Create a .env file in the project root:
GITHUB_TOKEN=your_github_personal_access_token
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
SGAI_API_KEY=your_scrapegraphai_api_key
Or copy from the example:
cp .env.example .env
# Then edit .env with your actual API keys
- OpenRouter: Sign up at OpenRouter.ai and get your API key from the dashboard
- ScrapeGraphAI: Register at dashboard.scrapegraphai.com and obtain your API key
- Run the application:
streamlit run src/app.py
- Fetch Stargazers: The agent retrieves users who starred the specified GitHub repository
- Trace to Companies: Identifies companies through user profiles and organization memberships
- Gather Intelligence: Uses ScrapeGraphAI to scrape additional company information
- Evaluate & Rank: Scores companies based on multiple criteria:
- Technology relevance (AI, ML, data analytics, scraping)
- Industry fit (e-commerce, SaaS, fintech)
- Company size and growth indicators
- Explicit data processing needs
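The "Evaluate & Rank" step can be pictured as a weighted sum over the four criteria. The following is a minimal sketch, not the logic in evaluator.py: the `Company` fields, the weights, the keyword set, and the 1000-employee cap are all hypothetical values chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: the criteria come from the list above, the numbers do not.
WEIGHTS = {"technology": 0.4, "industry": 0.3, "size": 0.2, "data_needs": 0.1}
TECH_KEYWORDS = {"ai", "ml", "analytics", "scraping"}
TARGET_INDUSTRIES = {"e-commerce", "saas", "fintech"}


@dataclass
class Company:
    """Illustrative record; the real agent may carry different fields."""
    name: str
    description: str
    industry: str
    employees: int
    has_data_needs: bool


def score_company(company: Company) -> float:
    """Return a score in [0, 1] as a weighted sum of the four criteria."""
    # Match whole words so that "ai" does not match inside "retail".
    words = set(re.findall(r"[a-z]+", company.description.lower()))
    parts = {
        "technology": len(words & TECH_KEYWORDS) / len(TECH_KEYWORDS),
        "industry": 1.0 if company.industry.lower() in TARGET_INDUSTRIES else 0.0,
        # Assumed cap: companies with 1000+ employees get full size credit.
        "size": min(company.employees / 1000, 1.0),
        "data_needs": 1.0 if company.has_data_needs else 0.0,
    }
    return round(sum(WEIGHTS[key] * parts[key] for key in WEIGHTS), 3)


acme = Company("Acme", "AI scraping platform for analytics teams", "SaaS", 500, True)
print(score_company(acme))
```

In the actual agent the evaluation is LLM-assisted via OpenRouter, so a fixed formula like this is best read as a baseline for understanding how the criteria could combine.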
- Sales Intelligence: Identify potential customers for AI/data infrastructure products
- Market Research: Understand which companies are interested in specific technologies
- Partnership Discovery: Find companies with complementary technology needs
- Competitive Analysis: See which companies are following competitor repositories
- LangGraph: Orchestrates the multi-step analysis workflow
- OpenRouter: Provides LLM capabilities for intelligent evaluation
- ScrapeGraphAI: Powers web scraping for company information
- Streamlit: Creates an intuitive user interface
github-star-to-company-agent/
├── src/
│   ├── agent.py          # LangGraph workflow definition
│   ├── tools.py          # GitHub and scraping tools
│   ├── evaluator.py      # Company scoring logic
│   └── app.py            # Streamlit UI
├── tests/
│   └── test_agent.py     # Unit tests
├── docs/
│   └── usage.md          # Detailed usage guide
├── .env                  # API keys (create this)
├── requirements.txt      # Python dependencies
└── README.md             # This file
Run the test suite:
python -m pytest tests/
- GITHUB_TOKEN: Personal access token for GitHub API
- OPENROUTER_API_KEY: API key for OpenRouter LLM service
- OPENROUTER_API_BASE: OpenRouter API endpoint (default: https://openrouter.ai/api/v1)
- SGAI_API_KEY: API key for ScrapeGraphAI service
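A short sketch of how these variables could be validated at startup, assuming they are already present in the process environment (however the project loads .env). The `load_settings` helper is hypothetical and not part of the repository; only the variable names and the default endpoint come from this README.

```python
import os

REQUIRED = ("GITHUB_TOKEN", "OPENROUTER_API_KEY", "SGAI_API_KEY")
DEFAULT_API_BASE = "https://openrouter.ai/api/v1"


def load_settings(env=None) -> dict:
    """Collect the four settings, failing early if a required key is absent."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED}
    # OPENROUTER_API_BASE is optional and falls back to the documented default.
    settings["OPENROUTER_API_BASE"] = env.get("OPENROUTER_API_BASE") or DEFAULT_API_BASE
    return settings
```

Passing a plain dict as `env` makes the helper easy to exercise in tests without touching the real environment.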
Customize analysis parameters through the Streamlit UI:
- Maximum number of companies to display
- Minimum score threshold for filtering
- Analysis depth and timeout settings
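To illustrate what the first two parameters do, here is a minimal sketch of threshold filtering, capping, and CSV export. The function names, the default values, and the `company`/`score` row shape are assumptions made for this example, not the app's actual data model.

```python
import csv
import io


def select_companies(scored, min_score=0.5, max_companies=10):
    """Keep rows at or above min_score, best first, capped at max_companies."""
    kept = [row for row in scored if row["score"] >= min_score]
    kept.sort(key=lambda row: row["score"], reverse=True)
    return kept[:max_companies]


def to_csv(rows):
    """Serialize rows (dicts with 'company' and 'score') to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["company", "score"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


scored = [
    {"company": "A", "score": 0.9},
    {"company": "B", "score": 0.3},
    {"company": "C", "score": 0.7},
]
print(to_csv(select_companies(scored, min_score=0.5, max_companies=2)))
```

Filtering before sorting keeps the cap meaningful: the limit applies only to companies that already pass the threshold.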
- Respect Rate Limits: The agent implements automatic rate limiting
- Privacy: Only analyzes publicly available information
- Terms of Service: Ensure compliance with all platform ToS
- Responsible Use: Designed for legitimate business intelligence only
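As a rough illustration of the rate-limit point, the GitHub REST API reports the remaining quota and the reset time in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. The helper below is a hypothetical sketch of how a client could decide how long to pause; it is not the agent's own rate limiter.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next GitHub API call, in seconds."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # x-ratelimit-reset is a UTC epoch timestamp in seconds.
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    return max(0.0, float(reset_at - now))
```

A caller would pass the headers of the last response and sleep for the returned duration before issuing the next request.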
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with LangGraph for agent orchestration
- Powered by ScrapeGraphAI for web scraping
- UI created with Streamlit
- LLM capabilities via OpenRouter
For issues, questions, or suggestions:
- Check the usage guide
- Open an issue on GitHub
- Review existing issues for solutions
Made with ❤️ by the ScrapeGraphAI team