This project contains the source code for an AI-powered evaluator, built with Langflow, that prioritizes self-hosted GitLab servers by their likelihood of exposing sensitive data. It scores each server on factors such as company size, repository activity, and data sensitivity, so that the most critical servers can be identified and scanned first.
By default, self-hosted GitLab servers expose an "explore" endpoint that lets anyone on the internet browse public resources such as projects and groups, some of which may contain sensitive data. In our research, we identified 30,000 public GitLab servers and faced the challenge of deciding which of them to scan for exposed secrets first. This evaluator, built with Langflow, uses AI to score and prioritize servers based on relevant criteria.
This project uses Langflow, an AI workflow platform, to create an intelligent system that:
- Identifies the company behind each GitLab server based on its URL.
- Evaluates the server's repository activity (e.g., forks, commits, last updated dates).
- Scores the server based on the likelihood of containing sensitive data.
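As a rough illustration of how the criteria listed above could be combined into a single priority score, here is a minimal sketch. The actual scoring is performed inside the Langflow workflow; the `ServerSignals` fields, the 0-to-1 scale, and the weights below are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Hypothetical weights, not taken from the project; they only need to sum to 1.
WEIGHTS = {"company_size": 0.3, "repo_activity": 0.3, "data_sensitivity": 0.4}


@dataclass
class ServerSignals:
    """Illustrative per-server signals, each normalized to the range 0.0-1.0."""
    url: str
    company_size: float      # 0.0 = unknown or small, 1.0 = large enterprise
    repo_activity: float     # 0.0 = stale, 1.0 = very active
    data_sensitivity: float  # 0.0 = low, 1.0 = high


def priority_score(signals: ServerSignals) -> float:
    """Return a weighted sum of the three signals, clamped to [0, 1]."""
    raw = (
        WEIGHTS["company_size"] * signals.company_size
        + WEIGHTS["repo_activity"] * signals.repo_activity
        + WEIGHTS["data_sensitivity"] * signals.data_sensitivity
    )
    return max(0.0, min(1.0, raw))
```

A weighted sum keeps the contribution of each factor easy to inspect and tune; an LLM-based workflow may weigh the same factors in a less explicit way.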
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
langflow run
- Open Langflow and click Create New Project.
- Click Import and select the langflow_components/Gitlab Server Evaluator.json file.
- Generate an API key at platform.openai.com.
- Add the API key in Langflow settings.
- Click API in Langflow and copy the endpoint's URL (e.g., http://127.0.0.1:7860/api/v1/run/<workflow-id>?stream=false).
python utils/query_langflow.py -u "http://127.0.0.1:7860/api/v1/run/<workflow-id>?stream=false" \
-i gitlab_server_urls.txt -o outputs
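For readers who want to see what a single request to the Langflow run endpoint might look like, here is a minimal sketch using only the Python standard library. It is not the code of utils/query_langflow.py. It assumes the workflow takes the GitLab server URL as a chat-style `input_value`; the helper names `build_run_request` and `run_flow` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_run_request(
    endpoint: str, server_url: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one GitLab server URL."""
    # Assumption: the flow reads the server URL from a chat input component.
    payload = {
        "input_value": server_url,
        "input_type": "chat",
        "output_type": "chat",
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the Langflow instance requires an API key.
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_flow(endpoint: str, server_url: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_run_request(endpoint, server_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect before any network call is made.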
python utils/calculate_domain_score.py -i outputs/langflow_query_results.json -o outputs/prioritized_gitlab_servers.json
Open outputs/prioritized_gitlab_servers.json to view the GitLab servers ranked by their scores.
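To pull the highest-priority servers out of a scored result list programmatically, something like the following sketch could be used. The structure of the output file is not documented here, so the `url` and `score` field names are assumptions; adjust them to match the actual JSON.

```python
from typing import Any, Dict, List


def top_servers(
    results: List[Dict[str, Any]], limit: int = 10, score_key: str = "score"
) -> List[Dict[str, Any]]:
    """Return the `limit` highest-scoring entries, best first.

    Entries missing the score field are treated as score 0.
    """
    ranked = sorted(results, key=lambda r: r.get(score_key, 0), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real output file.
    sample = [
        {"url": "https://gitlab.a.example", "score": 42},
        {"url": "https://gitlab.b.example", "score": 87},
        {"url": "https://gitlab.c.example"},
    ]
    for entry in top_servers(sample, limit=2):
        print(entry["url"], entry.get("score", 0))
```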