
SpringBoot App to Scrape Content From Various Online Platforms






web-scrybe is an open-source SERP (Search Engine Results Page) scraping and social media web scraping tool built using Spring Boot. It provides a simple and efficient way to extract data from websites and integrate it into your applications.


  • Available SERP Scraping Integrations: Google, Bing and DuckDuckGo are currently supported. More integrations will be added later.
  • Available Social Media Scraping Integrations: Reddit Hot Topic. More integrations will be added later.
  • Easy-to-use API: Developers can quickly integrate web scraping functionality into their applications using the provided API.
  • Scalable and Distributed: The application is designed to be highly scalable and can be deployed in a distributed environment using Docker.
  • Unlimited Scraping: The automation-based approach is not subject to the rate limits or charges of paid APIs, so users can scrape data at scale without API-side throttling.
  • Docker and Docker Compose Support: web-scrybe is designed to be easily deployable and scalable using Docker and Docker Compose. This makes it an ideal choice for quick prototyping, development, and testing environment deployments.

NB: Search engines may have strict anti-scraping measures. Read their terms and conditions before using SERP scraping in production. Social media scraping is implemented using official APIs, which makes it suitable for use in any environment.

Getting Started


  • Java 21 or higher
  • Docker (optional, for running the application in a container)
  • Reddit Account (optional)


  1. Clone the repository:

    git clone
  2. Navigate to the project directory

  3. Add the following environment variables (not required if you do not plan to use the Reddit API): REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, USER_AGENT

    Environment variables can be set in runtime configurations, YAML files or properties files.

    In Visual Studio Code, environment variables can be set in the launch.json file of the .vscode folder, as shown below:

        {
            "type": "java",
            "name": "Launch Java Program",
            "request": "launch",
            "mainClass": "com.r7b7.webscrybe.WebSearchApplication",
            "env": {
                "USER_AGENT": "<YOUR_USER_AGENT>"
            }
        }

Alternatively, the keys can be set at runtime. See the "From Terminal" option under "Run App" for details.

Reddit API keys can be generated as follows: click the "Create Another App" button, fill in the details, and copy the generated key and secret. The User-Agent header can be set to a unique string similar to "redditdev scraper by x/MyRedditId".
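
Before starting the app with Reddit support, it can help to confirm that the three variables are actually exported. The following is a minimal sketch, not part of web-scrybe: the function name `missing_reddit_env` is hypothetical, and it assumes a POSIX shell with `printenv` available.

```shell
# Hypothetical helper, not part of web-scrybe: print the Reddit-related
# environment variables that are unset or empty.
# The body runs in a subshell "( ... )" so its variables do not leak.
missing_reddit_env() (
  missing=""
  for name in REDDIT_CLIENT_ID REDDIT_CLIENT_SECRET USER_AGENT; do
    # printenv only sees exported variables, which is what the app will see too.
    if [ -z "$(printenv "$name")" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
)
```

Calling `missing_reddit_env` after exporting only `USER_AGENT` prints `REDDIT_CLIENT_ID REDDIT_CLIENT_SECRET`; an empty line means all three are set.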

  1. Build App Using Maven:
    mvn clean package
  2. Run App (Application can be run using any of the following options)
    1. In IDE - Use Run Option
    2. From Terminal
      • Export the environment variables if not done already
        export USER_AGENT="<YOUR_USER_AGENT>"
      • Run the app using mvn
        mvn spring-boot:run
      • Alternatively, run the app as a jar. From the target directory within the project folder, run the following command
        java -jar web-scrybe-0.0.1-SNAPSHOT.jar
    3. In Docker
      • Run the following commands in terminal
        docker build -t web-scrybe .
        docker-compose up -d
      • Check logs
        docker-compose logs -f
  3. Available APIs
    1. Reddit Search API
    2. Google Search API
    3. Bing Search API
  4. Swagger Documentation
  5. Stop Application
    1. Terminal: Use Ctrl + C to stop a running application

    2. Docker

      1. To stop container, use the following command
        docker-compose down
      2. To remove container and image
        docker rm -f <container-name>
        docker rmi <image-name>
      3. If you don't know the container ID or name, list all running containers using:
        docker ps
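
The Docker stop and removal commands above can be wrapped in a small script. This is only a sketch under stated assumptions: the function names `run` and `cleanup_web_scrybe` and the `DRY_RUN` variable are hypothetical, and the container and image names must be supplied by the caller (for example, taken from `docker ps`).

```shell
# Hypothetical helper, not part of web-scrybe.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = container name, $2 = image name
cleanup_web_scrybe() {
  run docker-compose down      # stop the compose stack
  run docker rm -f "$1"        # remove the container
  run docker rmi "$2"          # remove the image
}
```

Setting `DRY_RUN=1` and calling `cleanup_web_scrybe <container-name> <image-name>` shows the three commands that would run without touching Docker.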

Contributing Guidelines

Contributions are welcome from the open-source community! Please read the contributing guidelines to get started.

To maintain code quality and ensure stability, please follow these guidelines before submitting a Pull Request (PR).

  1. Fork and Clone the Repository: Fork the repository on GitHub, then clone your fork locally:

    git clone
  2. Code Formatting: Ensure your code follows the standard Java code style. Use the built-in code formatting feature of your IDE (IntelliJ IDEA, Eclipse, or VS Code).

  3. Run Tests Locally: Make sure that all existing tests pass before submitting your PR:

    ./mvnw test
  4. Run SonarQube Analysis: Run SonarQube locally to check for code quality issues and ensure the code meets the standards. For setup instructions, see the "Run Sonarqube Locally" section.

        mvn clean verify sonar:sonar -Dsonar.login=<YOUR_TOKEN>
  5. Ensure the build is stable

  6. Commit Message Guidelines: Use meaningful and concise commit messages. Follow this format:

    [Type]: Brief description
    Types can include:
    feat: New feature
    fix: Bug fix
    docs: Documentation changes
    test: Adding or updating tests
    refactor: Code refactoring
  7. Creating a Pull Request: Before creating a Pull Request, ensure that your branch is up to date with the main branch.

  8. Code Review Process: The PR will be reviewed by maintainers and other contributors. Please be patient and respond to any requested changes.
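
The commit message format from step 6 can be checked mechanically. The sketch below is not part of the project: the function name `is_valid_commit_message` is hypothetical, and it assumes the five listed types are the complete set and that the square brackets in "[Type]" are placeholder notation rather than literal characters.

```shell
# Hypothetical check, not part of web-scrybe: succeed only if the first line
# of the message looks like "<type>: <description>" with a listed type.
is_valid_commit_message() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^(feat|fix|docs|test|refactor): .+'
}
```

For example, `is_valid_commit_message "fix: handle empty Bing results"` succeeds, while `is_valid_commit_message "updated stuff"` returns a non-zero status. The same check could be called from a Git `commit-msg` hook, which receives the path of the message file as its first argument.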

Run Sonarqube Locally

  1. Install Docker
  2. Run the following command
    docker run -d --name sonarqube -p 9000:9000 sonarqube
  3. The SonarQube server should be up and running at http://localhost:9000
  4. To stop the server, use this command
    docker stop sonarqube
  5. To run the SonarQube analysis locally, you need to pass a login token for authentication. Follow these steps to generate the token from your local SonarQube server:
    1. Go to your SonarQube instance at http://localhost:9000.
    2. Log in with your admin credentials (admin/admin by default).
    3. Navigate to My Account > Security > Generate Tokens.
    4. Enter a token name (e.g., my-project-token) and click Generate.
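
Once a token has been generated, it can be kept out of shell history by reading it from an environment variable. This is a sketch only: `SONAR_TOKEN` and `run_sonar_analysis` are hypothetical names, and the Maven command is the one given in the contributing guidelines.

```shell
# Hypothetical wrapper, not part of web-scrybe: run the analysis only when a
# token is available in the SONAR_TOKEN environment variable.
run_sonar_analysis() {
  if [ -z "${SONAR_TOKEN:-}" ]; then
    echo "SONAR_TOKEN is not set; generate a token in the SonarQube UI first" >&2
    return 1
  fi
  mvn clean verify sonar:sonar -Dsonar.login="$SONAR_TOKEN"
}
```

After `export SONAR_TOKEN=<YOUR_TOKEN>`, calling `run_sonar_analysis` from the project root runs the same analysis as the documented command.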


If you encounter any issues, please open a discussion or create an issue. We're here to help!

