Orion Crawler is a hidden-web crawling tool built with Docker Compose, designed for secure and anonymous web scraping. It supports two variants: an Orion crawler for broad data collection and a specific crawler that uses custom parsers for targeted, fine-tuned crawling.

msmannan00/Orion-Crawler

Orion Platform


Orion Platform is a comprehensive, web-based solution that combines the functionality of a browser, search engine, crawler, and data aggregation tools to empower OSINT (Open Source Intelligence) experts. Built on top of Docker, Orion provides a user-friendly interface to explore, search, and visualize data extracted by its powerful Orion Crawler.

The platform integrates seamlessly with machine learning models, enhancing search relevance and enabling advanced content analysis. Orion supports a broad range of functionalities, including the ability to search, filter, and visualize data across multiple categories, making it an invaluable tool for data exploration and intelligence gathering.

Designed with flexibility and scalability in mind, Orion enables OSINT experts to feed data directly into the platform, ensuring up-to-date and comprehensive datasets. Whether for investigative research, competitive analysis, or general information gathering, Orion provides a unified ecosystem that enhances the workflow of professionals who rely on actionable insights.

1. Repository Quality and Build Status

| Repository | Codacy | CodeQL | MDN HTTP Observatory | Security Headers | SSLLabs |
|---|---|---|---|---|---|
| Orion Search | Codacy Badge | CodeQL | Status | Status | Status |
| Orion Crawler | Codacy Badge | CodeQL | - | - | - |
| Orion Collector | Codacy Badge | CodeQL | - | - | - |
| Globaleaks Canary | Codacy Badge | - | - | - | - |
| Orion Browser | Codacy Badge | CodeQL | - | - | - |

2. Technology Stack

The Orion platform is built using various technologies to provide optimal search capabilities and data handling. Below is the list of libraries and frameworks used:

MongoDB, Redis, Celery, Python, Tor, Traefik, Elastic, Java, Kotlin

3. Associated Repositories

| Repository | Description | Stats |
|---|---|---|
| Orion Browser | A harvester-based browser used to scrape data as you browse. | Stars, Forks |
| Orion Crawler | Used for monitoring and continuously crawling the hidden web. | Stars, Forks |
| Orion Search | A platform to visualize extracted data. | Stars, Forks |
| Orion Collector | Simplifies the task of creating custom crawling scripts for multiple websites. | Stars, Forks |
| Globaleaks Canary | A tool for passive intelligence and whistleblowing. | Stars, Forks |

4. Data Extraction Techniques

The flow diagram below illustrates how the multithreaded crawler works, from initializing threads and distributing tasks to retrieving and processing data from multiple sources concurrently. It highlights the key components: task queues, thread synchronization mechanisms, and data handling workflows.

image(1)
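
The task-queue and thread-synchronization pattern described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the crawler's actual code: the `crawl` function, its `fetch` callback, and the `num_workers` parameter are hypothetical names, and the stand-in fetcher returns a string instead of making a request through Tor.

```python
import queue
import threading


def crawl(urls, fetch, num_workers=4):
    """Fetch every URL with a pool of worker threads and return {url: result}.

    Hypothetical helper for illustration; `fetch` is any callable taking a URL.
    """
    tasks = queue.Queue()            # thread-safe task queue shared by all workers
    results = {}                     # shared output, guarded by results_lock
    results_lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:          # sentinel: no more work for this thread
                tasks.task_done()
                return
            try:
                data = fetch(url)
            except Exception as exc:  # record the failure and keep the worker alive
                data = exc
            with results_lock:
                results[url] = data
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    tasks.join()                     # block until every queued item is processed
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs offline.
    pages = crawl(["a.onion", "b.onion"], fetch=lambda url: f"<html>{url}</html>")
    print(sorted(pages))
```

The queue hands each URL to exactly one worker, the lock serializes writes to the shared result dictionary, and the sentinel values let every thread exit cleanly once the queue is drained.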

5. Deep Data Linting Roadmap

This section outlines the proposed solution and future roadmap for deep data linting, which integrates insights from multiple sources into a unified platform. The solution emphasizes advanced data validation, cross-source correlation, and seamless integration to ensure comprehensive data quality checks. The roadmap covers phased development, scalability enhancements, and feature expansions aimed at a robust, centralized approach to data insight and linting.

linting(2)
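
As a rough illustration of what validation and cross-source correlation could look like, the sketch below checks records for required fields and then groups clean records by URL across sources. Everything here is an assumption made for the example: the record schema (`url`, `title`, `source`), the `validate` and `correlate` helpers, and the rule that a URL counts as correlated when more than one source reports it. The roadmap itself does not specify any of these.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("url", "title", "source")   # hypothetical record schema


def validate(record):
    """Return a list of problems found in one record (empty list means clean)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    url = record.get("url")
    if url and not url.startswith(("http://", "https://")):
        problems.append("url has no http(s) scheme")
    return problems


def correlate(records):
    """Group clean records by URL and keep URLs reported by more than one source."""
    sources_by_url = defaultdict(set)
    for record in records:
        if not validate(record):               # skip records that fail validation
            sources_by_url[record["url"]].add(record["source"])
    return {
        url: sorted(sources)
        for url, sources in sources_by_url.items()
        if len(sources) > 1
    }
```

Calling `correlate` on records collected by, say, a crawler and a browser harvester would return only the URLs that both reported, each mapped to the sorted list of sources that saw it.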

6. Browser Support

Orion Browser is an Android application designed to provide a secure, private browsing experience by leveraging onion routing technology. This browser empowers users to access hidden web content anonymously, unblock restricted sites, and browse freely while safeguarding their online identity.

JPJ pdf

🌟Contribution

We welcome contributions to improve Orion Search. If you'd like to contribute, please fork the repository and submit a pull request.

Steps to Contribute

  1. Fork the repository.
  2. Create a new feature branch (git checkout -b feature-branch).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature-branch).
  5. Create a new Pull Request.

License

Orion Search is licensed under the MIT License.

Disclaimer

This project is intended for research purposes only. The authors of Orion Search do not support or endorse illegal activities, and users of this project are responsible for ensuring their actions comply with the law.

GitHub Repository

GitHub Repository URL: https://github.com/msmannan00/Orion-Search.git

Project Information

https://www.canva.com/design/DAF8Sa8KkDE/1H8z3RVausdHIMcE98Kvfg/edit
