RelaxSearch is a lightweight, demonstration-based search engine project built with Go and Elasticsearch. It comprises two main components:
- RelaxEngine: A web scraper using cron jobs to periodically crawl and index content.
- RelaxWeb: An API server that enables full-text search capabilities on the indexed data.
This project showcases the essentials of a search engine by scraping content from specified URLs and storing it in Elasticsearch, where it is available for keyword-based searches.
Inspired by Full Document Search in Go: Blog Here
Also inspired by the Web Crawler in C++.
RelaxEngine
- Function: A web crawler that scrapes and indexes content from the web. It runs periodically via cron jobs.
- Technology: Go and Elasticsearch.
- Details: RelaxEngine reads a list of seed URLs, crawls the web starting from these URLs, and indexes the retrieved content in Elasticsearch.
RelaxWeb
- Function: Provides a REST API to search the indexed content.
- Technology: Go and Elasticsearch.
- API Endpoint: /search, which accepts a keyword query and returns relevant content with optional pagination, date filtering, and content highlighting.
- Golang: v1.16 or higher.
- Docker: For containerized deployment.
- Elasticsearch: v8.6.0 (configured in Docker Compose).
- Clone the repository:
  git clone https://github.com/Ravikisha/RelaxSearch.git
  cd RelaxSearch
- Configure .env files:
  - RelaxEngine and RelaxWeb use configuration files for connecting to Elasticsearch and defining scraping parameters.
  - Set up your Elasticsearch credentials in the .env files under each service as required.
- Build and run with Docker Compose:
  docker-compose up --build
  This starts the Elasticsearch service, RelaxEngine (scraping in the background), and RelaxWeb (serving the search API on port 7000).
- Endpoint: GET /search
- Parameters:
  - keyword (required): The term to search for.
  - from (optional): Pagination start index.
  - size (optional): Number of results to return (default: 10).
  - dateRangeStart and dateRangeEnd (optional): Filter results based on timestamp.

Example using curl:
curl -X GET "http://localhost:7000/search?keyword=example" -H "Authorization: Basic <base64_credentials>"
RelaxSearch/
├── relaxengine/ # Crawler and indexing component
│ ├── cmd/ # Main application for crawling
│ ├── config/ # Configuration files
│ └── crawler/ # Crawler logic and utilities
└── relaxweb/ # Search API server
├── config/ # Configuration files
└── search/ # Search functionality and API
- Error handling: Current error handling can be improved for more robust operation.
- Scalability: Limited to demonstration use; suitable for small to medium datasets, though it can be scaled out by adding Elasticsearch nodes.
This project is open-source and available under the MIT License.