SimpleElastic

A simple Elasticsearch-like search engine of your own.

Before starting on this assignment, you will need to do a little reading (unless, of course, you are already familiar with these topics).

Get some context on the following:

full text search
indexing (inverted index)
term frequency (aka TF)
inverse document frequency (aka IDF)
REST

Follow the links below to read up on them. (You are free to explore on your own too.)

quick elasticsearch tutorial
inverted index
TF & IDF
getting started with REST
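
To make the reading concrete, here is a minimal Python sketch of an inverted index with term frequency and one common smoothed IDF variant. The two toy documents reuse the text from the example further down; the names `docs`, `index`, and `idf` are illustrative assumptions, not part of the assignment.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (already lowercased, punctuation removed for simplicity).
docs = {
    "1": "a fox is usually quick and brown",
    "2": "a quick brown fox jumped over lazy dog a fox is always jumping",
}

# Inverted index: term -> {doc_id: term frequency (TF) in that document}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        index[term][doc_id] = count

def idf(term):
    """Smoothed inverse document frequency; rarer terms score higher.
    This is one of several common formulas, chosen here as an assumption."""
    df = len(index.get(term, {}))  # number of documents containing the term
    return math.log((1 + len(docs)) / (1 + df)) + 1

print(index["fox"])  # {'1': 1, '2': 2}
print(index["dog"])  # {'2': 1}
```

Here "fox" appears in both documents, so its IDF is lower than that of "dog", which appears in only one.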

The Assignment

Create a lightweight search engine that

indexes documents into an inverted index and saves it on disk so the documents can be searched later. You have to design your own algorithm and storage system and cannot use any existing database system
allows searching previously indexed items; search works as follows: 1) break every search query into terms (words/tokens) 2) look up every term in the inverted index and build the result set 3) rank the results using TF (extra points if you also implement IDF)
exposes two REST API endpoints: one for indexing data and another for searching
along with normal search, supports phrase queries. A phrase query matches on the phrase as a whole instead of on individual terms, i.e. "fox brown" should match documents in which fox and brown appear together
bonus point: allow specifying which field to search, i.e. if the document contains title and content, you must be able to restrict the query to the title field only
bonus point: add caching for phrase queries
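
The requirements above can be sketched in memory as follows. This is only one possible design, under these assumptions: the class name `PositionalIndex` and its methods are hypothetical, ranking uses raw term counts summed over all fields, a phrase matches when its terms occur at consecutive positions in the given order within one field, and re-indexing an existing id is not handled. Disk persistence is left out.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())

class PositionalIndex:
    def __init__(self):
        # term -> doc_id -> field -> list of token positions
        self.postings = defaultdict(lambda: defaultdict(dict))
        self.docs = {}
        self.phrase_cache = {}  # (terms, field) -> sorted matching ids

    def add(self, doc):
        self.docs[doc["id"]] = doc
        self.phrase_cache.clear()  # cached results may now be stale
        for field, value in doc.items():
            if field == "id":
                continue
            for pos, term in enumerate(tokenize(value)):
                self.postings[term][doc["id"]].setdefault(field, []).append(pos)

    def search(self, query, field=None):
        """Rank documents by summed term frequency, optionally in one field."""
        scores = defaultdict(int)
        for term in tokenize(query):
            for doc_id, fields in self.postings.get(term, {}).items():
                for name, positions in fields.items():
                    if field is None or name == field:
                        scores[doc_id] += len(positions)
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [self.docs[d] for d in ranked]

    def phrase(self, query, field=None):
        """Documents where the terms occur at consecutive positions in a field."""
        terms = tokenize(query)
        if not terms:
            return []
        key = (tuple(terms), field)
        if key not in self.phrase_cache:
            def positions(term, doc_id, name):
                return self.postings.get(term, {}).get(doc_id, {}).get(name, [])
            hits = []
            for doc_id, fields in self.postings.get(terms[0], {}).items():
                for name, starts in fields.items():
                    if field is not None and name != field:
                        continue
                    if any(all(s + i in positions(t, doc_id, name)
                               for i, t in enumerate(terms)) for s in starts):
                        hits.append(doc_id)
                        break
            self.phrase_cache[key] = sorted(hits)
        return [self.docs[d] for d in self.phrase_cache[key]]

idx = PositionalIndex()
idx.add({"id": "1", "title": "quick fox", "data": "A fox is usually quick and brown."})
idx.add({"id": "2", "title": "lazy dog",
         "data": "A quick brown fox jumped over lazy dog. A fox is always jumping."})
print([d["id"] for d in idx.search("quick dog")])   # ['2', '1']
print([d["id"] for d in idx.phrase("brown fox")])   # ['2']
```

Storing positions per field is what makes both the phrase query and the field restriction cheap; a plain term-to-count index would be enough for TF ranking alone.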

While designing this, you need to think about how this search engine would scale if you had millions of documents to index and search. Think about how you would structure and store the documents in the file system, and how you would keep the index in memory and access it at query time. You might also need to think about how you would make this system distributed, i.e. having multiple machines work together to index and search.
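
One way to start thinking about this is sharding: route each document to a shard by a stable hash of its id, store each shard's postings in its own file, and answer a query by asking every shard and merging the results. The sketch below is a simplified assumption of such a layout, not a recommended design: `NUM_SHARDS`, `shard_for`, `write_shards`, and the `shard-N.json` file names are all hypothetical, and a real system would avoid loading a whole shard file to look up a single term.

```python
import hashlib
import json
import os
from collections import Counter, defaultdict

NUM_SHARDS = 4  # hypothetical; each shard could live on its own machine

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Stable routing: the same id maps to the same shard in every process
    (Python's built-in hash() of a str is randomized per process)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def write_shards(docs, directory):
    """Build one term -> {doc_id: tf} map per shard and save each as JSON."""
    shards = defaultdict(lambda: defaultdict(dict))
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            shards[shard_for(doc_id)][term][doc_id] = tf
    for shard_id, postings in shards.items():
        path = os.path.join(directory, f"shard-{shard_id}.json")
        with open(path, "w") as f:
            json.dump(postings, f)

def search(term, directory):
    """Scatter-gather: query every shard file, then merge and rank by TF."""
    merged = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            merged.update(json.load(f).get(term, {}))
    return sorted(merged, key=lambda d: (-merged[d], d))
```

Because every document lives in exactly one shard, the per-shard result sets never overlap and merging them is a simple union followed by a sort.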

Example

An API endpoint for indexing

POST /index
{
"id": "1",
"title": "quick fox",
"data": "A fox is usually quick and brown."
}

200 OK

POST /index
{
"id": "2",
"title": "lazy dog",
"data": "A quick brown fox jumped over lazy dog. A fox is always jumping."
}

200 OK

An API endpoint for search

GET /search?q=quick%20fox

[
{
    "id": "1",
    "title": "quick fox",
    "data": "A fox is usually quick and brown."
},
{
    "id": "2",
    "title": "lazy dog",
    "data": "A quick brown fox jumped over lazy dog. A fox is always jumping."
}]


GET /search?q=dog

[
{
    "id": "2",
    "title": "lazy dog",
    "data": "A quick brown fox jumped over lazy dog. A fox is always jumping."
}]


GET /search?q=quick%20dog

[
{
    "id": "2",
    "title": "lazy dog",
    "data": "A quick brown fox jumped over lazy dog. A fox is always jumping."
},
{
    "id": "1",
    "title": "quick fox",
    "data": "A fox is usually quick and brown."
}]
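
The two endpoints shown above could be wired up with Python's standard `http.server` module roughly as follows. This is a sketch under assumptions: the index lives only in memory, scores are raw term counts summed over `title` and `data`, re-indexing an existing id is not handled, and the names `handle_index`, `handle_search`, `Handler`, and `serve` as well as port 8000 are hypothetical.

```python
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

DOCS = {}      # id -> original document
POSTINGS = {}  # term -> {id: term frequency across title and data}

def tokenize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    return "".join(c if c.isalnum() else " " for c in text.lower()).split()

def handle_index(body):
    """POST /index: store the document and update the postings."""
    doc = json.loads(body)
    DOCS[doc["id"]] = doc
    text = f'{doc.get("title", "")} {doc.get("data", "")}'
    for term, tf in Counter(tokenize(text)).items():
        POSTINGS.setdefault(term, {})[doc["id"]] = tf
    return 200, ""

def handle_search(path):
    """GET /search?q=...: rank matching documents by summed TF."""
    query = parse_qs(urlparse(path).query).get("q", [""])[0]
    scores = Counter()
    for term in tokenize(query):
        scores.update(POSTINGS.get(term, {}))
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return 200, json.dumps([DOCS[d] for d in ranked])

class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        if urlparse(self.path).path != "/index":
            return self._reply(404, "")
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*handle_index(self.rfile.read(length)))

    def do_GET(self):
        if urlparse(self.path).path != "/search":
            return self._reply(404, "")
        self._reply(*handle_search(self.path))

def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` starts the server. Keeping the request logic in plain functions, separate from the HTTP handler, makes it easy to test ranking without opening a socket.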

Directions

you can use any programming language of your choice
try to complete the assignment in the given time. It's totally fine if it's not complete. We would love to see the assignment anyway.
keeping the above point in mind, attempt the tasks in the order in which they are mentioned.
you should design your solution with scalability in mind; a search engine usually needs to scale to terabytes of data
call us if you have any trouble understanding anything
