Lightweight Text Search Engine written in Python
Prepare a JSON Lines file in which each line is a JSON object with "id" and "text" fields.
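To make the expected input format concrete, here is a minimal Python sketch. It assumes one JSON object per line and treats "id" as an opaque value. The helper names to_json_line and read_corpus are hypothetical and are not part of pysearchlite.

```python
import json
from typing import Any, Iterable, Iterator, Tuple


def to_json_line(doc_id: Any, text: str) -> str:
    """Serialize one document as a single newline-terminated JSON line."""
    # json.dumps escapes newlines inside the text, so each document stays on one line.
    return json.dumps({"id": doc_id, "text": text}) + "\n"


def read_corpus(lines: Iterable[str]) -> Iterator[Tuple[Any, str]]:
    """Yield (id, text) pairs from an iterable of JSON lines, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        yield doc["id"], doc["text"]
```

Any iterable of lines works as input, such as an open file object or sys.stdin.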
For example, stn/search-benchmark-game builds such a corpus, which you can use:
$ git clone https://github.com/stn/search-benchmark-game.git
$ cd search-benchmark-game
$ make corpus
This produces a corpus file, corpus.json, of about 8 GB containing more than 5 million documents. That is too large for development, so extract only the first few lines:
$ head -n 100 corpus.json > corpus100.json
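If head is not available (for example on Windows), the same extraction can be done in Python. A minimal sketch, assuming a hypothetical helper named head: itertools.islice stops consuming input after n lines, so the rest of the large file is not scanned.

```python
import itertools
from typing import Iterable, Iterator


def head(lines: Iterable[str], n: int) -> Iterator[str]:
    """Lazily yield at most the first n lines of the input."""
    # islice stops pulling from the underlying iterator once n lines are produced.
    return itertools.islice(lines, n)
```

For example, a script could call sys.stdout.writelines(head(sys.stdin, 100)) and be run with corpus.json redirected to stdin and its output redirected to corpus100.json.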
To run the sample script:
$ python -m pysearchlite.commands.main < corpus100.json
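pysearchlite's internals are not described here. As a hypothetical illustration of what a lightweight engine can do with the same stdin input, here is a minimal in-memory inverted index that answers AND queries. The regex tokenizer and the names InvertedIndex, build_index and search_and are assumptions for this sketch, not pysearchlite's API.

```python
import json
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of internal document numbers containing that term
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        # internal document number -> original "id" from the corpus
        self.doc_ids: List[Any] = []

    def add(self, doc_id: Any, text: str) -> None:
        num = len(self.doc_ids)
        self.doc_ids.append(doc_id)
        for term in tokenize(text):
            self.postings[term].add(num)

    def search_and(self, query: str) -> List[Any]:
        """Return ids of documents containing every query term, in insertion order."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty entries into the defaultdict for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.doc_ids[n] for n in sorted(hits)]


def build_index(lines: Iterable[str]) -> InvertedIndex:
    """Index every non-blank JSON line that has "id" and "text" fields."""
    index = InvertedIndex()
    for line in lines:
        if line.strip():
            doc = json.loads(line)
            index.add(doc["id"], doc["text"])
    return index
```

Calling build_index(sys.stdin) would consume the same input as the command above. A production engine would typically store sorted posting lists on disk and rank results rather than return plain sets.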
To run search-benchmark-game:
# Go to the search-benchmark-game directory,
# assumed to be a sibling of this repository.
$ cd ../search-benchmark-game
$ make index
$ make bench
$ make serve