An Inverted Indexer written in Python
This project creates an inverted index for a given corpus. An inverted index maps content (words, numbers, etc.) to its locations in one or more documents and is used for fast full-text search. This Python script builds a word-level inverted index.
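A minimal sketch of what a word-level inverted index looks like, assuming a simple regex tokenizer and recording word positions as the "location" within each document. The names tokenize and build_index are hypothetical and are not taken from indexer.py.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase the text and keep runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Return the lowercase word tokens of text, in order."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Build a word-level positional inverted index.

    docs maps a document id (e.g. a file name) to its text.
    Returns a dict: word -> {doc_id: [positions of the word in that doc]}.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {"a.txt": "The cat sat.", "b.txt": "The dog sat on the cat."}
index = build_index(docs)
print(index["cat"])  # {'a.txt': [1], 'b.txt': [5]}
```

Looking up a word is then a single dictionary access instead of a scan over every document, which is what makes full-text search fast.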
To get a local copy up and running, follow these steps.
- Clone the repo
git clone https://github.com/saeenyoda/Inverted_Indexing.git
- Open a command line or terminal and navigate to the cloned repo's directory
cd "PATH-TO-DIRECTORY"
- Install the requirements
pip3 install -r requirements.txt
- Run indexer.py (use python instead of python3 if python points to Python 3 on your system)
python3 indexer.py
This presents a menu. Enter the number that corresponds to one of the following options:
- Search Only: If you have already created the inverted index, you can simply search.
- Rebuild Index and Search: Builds the index for the first time (or rebuilds an existing one) and then searches it. You will be asked for the path to the corpus (sample corpora are provided).
- Exit: Exits the program.
NOTE:
The corpus can contain subdirectories; the path entered for Menu Option 2 (Rebuild Index and Search) must be the corpus's root directory. Each subdirectory is preprocessed and written to its own index file, and these individual files are merged once they have all been created.
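A minimal sketch of this per-subdirectory workflow, assuming each partial index is saved as a JSON file and documents are keyed by file path. The names index_directory and build_and_merge, the part_N.json naming, and the whitespace tokenization are hypothetical, not taken from indexer.py. Keep out_dir outside root so the partial files are not indexed themselves.

```python
import json
import os
from collections import defaultdict


def index_directory(dir_path):
    """Index the files directly inside dir_path.

    Returns word -> {file path: [positions]}. Tokenization is a plain
    lowercase whitespace split, kept simple for illustration.
    """
    index = defaultdict(dict)
    for name in sorted(os.listdir(dir_path)):
        path = os.path.join(dir_path, name)
        if not os.path.isfile(path):
            continue  # subdirectories are visited separately by os.walk
        with open(path, encoding="utf-8", errors="ignore") as f:
            words = f.read().lower().split()
        for pos, word in enumerate(words):
            index[word].setdefault(path, []).append(pos)
    return index


def build_and_merge(root, out_dir):
    """Index every directory under root separately, save each partial
    index to its own JSON file, then merge those files into one index."""
    os.makedirs(out_dir, exist_ok=True)
    partial_files = []
    for i, (dir_path, _subdirs, _files) in enumerate(os.walk(root)):
        part = os.path.join(out_dir, f"part_{i}.json")
        with open(part, "w", encoding="utf-8") as f:
            json.dump(index_directory(dir_path), f)
        partial_files.append(part)

    merged = defaultdict(dict)
    for part in partial_files:
        with open(part, encoding="utf-8") as f:
            for word, postings in json.load(f).items():
                # Each file lives in exactly one directory, so its path
                # appears in only one partial index and postings never clash.
                merged[word].update(postings)
    return dict(merged)
```

Writing partial indexes to disk before merging keeps memory bounded to one subdirectory at a time while it is being processed.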
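While it runs, the program displays output at each stage:
- while a subdirectory is being preprocessed
- while the inverted index for that subdirectory is being stored
- when it asks for a query
- when it shows the results: the number of documents found, the time taken, and the document names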
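A minimal sketch of the query step, assuming an index shaped word -> {doc_id: [positions]} and AND semantics (a document matches only if it contains every query word). The function search is hypothetical, and the actual script may handle queries differently. It returns the matching document names and the elapsed time, the two things the results screen reports.

```python
import time


def search(index, query):
    """Return (sorted matching doc ids, elapsed seconds) for an AND query.

    index maps word -> {doc_id: [positions]}.
    """
    start = time.perf_counter()
    words = query.lower().split()
    if not words:
        return [], time.perf_counter() - start
    # Intersect starting from the smallest posting set to keep work small.
    postings = sorted((set(index.get(w, {})) for w in words), key=len)
    matches = set.intersection(*postings)
    return sorted(matches), time.perf_counter() - start


index = {"cat": {"a.txt": [1], "b.txt": [5]}, "dog": {"b.txt": [1]}}
found, elapsed = search(index, "Cat dog")
print(f"{len(found)} document(s) found in {elapsed:.6f}s: {found}")
```

A word missing from the index yields an empty posting set, so the intersection is empty and the query correctly returns no documents.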
Distributed under the MIT License. See LICENSE for more information.