This program is a search engine that answers Boolean queries over a collection of automobile and property policy documents. Search results can optionally be filtered by the automobile or property tag.
This program is part of an assignment for the CS F469 - Information Retrieval course at BITS Pilani, Hyderabad Campus.
- Install all the required Python libraries: `pip install -r requirements.txt`.
- Run the GUI using the command: `streamlit run GUI.py`.
- Search for any term and select the appropriate filter to get search results.
NOTE: Before the first run, please delete the files in the `inverted_index` and `affixes` directories and run the pre-processing step, so that the index is accurate and all indexed files are present.
Operator Type | Operator | Example |
---|---|---|
Boolean | AND | example AND demo |
Boolean | OR | example OR demo |
Boolean | ! (NOT) | !(example) |
Phrase | " " | "example demo" |
Sentence | \s | example \s demo |
Proximity | \d | example \2 demo |
Wildcard | * (prefix) | exa* |
Wildcard | * (suffix) | *ple |
Wildcard | * (sub) | exa*ple |
Complex | Combination | example AND !(demo) |
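
As a rough illustration of how the Boolean operators in the table can be evaluated, the sketch below treats each term's posting list as a Python set of document IDs. The toy index, the document IDs, and the helper names (`postings`, `op_and`, `op_or`, `op_not`) are assumptions made for this example; they are not taken from the project's code, which may work differently.

```python
# Toy inverted index: term -> set of document IDs (made-up data for illustration).
INDEX = {
    "example": {1, 2, 3},
    "demo": {2, 4},
}
ALL_DOCS = {1, 2, 3, 4}  # every document ID in the toy corpus


def postings(term):
    """Return the set of documents containing `term` (empty set if unseen)."""
    return INDEX.get(term.lower(), set())


def op_and(left, right):
    """example AND demo: documents present in both posting sets."""
    return left & right


def op_or(left, right):
    """example OR demo: documents present in either posting set."""
    return left | right


def op_not(operand):
    """!(example): every document except those in the operand."""
    return ALL_DOCS - operand


# Complex query from the table: example AND !(demo)
result = op_and(postings("example"), op_not(postings("demo")))
print(sorted(result))  # [1, 3]
```

Sets keep the sketch short; a real index would more likely store sorted posting lists and merge them, which is what makes the positional operators (phrase, sentence, proximity) practical.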
- Add the parsed JSON file to the respective sub-directory of the `parsed_json` directory.
- Run the command: `python preprocessing.py` or `python3 preprocessing.py`.
The parsed JSON file for a document should have the following format:
{
  "title": "demo",
  "sections": [
    {
      "section_heading": "demo",
      "paragraphs": [
        "demo",
        "demo"
      ]
    },
    {
      "section_heading": "demo",
      "paragraphs": [
        "demo",
        "demo"
      ]
    }
  ]
}
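
Before adding a file to `parsed_json`, it can help to check that it follows the format above. The following is a minimal sketch of such a check using only the standard library; the function name `validate_parsed_document` and the error messages are hypothetical and are not part of `preprocessing.py`.

```python
import json


def validate_parsed_document(doc):
    """Return a list of problems found in a parsed document (empty if none)."""
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not isinstance(doc.get("title"), str):
        problems.append('"title" must be a string')
    sections = doc.get("sections")
    if not isinstance(sections, list):
        problems.append('"sections" must be a list')
        return problems
    for i, section in enumerate(sections):
        if not isinstance(section, dict):
            problems.append(f"sections[{i}] must be an object")
            continue
        if not isinstance(section.get("section_heading"), str):
            problems.append(f'sections[{i}]: "section_heading" must be a string')
        paragraphs = section.get("paragraphs")
        if not isinstance(paragraphs, list) or not all(
            isinstance(p, str) for p in paragraphs
        ):
            problems.append(f'sections[{i}]: "paragraphs" must be a list of strings')
    return problems


# A small document in the expected format, parsed from a JSON string.
sample = json.loads("""
{
  "title": "demo",
  "sections": [
    {"section_heading": "demo", "paragraphs": ["demo", "demo"]}
  ]
}
""")
print(validate_parsed_document(sample))  # []
```

To check a file on disk, load it with `json.load` and pass the result to the function; an empty list means the structure matches the format shown above.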