Database-Fulltext-Search

Improving the effectiveness of keyword search on relational data using effective retrieval subsets.

Project Structure:

Cache Enhancement: indexes all documents in a directory and its subdirectories, filters the parts of the resulting index that satisfy a given condition, and measures query difficulty metrics against the selected part of the index. The main modules are:
- build.py: builds an index from a directory of documents;
- partition.py: builds a virtual partition on top of the main index;
- querydifficulty.py: implements the main query difficulty metrics;
- enhancer/describe.py: compares two virtual partitions, treating each one as a single giant document;
- enhancer/solutions.py: recursively refines a virtual partition by removing documents from it to increase its difference from a base partition.
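
The workflow described above (index, virtual partition, comparison of two partitions as giant documents, refinement by document removal) can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in build.py, partition.py, or the enhancer modules: all function names here are hypothetical, the index is a toy in-memory structure, and the use of KL divergence as the "difference" between partitions is an assumption made for illustration only.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Toy inverted index: map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def virtual_partition(docs, predicate):
    """A 'virtual' partition is only a set of doc ids; no index data is copied."""
    return {doc_id for doc_id, text in docs.items() if predicate(doc_id, text)}


def term_distribution(docs, doc_ids):
    """Treat the partition as one giant document and return term probabilities."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(docs[doc_id].lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q); terms missing from q get a small floor (assumed smoothing)."""
    return sum(pv * math.log(pv / q.get(term, epsilon)) for term, pv in p.items())


def refine(docs, partition, base, min_size=1):
    """Greedily drop documents while doing so increases divergence from base."""
    base_dist = term_distribution(docs, base)
    current = set(partition)
    best = kl_divergence(term_distribution(docs, current), base_dist)
    while len(current) > min_size:
        # Score every single-document removal and keep the most divergent one.
        candidates = [
            (kl_divergence(term_distribution(docs, current - {d}), base_dist), d)
            for d in sorted(current)
        ]
        score, doc_id = max(candidates)
        if score <= best:
            break  # no removal improves the divergence any further
        current.remove(doc_id)
        best = score
    return current
```

A partition is selected with any predicate over document id and text, for example `virtual_partition(docs, lambda doc_id, text: "cat" in text)`. Representing a partition as a set of ids keeps it cheap to create and refine, since the underlying index is never rebuilt.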
MSLR: applies learning-to-rank to the MSLR dataset;
rrank-analysis: analyzes the effect of cache size on reciprocal ranks;
ML-evaluate: evaluates machine learning models as the cache selection algorithm;
ML-prepare
Cluster Analysis
text-classification
wiki13
wikipagecount
