Improving the effectiveness of keyword search on relational data using effective retrieval subsets.
- Cache Enhancement: indexes all documents in a directory and its subdirectories, filters the part of the generated index that satisfies a given condition, and measures query difficulty metrics against the selected part. Main modules are:
  - build.py: builds an index from a directory of documents;
  - partition.py: builds a virtual partition on top of the main index;
  - querydifficulty.py: implements our main query difficulty metrics;
  - enhancer/describe.py: compares two virtual partitions, treating each one as a single giant document;
  - enhancer/solutions.py: recursively refines a virtual partition by removing documents from it to increase its difference from a base partition.
- MSLR: applies learning-to-rank to the MSLR dataset;
- rrank-analysis: analyzes the effect of cache size on reciprocal ranks;
- ML-evaluate: evaluates machine learning models as the cache selection algorithm;
- ML-prepare
- Cluster Analysis
- text-classification
- wiki13
- wikipagecount
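
The idea behind the Cache Enhancement modules (select a virtual partition of the index, then compare two partitions as if each were one giant document) can be illustrated with a minimal, standalone sketch. It does not use the project's code. The function names (`select_partition`, `term_distribution`, `partition_divergence`), the toy index, and the choice of smoothed KL divergence as the comparison measure are all assumptions made for illustration; the actual metric in enhancer/describe.py may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real index would use a proper analyzer.
    return text.lower().split()


def select_partition(index, predicate):
    # A "virtual partition" here is just the documents whose path passes the filter.
    return [text for path, text in index.items() if predicate(path)]


def term_distribution(docs, vocabulary, smoothing=1.0):
    # Treat all documents of a partition as one giant document and
    # estimate an additively smoothed unigram distribution over `vocabulary`.
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))
    total = sum(counts[t] for t in vocabulary) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}


def kl_divergence(p, q):
    # KL(p || q); both distributions share the same keys and are strictly positive.
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def partition_divergence(part_a, part_b):
    # Build a shared vocabulary so both distributions are defined on the same terms.
    vocabulary = {t for text in part_a + part_b for t in tokenize(text)}
    p = term_distribution(part_a, vocabulary)
    q = term_distribution(part_b, vocabulary)
    return kl_divergence(p, q)


# Hypothetical toy index: path -> document text.
toy_index = {
    "cache/a.txt": "keyword search on relational data",
    "cache/b.txt": "keyword query difficulty",
    "rest/c.txt": "page view counts for wiki articles",
    "rest/d.txt": "cluster analysis of documents",
}

cache = select_partition(toy_index, lambda path: path.startswith("cache/"))
base = select_partition(toy_index, lambda path: True)

print(partition_divergence(cache, base))
```

A larger divergence means the selected partition looks less like the base partition. A refinement step in the spirit of enhancer/solutions.py could, under the same assumptions, repeatedly drop the document whose removal increases this value the most.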