This repository provides the data and implementation of the paper:
Event-Driven Query Expansion
Guy D. Rosin, Ido Guy, and Kira Radinsky
WSDM 2021
Preprint: https://arxiv.org/abs/2012.12065
- Install Terrier.
- Obtain a TREC dataset (e.g., Robust04) and index it using Terrier.
- Download the Wikipedia2vec model (see the full list of models here).
- Obtain a collection of temporal word2vec models (e.g., from the New York Times).
- Run event_projection.py to enrich the temporal embeddings with events from Wikipedia.
- Run trec_search.py to perform retrieval with or without query expansion and evaluate, after setting the relevant parameters (model paths, dataset, QE method, etc.).
The word embeddings used in the paper can be downloaded from here.
The embeddings were created using Gensim. Each model is trained on a single year of New York Times data and then enriched with events from Wikipedia. There are 38 models in total, one for each year from 1981 to 2018.
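As a rough illustration of working with one model per year, the sketch below builds a year-to-path mapping and loads a single model with Gensim. The directory name, the `{year}.model` file naming, and the helper names `yearly_model_paths` and `load_yearly_model` are assumptions made for this example, not the layout of the released files. It also assumes the models were saved with Gensim's `Word2Vec.save()`; if they are stored in another format, a different loader would be needed.

```python
from pathlib import Path

FIRST_YEAR, LAST_YEAR = 1981, 2018  # year range stated above


def yearly_model_paths(model_dir, pattern="{year}.model"):
    """Map each year to a model file path (the naming pattern is hypothetical)."""
    return {
        year: Path(model_dir) / pattern.format(year=year)
        for year in range(FIRST_YEAR, LAST_YEAR + 1)
    }


def load_yearly_model(path):
    """Load one model with Gensim; assumes it was saved via Word2Vec.save()."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim

    return Word2Vec.load(str(path))


# "embeddings" is a placeholder directory name.
paths = yearly_model_paths("embeddings")
print(len(paths), "model paths, from", min(paths), "to", max(paths))
```

Calling `load_yearly_model(paths[2001])` would then load the model for 2001, provided a file exists at that path.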
- Python 3.7
- trectools (custom version: https://github.com/guyrosin/trectools)
- Terrier 5.1 (http://terrier.org)
- numpy
- scipy
- scikit-learn
- nltk
- gensim 3.8
- pandas
- tqdm
- sqlitedict
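A quick way to confirm that the Python packages listed above are importable is sketched below. It only checks that each module can be found; it does not verify versions, and it does not cover Terrier, which is installed separately. The helper name `missing_modules` is introduced for this example, and `sklearn` is the import name of scikit-learn.

```python
import importlib.util

# Import names for the Python packages in the list above.
REQUIRED_MODULES = [
    "trectools",
    "numpy",
    "scipy",
    "sklearn",  # scikit-learn
    "nltk",
    "gensim",
    "pandas",
    "tqdm",
    "sqlitedict",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed Python modules are importable.")
```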