Author: Guy Rosin (guyrosin@cs.technion.ac.il)
This repository provides the data and implementation of the paper:
Learning Word Relatedness over Time
Guy D. Rosin, Eytan Adar and Kira Radinsky
EMNLP 2017
https://arxiv.org/abs/1707.08081
The main folder contains:
- code for creating word embeddings using word2vec, either from a single corpus (word2vec_model_alltime.py) or from a temporal corpus (models_builder.py)
- a framework for running and evaluating various types of ML classifiers (classifier.py)
- a peak detection algorithm that we used (peak_detection.py)
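
To illustrate what building embeddings from a temporal corpus can look like, here is a minimal sketch. It is not the code in models_builder.py: the function names, the 5-year period length, and the toy documents are assumptions made for this example, and the training call assumes gensim 4 or later (older gensim releases name the `vector_size` parameter `size`).

```python
from collections import defaultdict


def split_by_period(docs, period_length=5):
    """Group tokenized documents into consecutive time slices.

    docs: iterable of (year, tokens) pairs.
    Returns a dict mapping the first year of each slice to its documents.
    """
    slices = defaultdict(list)
    for year, tokens in docs:
        start = year - (year % period_length)
        slices[start].append(tokens)
    return dict(slices)


def train_models(slices, vector_size=100):
    """Train one word2vec model per time slice (assumes gensim >= 4)."""
    # Imported lazily so the slicing helper works without gensim installed.
    from gensim.models import Word2Vec

    return {
        start: Word2Vec(sentences=sentences, vector_size=vector_size,
                        window=5, min_count=1, workers=1, seed=0)
        for start, sentences in slices.items()
    }


# Toy documents, invented for this example.
docs = [
    (1991, ["soviet", "union", "collapse"]),
    (1993, ["oslo", "accords", "signed"]),
    (1996, ["dolly", "sheep", "cloned"]),
]
slices = split_by_period(docs)
```

Calling `train_models(slices)` would then return one model per slice, so the same word can be compared across periods.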

The search folder contains code for temporal query expansion, in particular:
- searching the New York Times archive, using Apache Solr, and evaluating search results (temporal_search.py)
- performing temporal query expansion. The query can be either a single entity (qe_single_entity.py) or multiple entities (qe_multiple_entities.py)
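
The idea of expanding a single-entity query with terms related to it in a given period can be sketched as follows. This is a simplified illustration, not the logic of qe_single_entity.py: the helper names, the toy vectors, and the OR-joined query string are all assumptions, and a real run would take the vectors from a period-specific word2vec model and send the query to Solr.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(entity, vectors, k=2):
    """Return the k terms closest to `entity` in one period's embedding space."""
    target = vectors[entity]
    scored = [(cosine(target, vec), term)
              for term, vec in vectors.items() if term != entity]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


def build_query(entity, expansions):
    """Join the entity and its expansion terms into a quoted OR query string."""
    terms = [entity] + expansions
    return " OR ".join('"{}"'.format(t) for t in terms)


# Toy vectors for one period, invented for this example.
vectors = {
    "euro": [0.9, 0.1, 0.0],
    "currency": [0.8, 0.2, 0.1],
    "ecb": [0.7, 0.0, 0.3],
    "baseball": [0.0, 1.0, 0.0],
}
expansions = expand_query("euro", vectors, k=2)
query = build_query("euro", expansions)
```

Swapping in the vectors of a different period changes the expansion terms, which is what makes the expansion temporal.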

The data consists of:
- relations, in the format: <entity1, entity2, start_year, end_year, relation_type>
- binary relations generated from the relations file, in the format: <entity1, entity2, year, true/false>
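
One plausible way to derive binary relations from a relation record is sketched below. The README does not say how the negative (false) rows were chosen, so the rule used here, marking every year in a fixed range that falls outside [start_year, end_year] as false, is an assumption, as are the function name and the sample row.

```python
def to_binary_relations(relations, first_year, last_year):
    """Expand <entity1, entity2, start_year, end_year, relation_type> rows
    into <entity1, entity2, year, true/false> rows, one per year."""
    rows = []
    for entity1, entity2, start_year, end_year, _relation_type in relations:
        for year in range(first_year, last_year + 1):
            # Assumed rule: the relation holds only inside its interval.
            holds = start_year <= year <= end_year
            rows.append((entity1, entity2, year, holds))
    return rows


# Illustrative row, not taken from the dataset.
relations = [
    ("Barack Obama", "President of the United States", 2009, 2017, "position held"),
]
binary = to_binary_relations(relations, first_year=2007, last_year=2010)
```

Here the years 2007 and 2008 yield false rows, while 2009 and 2010 yield true rows.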

Dependencies:
- Python 3.5
- gensim
- spacy
- numpy
- scikit-learn
- scipy
- pysolr
- unidecode
- matplotlib