Automatic Sense Disambiguation of Potentially Idiomatic Expressions

This is the source code for a system that automatically disambiguates potentially idiomatic expressions (PIEs, for short) in text. It implements four disambiguation methods: a most-frequent-sense baseline, a canonical-form-based baseline (Fazly et al., 2009), a lexical cohesion graph-based method (Sporleder & Li, 2009), and a variation on that method that uses literal representations of idioms' figurative senses. The methods are evaluated on a combination of four corpora: the VNC-Tokens corpus, the IDIX corpus, the PIE Corpus, and the SemEval-2013 Task 5b dataset. For a detailed description of the systems, see our LAW-MWE-CxG paper.

Requirements

To run this code, you'll need the following Python setup:

Python 2.7.6
beautifulsoup4 4.5.1
numpy 1.14.0
scipy 0.19.1
spacy 2.0.6 + en_core_web_sm 2.0.0

Other versions may work just as well, but this is not guaranteed.
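To see how an existing environment compares with the versions listed above, a small check along the following lines may help. This is only a sketch and not part of the repository: the function name check_versions and the PINNED table are made up for illustration, and it assumes each package exposes a __version__ attribute under the import name shown (beautifulsoup4 is imported as bs4). It does not check the Python version or the en_core_web_sm model.

```python
import importlib

# Versions taken from the list above. Keys are import names, which is an
# assumption of this sketch (beautifulsoup4 is imported as bs4).
PINNED = {
    "bs4": "4.5.1",
    "numpy": "1.14.0",
    "scipy": "0.19.1",
    "spacy": "2.0.6",
}


def check_versions(pinned=PINNED, import_module=importlib.import_module):
    """Return {name: (expected, found)} for every package that does not match.

    found is None when the package cannot be imported or has no __version__.
    """
    mismatches = {}
    for name, expected in sorted(pinned.items()):
        try:
            module = import_module(name)
        except ImportError:
            mismatches[name] = (expected, None)
            continue
        found = getattr(module, "__version__", None)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling check_versions() with no arguments returns an empty dictionary when all four packages match the listed versions. A mismatch does not necessarily mean the system will fail to run.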

You'll also need:

the British National Corpus
the GloVe embeddings
the VNC-Tokens Dataset
the IDIX Corpus
the PIE Corpus
the SemEval-2013 Task 5b Dataset

Getting Started

Clone the repository
Create subdirectories called working and ext
Add these symlinks (or edit config.py):
- create a symlink ext/BNC to the Texts directory of your copy of the BNC
- create a symlink ext/glove to the directory containing the GloVe embeddings
- create symlinks ext/VNC, ext/IDIX, ext/PIE_Corpus, and ext/SemEval to the main directory of the respective corpora
Try running the system with python psd.py -c 0 -m cg -gs 0s. This should run a basic lexical cohesion graph method and evaluate it on the development set of the combined corpora.
Get an overview of all options by running python psd.py --help
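The directory and symlink steps above can also be scripted. The sketch below is not part of the repository: the set_up function and the LINKS table are hypothetical, and every target path is a placeholder to replace with the location of your own copy of each resource. It assumes a platform where os.symlink is available, and it only covers the default layout, so any change made in config.py is not reflected here.

```python
import os

# Link names follow the steps above; the targets are placeholders
# (assumptions of this sketch) and must be edited before use.
LINKS = {
    "BNC": "/path/to/BNC/Texts",
    "glove": "/path/to/glove",
    "VNC": "/path/to/VNC-Tokens",
    "IDIX": "/path/to/IDIX",
    "PIE_Corpus": "/path/to/PIE_Corpus",
    "SemEval": "/path/to/SemEval-2013-Task-5b",
}


def set_up(repo_root, links):
    """Create working/ and ext/ under repo_root, then symlink ext/<name> to each target.

    Returns the names of the links that were newly created.
    """
    for sub in ("working", "ext"):
        path = os.path.join(repo_root, sub)
        if not os.path.isdir(path):
            os.makedirs(path)
    created = []
    for name, target in sorted(links.items()):
        link = os.path.join(repo_root, "ext", name)
        if os.path.lexists(link):
            continue  # leave existing links or directories untouched
        os.symlink(os.path.abspath(target), link)
        created.append(name)
    return created
```

From the root of the cloned repository, set_up(".", LINKS) creates whatever is missing and skips anything that already exists, so it is safe to run more than once.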

Contact

For any questions about (running) the system, feel free to contact me.
