RNNG Notebook

About

These notes and accompanying code were created as a presentation aid for the paper Recurrent Neural Network Grammars, Dyer et al. 2016, at the Berlin Machine Learning seminar.

The code in RNNG.py is a reimplementation of Dyer et al. using Python bindings to DyNet, and borrows heavily from two sources:

The original RNNG code, implemented in C++
The Python implementation of the stack LSTM parser from Graham Neubig's NN4NLP course

Both are released under the Apache license v2.0, as is this work.

Installing

Dependencies in this project are managed with Pipenv; see the Pipenv documentation for installation instructions.

Once Pipenv is installed, run the following to install the dependencies:

pipenv install

Launch the environment shell to run the subsequent steps:

pipenv shell

Getting + preparing data

The data used for this notebook is the NLTK release of ~10% of the Penn Treebank (Marcus et al., 1994). To get the data, download the file from the NLTK data repo and unzip it into the data directory within this repo.

To convert the treebank data into the necessary format and divide it into train/dev/test sets, run:

python split_training_data.py

See the source code of split_training_data.py to adjust file paths, the relative sizes of the training and dev sets, etc.
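For orientation, the kind of split described above can be sketched as follows. This is a minimal illustration, not the code in split_training_data.py: the function name `split_dataset`, the default fractions, and the fixed seed are all assumptions made for the example.

```python
import random


def split_dataset(items, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle items and split them into train/dev/test lists.

    The fractions and seed are illustrative defaults (assumptions),
    not the values used by split_training_data.py.
    """
    if train_frac < 0 or dev_frac < 0 or train_frac + dev_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    # Whatever is left after train and dev becomes the test set.
    test = shuffled[n_train + n_dev:]
    return train, dev, test
```

Here each item would be one bracketed tree; the three returned lists would then be written to separate files such as data/train.ptb, data/dev.ptb, and data/test.ptb.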

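To generate the oracle data sets, run the following; each command takes the training file as its first argument and the file to convert as its second: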

python get_oracle_gen.py data/train.ptb data/train.ptb > data/train.oracle
python get_oracle_gen.py data/train.ptb data/dev.ptb > data/dev.oracle
python get_oracle_gen.py data/train.ptb data/test.ptb > data/test.oracle
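An oracle maps each gold tree to the sequence of parser actions that builds it. The sketch below shows the idea for the generative action set in Dyer et al. (NT(X), GEN(x), REDUCE). It is a simplified illustration and not a reproduction of get_oracle_gen.py: the function name `tree_to_actions` is hypothetical, and the actual oracle file format may contain more than this (for example, handling of unknown words), which is not shown here.

```python
import re


def tree_to_actions(tree):
    """Convert a bracketed tree string into generative RNNG-style actions.

    Hypothetical helper for illustration only; it assumes every word
    sits under a preterminal of the form "(TAG word)".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    actions = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = tokens[i + 1]
            is_preterminal = (
                i + 3 < len(tokens)
                and tokens[i + 2] not in ("(", ")")
                and tokens[i + 3] == ")"
            )
            if is_preterminal:
                # "(TAG word)": generate the word; the tag is dropped.
                actions.append(f"GEN({tokens[i + 2]})")
                i += 4
            else:
                # Open a new nonterminal constituent.
                actions.append(f"NT({label})")
                i += 2
        elif tok == ")":
            # Close the most recently opened constituent.
            actions.append("REDUCE")
            i += 1
        else:
            raise ValueError(f"unexpected token outside a preterminal: {tok}")
    return actions


print(tree_to_actions("(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))"))
```

For the example tree this yields NT(S), NT(NP), GEN(The), GEN(cat), REDUCE, NT(VP), GEN(sleeps), REDUCE, REDUCE.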

To get the Brown clusters used to support word generation (generated as described in Koo et al. 2008), download them and unzip them into the data directory.
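One common way clusters support word generation is a class-factored softmax, where the probability of a word is the probability of its cluster times the probability of the word within that cluster. The sketch below illustrates that factorization in plain Python. It is an assumption-laden toy, not the code in RNNG.py: the names `word_probability`, `word_to_cluster`, `cluster_scores`, and `word_scores` are hypothetical, and the scores would in practice come from the network rather than from fixed dictionaries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probability(word, word_to_cluster, cluster_scores, word_scores):
    """p(word) = p(cluster) * p(word | cluster).

    All arguments are hypothetical stand-ins: word_to_cluster maps each
    word to a cluster id, and the two score dicts hold unnormalized scores.
    """
    cluster = word_to_cluster[word]
    clusters = sorted(cluster_scores)
    p_cluster = softmax([cluster_scores[c] for c in clusters])[clusters.index(cluster)]
    # Normalize only over the words that share this word's cluster.
    members = sorted(w for w, c in word_to_cluster.items() if c == cluster)
    p_word = softmax([word_scores[w] for w in members])[members.index(word)]
    return p_cluster * p_word
```

The benefit of this design is that each prediction normalizes over the clusters and then over one cluster's members, instead of over the whole vocabulary.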

Notebook

To install the Jupyter notebook kernel within your virtual environment, run the following after calling pipenv shell:

python -m ipykernel install --user --name=<rnng-notebook-[your local environment hash]>

Then launch the notebook server from within the shell:

jupyter notebook

Within the notebook, use the Kernel > Change kernel menu to select the kernel local to your virtual environment.

License

This software is released under the Apache license v2.0.
