Implements attention-based language models in a unified structure, with an emphasis on code accuracy. PRs are warmly welcomed!
pip3 install -r requirements.txt
First, download the related spaCy language models:
python -m spacy download en
python -m spacy download de
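Note: newer spaCy releases (3.x) no longer support the en / de shortcut names. If the commands above fail, the equivalent full model names should work:
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm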
Then, download the related dataset for each attention model. Currently, transformer and bert are supported.
sh download.sh ${model}
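For example, to fetch the data for both supported models:
sh download.sh transformer
sh download.sh bert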
You can run the implemented language models using the attention/run_attention_model.py file. An example of training the transformer model is given below; detailed parameters for each model are given in the next section.
python3 attention/run_attention_model.py \
--language_model transformer \
--data_pkl .data/multi30k/m30k_deen_shr.pkl \
--d_model 512 \
--d_word_vec 512 \
--d_inner_hid 2048 \
--d_k 64 \
--d_v 64 \
--n_head 8 \
--n_layers 6 \
--batch_size 256 \
--embs_share_weight \
--proj_share_weight \
--label_smoothing \
--output_dir output \
--no_cuda \
--n_warmup_steps 128000 \
--epoch 400
Currently, the transformer and bert models are supported. The arguments for each model are described below.
Model | Reference | Run example |
---|---|---|
Transformer | link | link |
Bert | link | link |
Set the --language_model parameter to transformer.
python3 attention/run_attention_model.py \
--language_model transformer \
--data_pkl .data/multi30k/m30k_deen_shr.pkl \
--d_model 512 \
--d_word_vec 512 \
--d_inner_hid 2048 \
--d_k 64 \
--d_v 64 \
--n_head 8 \
--n_layers 6 \
--batch_size 256 \
--embs_share_weight \
--proj_share_weight \
--label_smoothing \
--output_dir output \
--no_cuda \
--n_warmup_steps 128000 \
--epoch 400
Parameter explanations
Parameter name | Explanation |
---|---|
language_model | Name of the language model to run (here, transformer) |
data_pkl | Path to the preprocessed pickle file |
d_model | Projection dimension of q, k, v |
d_word_vec | Dimension of the word vectors |
d_inner_hid | Hidden dimension of the position-wise feed-forward network |
d_k | Dimension of each key head (d_model = d_k * n_head) |
d_v | Dimension of each value head (d_model = d_v * n_head) |
n_head | Number of attention heads |
n_layers | Number of encoder/decoder layers |
batch_size | Size of a batch |
embs_share_weight | Whether to share embedding weights between the source and target vocabularies. If set, both vocabulary sizes become the same |
proj_share_weight | Whether to share embedding weights between the target vocabulary and the final projection layer |
label_smoothing | Whether to apply label smoothing when computing the cross-entropy loss |
output_dir | Directory for model history and checkpoints |
no_cuda | Disable CUDA and train on CPU |
n_warmup_steps | Number of warmup steps for the learning-rate schedule |
epoch | Number of epochs |
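As a sanity check on the example configuration above: d_k = d_v = 64 and n_head = 8, so d_model = d_k * n_head = 64 * 8 = 512, which matches --d_model 512 and --d_word_vec 512.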
Set the --language_model parameter to bert.
python3 attention/run_attention_model.py \
--language_model bert \
--d_model 512 \
--d_word_vec 512 \
--d_inner_hid 2048 \
--d_k 64 \
--d_v 64 \
--n_head 8 \
--n_layers 6 \
--batch_size 256 \
--output_dir output \
--no_cuda \
--epoch 400 \
--movie_conversations ./data/movie_conversations.txt \
--movie_lines ./data/movie_lines.txt \
--raw_text ./data \
--output ./data
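Based on the arguments above, the ./data directory is expected to look roughly like this (the preprocessed text and results are written back into the same directory via --raw_text and --output):
data/
├── movie_conversations.txt
├── movie_lines.txt
└── (preprocessed text and outputs are written here)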
Parameter explanations
Parameter name | Explanation |
---|---|
language_model | Name of the language model to run (here, bert) |
d_model | Projection dimension of q, k, v |
d_word_vec | Dimension of the word vectors |
d_inner_hid | Hidden dimension of the position-wise feed-forward network |
d_k | Dimension of each key head (d_model = d_k * n_head) |
d_v | Dimension of each value head (d_model = d_v * n_head) |
n_head | Number of attention heads |
n_layers | Number of encoder layers |
batch_size | Size of a batch |
no_cuda | Disable CUDA and train on CPU |
n_warmup_steps | Number of warmup steps for the learning-rate schedule |
epoch | Number of epochs |
movie_conversations | Path to the movie_conversations.txt file |
movie_lines | Path to the movie_lines.txt file |
raw_text | Directory where the preprocessed text will be saved |
output | Directory where the results will be saved |
Run the test suite with:
pytest
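pytest also accepts a path if you only want to run part of the suite; for example, assuming the tests live under the attention package:
pytest attention -q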