Block Recurrent Transformer

A PyTorch implementation of the Block-Recurrent Transformer from Hutchins & Schlag et al. It owes a great deal to Phil Wang's x-transformers. Very much a work in progress.

Dockerfile, requirements.txt, and environment.yaml because I love chaos.

Differences from the Paper (as of 2022/05/04)

Keys and values are not shared between the "vertical" and "horizontal" directions (the standard input -> output information flow and the recurrent state flow, respectively).
The state vectors are augmented with Rotary Embeddings for positional encoding, instead of using learned embeddings.
The special LSTM gate initialization is not yet implemented.
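
To make the rotary-embedding point above concrete, here is a minimal sketch of rotating a block of state vectors by their slot index. It is not the code from this repo: the function names (`rotary_angles`, `apply_rotary`), the interleaved pairing of features, the base of 10000, and the tensor sizes are all assumptions chosen for illustration, and the repo may well apply the rotation to the queries and keys derived from the state rather than to the state tensor itself.

```python
import torch


def rotary_angles(num_positions: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Return rotation angles of shape (num_positions, dim // 2)."""
    # One frequency per feature pair, geometrically spaced (hypothetical base).
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(num_positions).float()
    return torch.outer(positions, inv_freq)


def apply_rotary(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive feature pairs of x (..., n, dim) by angles (n, dim // 2)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rot_even = x_even * cos - x_odd * sin
    rot_odd = x_even * sin + x_odd * cos
    # Re-interleave the rotated pairs back into the original feature layout.
    return torch.stack((rot_even, rot_odd), dim=-1).flatten(-2)


# Illustrative sizes only: 2 sequences, 8 state vectors each, 16 features.
batch, num_state_vectors, dim = 2, 8, 16
state = torch.randn(batch, num_state_vectors, dim)
angles = rotary_angles(num_state_vectors, dim)
state_rot = apply_rotary(state, angles)
```

Because the rotation only depends on the slot index, no positional parameters are learned, and the dot product between two rotated vectors depends only on the difference of their indices. That is the usual argument for rotary over learned embeddings.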
